GDC Data Dictionary 3.1 Expands Diagnosis Information, TCGA Properties, and Bioinformatics Workflows
Introducing the latest advancements in GDC 2.0 with Release 2.1! Here's what's new:
The GDC has released GDC 2.0, which expands on the GDC Data Portal initially launched in 2016 by providing a cohort-centric design and new analysis tools. GDC 2.0 features include:
Generally, WGS data should have associated structural variant files (BEDPE), except when there is no matched tumor/normal pair or when variant calling has not yet been implemented.
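For readers who want to inspect a downloaded BEDPE file, the following is a minimal Python sketch of a reader. It assumes the common BEDPE layout in which the first six tab-separated columns are chrom1, start1, end1, chrom2, start2, end2, and that comment or header lines begin with "#". The names `BedpeRecord`, `parse_bedpe_line`, and `read_bedpe` are hypothetical, and the exact columns in GDC-produced files may differ, so any remaining columns are kept as raw strings.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class BedpeRecord:
    """One breakpoint pair; only the six core coordinate columns are typed."""
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    extra: List[str] = field(default_factory=list)  # untouched trailing columns


def parse_bedpe_line(line: str) -> BedpeRecord:
    """Parse a single tab-separated BEDPE data line (assumed layout)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    return BedpeRecord(
        chrom1=fields[0], start1=int(fields[1]), end1=int(fields[2]),
        chrom2=fields[3], start2=int(fields[4]), end2=int(fields[5]),
        extra=fields[6:],
    )


def read_bedpe(lines: Iterable[str]) -> Iterator[BedpeRecord]:
    """Yield records, skipping blank lines and lines starting with '#'."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        yield parse_bedpe_line(line)
```

With an open file handle, `list(read_bedpe(handle))` would collect every breakpoint pair. A file whose header row does not start with "#" would need that row skipped separately.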
The SomaticSniper whole exome variant caller was one of the first-generation somatic mutation callers developed by the scientific community. It works best with blood cancers that have high levels of tumor-in-normal contamination, but is often overly permissive for solid tumors. Since our first data release in 2016, the GDC has gradually adopted newer tools and newer tool versions, and has shifted the focus of somatic variant calling from any single caller to a multi-caller ensemble.
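To illustrate the general idea of a multi-caller ensemble, here is a minimal Python sketch that keeps a variant only when at least a given number of callers report it. This is a generic consensus vote, not the GDC's actual ensemble procedure. The function name `ensemble_calls`, the caller labels, and the default threshold of two callers are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def ensemble_calls(
    calls_by_caller: Dict[str, Iterable[Variant]],
    min_callers: int = 2,  # assumed threshold, chosen only for illustration
) -> Set[Variant]:
    """Return variants reported by at least `min_callers` distinct callers."""
    support: Counter = Counter()
    for variants in calls_by_caller.values():
        # set() ensures each caller contributes at most one vote per variant.
        support.update(set(variants))
    return {variant for variant, votes in support.items() if votes >= min_callers}


# Hypothetical caller labels and calls.
example = {
    "caller_a": [("chr1", 100, "A", "T"), ("chr2", 500, "G", "C")],
    "caller_b": [("chr1", 100, "A", "T")],
    "caller_c": [("chr3", 42, "C", "G")],
}
consensus = ensemble_calls(example)
```

In this example only the chr1 variant is supported by two callers, so it is the only one retained. Raising `min_callers` trades sensitivity for specificity, which is the usual motivation for moving away from a single permissive caller.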
Germline SNP calls are not available for exploration in the GDC Data Portal. Instead, alignments for germline data are available under controlled access. Users with appropriate access may use the alignments to generate germline variants.
Some somatic variant callers, such as MuTect2, also output somatic calls that carry some possibility of being germline, such as those labelled "germline_risk". Please note that these calls are by no means germline variants. They are somatic calls with a borderline probability of germline risk.
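As a rough illustration of how such labelled calls can be recognized, the Python sketch below reads the FILTER column (the seventh tab-separated column) of a VCF data line and checks for the "germline_risk" label. It assumes the standard VCF convention that FILTER holds "PASS", ".", or a semicolon-separated list of filter names. The function names `filter_labels` and `is_germline_risk` are hypothetical, and the example lines are made up.

```python
from typing import Set


def filter_labels(vcf_line: str) -> Set[str]:
    """Return the set of filter names on one VCF data line (empty if PASS or '.')."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    value = fields[6]  # FILTER is the 7th column in the VCF layout
    if value in ("PASS", "."):
        return set()
    return set(value.split(";"))


def is_germline_risk(vcf_line: str) -> bool:
    """True when the call carries the 'germline_risk' filter label."""
    return "germline_risk" in filter_labels(vcf_line)


# Made-up example lines, not real data.
flagged = "chr1\t12345\t.\tA\tG\t.\tgermline_risk;clustered_events\tDP=50"
passed = "chr1\t67890\t.\tC\tT\t.\tPASS\tDP=80"
```

Here `is_germline_risk(flagged)` is true and `is_germline_risk(passed)` is false. A flagged call remains a somatic call that a caller marked as borderline, so this check is useful for excluding or reviewing such calls, not for deriving germline variants.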
HTSeq was the default RNA-Seq expression quantification tool from the first GDC data release. The GDC later updated the RNA-Seq alignment and quantification workflow to include STAR Count, which generates stranded counts by default in addition to the existing unstranded counts. In Data Release 32, which updated the gene model, the GDC 1) augmented the existing STAR Count output to include FPKM and FPKM-UQ normalizations and 2) reprocessed all TCGA data using the latest RNA-Seq workflow with STAR Count.
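For readers unfamiliar with the two normalizations, here is a minimal Python sketch of the textbook forms of FPKM and upper-quartile FPKM. It is not the GDC's production code: which genes contribute to the total and to the upper quartile, and any extra scaling, are workflow-specific details not covered here. The function names, the toy counts, and the use of the "inclusive" quantile method are assumptions.

```python
from statistics import quantiles


def fpkm(count: float, length_bp: float, total_count: float) -> float:
    """Fragments per kilobase of transcript per million mapped reads.

    The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) units.
    """
    return count * 1e9 / (total_count * length_bp)


def fpkm_uq(count: float, length_bp: float, upper_quartile_count: float) -> float:
    """Same form as FPKM, with the total replaced by the 75th-percentile count."""
    return count * 1e9 / (upper_quartile_count * length_bp)


# Toy data: read counts and gene lengths in base pairs (made up).
counts = {"gene_a": 100, "gene_b": 200, "gene_c": 700}
lengths = {"gene_a": 1000, "gene_b": 2000, "gene_c": 1000}

total = sum(counts.values())
# Third cut point of the quartiles is the 75th percentile (method is an assumption).
uq = quantiles(counts.values(), n=4, method="inclusive")[2]

fpkm_values = {g: fpkm(counts[g], lengths[g], total) for g in counts}
fpkm_uq_values = {g: fpkm_uq(counts[g], lengths[g], uq) for g in counts}
```

Dividing by the upper quartile instead of the total makes the value less sensitive to a few very highly expressed genes, which is the usual reason to provide FPKM-UQ alongside FPKM.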