
Analyze Data

GDC Data Dictionary 3.1 Expands Diagnosis Information, TCGA Properties, and Bioinformatics Workflows

GDC Data Dictionary 3.1 expands diagnosis information, TCGA properties, and bioinformatics workflows:


New Cohort Level MAF Tool and Enhancements in GDC 2.0

Introducing the latest advancements in GDC 2.0 with Release 2.1! Here's what's new:


Introducing GDC 2.0: A New Cohort-Centric Design

The GDC has released GDC 2.0, which expands on the GDC Data Portal initially launched in 2016 by providing a cohort-centric design and new analysis tools. GDC 2.0 features include:


Why do some projects with WGS structural variant data have BEDPE files and some projects do not?

Submitted by Anonymous on Wed, 11/09/2022 - 13:35

Generally, any WGS data should have associated structural variant files (BEDPE), except when there are no tumor/normal matches or when variant calling has not yet been implemented.


Why did the GDC remove SomaticSniper?

Submitted by Anonymous on Thu, 09/22/2022 - 09:29

The SomaticSniper whole exome variant caller was one of the first-generation somatic mutation callers developed by the scientific community. It works best with blood cancers that have high levels of tumor-in-normal contamination, but is often overly permissive for solid tumors. Since our first data release in 2016, the GDC has gradually adopted newer tools or tool versions, and has shifted the focus of somatic variant calling from any single caller to a multi-caller ensemble.


Does the GDC provide access to germline variants?

Submitted by Anonymous on Thu, 07/14/2022 - 11:08

Germline SNP calls are not available for exploration in the GDC Data Portal. Instead, alignments for germline data are available under controlled access. Users with appropriate access may use the alignments to generate germline variants.

Some somatic variant callers, such as MuTect2, also output somatic calls that carry some possibility of being germline, such as those labelled "germline_risk". Please note that these calls are by no means germline variants. They are somatic calls with a borderline probability of germline risk.
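As an illustration of how such calls can be set apart, here is a minimal sketch that splits VCF records by whether their FILTER column contains "germline_risk". It assumes a standard tab-separated VCF with FILTER as the seventh column; the function name and the example records are hypothetical and not taken from any GDC file.

```python
def split_by_germline_risk(vcf_lines):
    """Split VCF data lines into (flagged, other) by the FILTER column.

    Assumes standard VCF layout: tab-separated, FILTER is column 7,
    and multiple filter names are joined with ';'.
    """
    flagged, other = [], []
    for line in vcf_lines:
        # Skip blank lines and header lines ("##meta" and "#CHROM").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        filters = fields[6].split(";")
        if "germline_risk" in filters:
            flagged.append(fields)
        else:
            other.append(fields)
    return flagged, other


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tT\t.\tPASS\t.",
    "chr1\t2000\t.\tG\tC\t.\tgermline_risk\t.",
    "chr2\t3000\t.\tC\tA\t.\tgermline_risk;panel_of_normals\t.",
]

flagged, other = split_by_germline_risk(example)
print(len(flagged), len(other))  # prints: 2 1
```

The flagged records remain somatic calls; the split only makes it easier to review them separately.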


Why did the GDC remove HTSeq for gene expression quantification?

Submitted by Anonymous on Thu, 07/14/2022 - 11:03

HTSeq had been the default RNA-Seq expression quantification tool since the first GDC data release. The GDC later updated the RNA-Seq alignment and quantification workflow to include STAR Count, which generates stranded counts by default in addition to the existing unstranded counts. During Data Release 32, which carried gene model updates, the GDC 1) augmented the existing STAR Count output to include FPKM and FPKM-UQ normalizations; and 2) reprocessed all the TCGA data using the latest RNA-Seq workflow with STAR Count.
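For readers unfamiliar with the two normalizations, the sketch below shows the generic textbook form of FPKM and an upper-quartile variant. It is an assumption-laden illustration, not the GDC workflow: the choice of which genes enter the denominators, the quartile method, and any extra scaling the GDC applies are not stated here, and the gene names, counts, and lengths are hypothetical.

```python
from statistics import quantiles


def fpkm(count, length_bp, total_count):
    """Fragments per kilobase per million: count * 1e9 / (length * total)."""
    return count * 1_000_000_000 / (length_bp * total_count)


def upper_quartile(counts):
    """75th percentile of the counts (inclusive method; an assumed choice)."""
    return quantiles(counts, n=4, method="inclusive")[2]


def fpkm_uq(count, length_bp, upper_quartile_count):
    """Same form as FPKM, with the upper-quartile count as the denominator."""
    return count * 1_000_000_000 / (length_bp * upper_quartile_count)


# Hypothetical genes: name -> (read count, gene length in bp).
genes = {
    "geneA": (100, 1000),
    "geneB": (200, 2000),
    "geneC": (300, 1500),
    "geneD": (400, 4000),
    "geneE": (500, 2500),
}

counts = [count for count, _ in genes.values()]
total = sum(counts)
uq = upper_quartile(counts)

for name, (count, length_bp) in genes.items():
    print(name, fpkm(count, length_bp, total), fpkm_uq(count, length_bp, uq))
```

The upper-quartile variant is less sensitive to a few very highly expressed genes than the total-count denominator, which is the usual motivation for offering both.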

