Access Data
New TCGA and TARGET Properties and Sample Type Restructuring
In GDC Data Dictionary Release 2.6.0, the GDC added support for:
New Data from NCI’s MATCH and EAGLE Studies Available
GDC Data Release 36 includes WXS and RNA-Seq data for cases from NCI’s MATCH precision medicine clinical trial (MATCH-Z1D; phs001859) and WGS, WXS, and RNA-Seq data for lung adenocarcinoma cases fr
Why do some projects with WGS structural variant data have BEDPE files and some projects do not?
In general, WGS data should have associated structural variant (BEDPE) files. The exceptions are cases with no matched tumor/normal pair and cases for which structural variant calling has not yet been implemented.
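To check whether a particular project has WGS BEDPE files, one option is to query the public GDC API `files` endpoint. The sketch below only builds the request URL and sends nothing. It assumes that these files are indexed with `data_format` "BEDPE" and `experimental_strategy` "WGS". The helper names `bedpe_filters` and `bedpe_query_url` are hypothetical.

```python
import json
from urllib.parse import urlencode

# Public GDC API files endpoint; this sketch only builds the URL.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def bedpe_filters(project_id: str) -> dict:
    """GDC filter JSON: WGS files in BEDPE format for one project.

    Assumption: structural variant files use data_format "BEDPE".
    """
    return {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "in", "content": {"field": "files.experimental_strategy", "value": ["WGS"]}},
            {"op": "in", "content": {"field": "files.data_format", "value": ["BEDPE"]}},
        ],
    }


def bedpe_query_url(project_id: str, size: int = 100) -> str:
    """Build a GET URL that lists matching files (file ID, name, case)."""
    params = {
        "filters": json.dumps(bedpe_filters(project_id)),
        "fields": "file_id,file_name,cases.submitter_id",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(bedpe_query_url("TCGA-BRCA"))
```

Suppose the response reports zero hits (`data.pagination.total` is 0) for a project that does have WGS data. That result would be consistent with one of the two exceptions above.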
Why did the GDC remove SomaticSniper?
The SomaticSniper whole exome variant caller was one of the first-generation somatic mutation callers developed by the scientific community. It works best with blood cancers that have high levels of tumor-in-normal contamination, but it is often overly permissive for solid tumors. Since our first data release in 2016, the GDC has gradually adopted newer tools and tool versions. It has also shifted the focus of somatic variant calling from any single caller to a multi-caller ensemble.
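The GDC's actual ensemble rules are not described here. The sketch below illustrates only the general idea behind a multi-caller ensemble, a simple "at least k of n callers agree" vote. It is a generic technique, not the GDC's pipeline. The caller names and variants in the usage example are hypothetical toy data.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Variant key: (chromosome, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def consensus_calls(
    calls_by_caller: Dict[str, Iterable[Variant]], min_callers: int = 2
) -> Set[Variant]:
    """Keep variants reported by at least `min_callers` distinct callers."""
    support: Counter = Counter()
    for calls in calls_by_caller.values():
        # set() so a caller that lists the same variant twice votes only once
        support.update(set(calls))
    return {variant for variant, n in support.items() if n >= min_callers}


if __name__ == "__main__":
    toy_calls = {
        "caller_a": [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")],
        "caller_b": [("chr1", 100, "A", "T")],
        "caller_c": [("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")],
    }
    print(consensus_calls(toy_calls))  # {('chr1', 100, 'A', 'T')}
```

Requiring agreement across callers trades some sensitivity for specificity. This helps limit the damage from any one caller that is overly permissive on a given tumor type.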
New TARGET AML RNA-Seq and CPTAC Single Nuclei RNA-Seq Data, Update to Variant Caller Pipeline
New data sets are now available including RNA-Seq data from the TARGET acute myeloid leukemia project and single nuclei RNA-Seq data from the CPTAC program. These changes are summarized below:
New Release of Beat AML Mutations, CPTAC Cases, and Fusion Data
In Data Release 34, the GDC released new data from the Beat AML and Clinical Proteomic Tumor Analysis Consortium (CPTAC) programs:
Does the GDC provide access to germline variants?
Germline SNP calls are not available for exploration in the GDC Data Portal. Instead, the underlying alignments are available under controlled access. Users with the appropriate access may use these alignments to call germline variants.
Some somatic variant callers, such as MuTect2, also flag somatic calls that have some likelihood of being germline, for example those labeled "germline_risk". Please note that these calls are by no means germline variants. They are somatic calls with a borderline probability of germline origin.
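A user who wants to review or set aside such flagged calls in a somatic VCF might separate them by the FILTER column, as in the sketch below. It assumes that the tag appears literally as `germline_risk` in FILTER. Tag names can differ between MuTect2 versions. The function name is hypothetical. Both returned groups are still somatic calls.

```python
from typing import Iterable, List, Tuple


def split_by_germline_risk(vcf_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split VCF data lines into (flagged, unflagged) by a 'germline_risk' FILTER tag.

    FILTER is the 7th tab-separated column of a VCF data line, and multiple
    filter tags are separated by semicolons. Assumption: the tag is spelled
    exactly 'germline_risk'.
    """
    flagged: List[str] = []
    unflagged: List[str] = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        filter_tags = fields[6].split(";")
        if "germline_risk" in filter_tags:
            flagged.append(line)
        else:
            unflagged.append(line)
    return flagged, unflagged
```

Because the check matches whole semicolon-separated tags, a record such as `germline_risk;t_lod_fstar` is flagged. A differently named tag that merely contains the text would not be.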
How often does the GDC update the workflow/reference genome? If the GDC updates the workflow/reference genome, does the GDC re-process all data sets?
For the reference genome, the GDC has used an augmented version of GRCh38.p2 (with additional decoy and virus sequences) since its inception. The GDC does not use alternative contigs and derives high-level data only from the major chromosomes. As a result, the same reference genome is used with both the GENCODE v22 gene model (Data Releases 1 to 31) and GENCODE v36 (from Data Release 32). As future versions of the reference genome, e.g., GRCh39, are released, the GDC will evaluate the benefits of updating data to use the new version.
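The two rules above can be expressed as small helpers: which GENCODE version a data release uses, and whether a contig counts as a major chromosome. This is a sketch under several assumptions. It assumes UCSC-style `chr` contig names. It treats `chrM` as major. It also assumes that GENCODE v36 applies to every release from 32 onward. The function names are hypothetical.

```python
import re

# Major chromosomes: chr1 to chr22, chrX, chrY, chrM (assumption: chrM counts as major).
_MAJOR_CHROM = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)")


def gencode_version(data_release: int) -> str:
    """GENCODE gene model for a GDC data release (v22 for 1-31, v36 from 32)."""
    if data_release < 1:
        raise ValueError("GDC data releases start at 1")
    return "v22" if data_release <= 31 else "v36"


def is_major_chromosome(contig: str) -> bool:
    """True for major chromosomes; False for decoy, virus, alt, or unplaced contigs."""
    return _MAJOR_CHROM.fullmatch(contig) is not None
```

Because the reference sequence itself does not change, a filter such as `is_major_chromosome` behaves the same under either gene model.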
GDC Now Supports m6A MeRIP-Seq Data and the UICC Tumor Staging System
In GDC Data Dictionary Release 2.5.0, the GDC added support for: