New TCGA and TARGET Properties and Sample Type Restructuring
In GDC Data Dictionary Release 2.6.0, the GDC added support for:
GDC Data Release 36 includes WXS and RNA-Seq data for cases from NCI’s MATCH precision medicine clinical trial (MATCH-Z1D; phs001859) and WGS, WXS, and RNA-Seq data for lung adenocarcinoma cases fr
Generally, WGS data should have associated structural variant files (BEDPE), except when there is no tumor/normal match or when variant calling has not yet been implemented.
The SomaticSniper whole exome variant caller was one of the first-generation somatic mutation callers developed by the scientific community. It works best with blood cancers that have a high level of tumor-in-normal contamination, but is often overly permissive for solid tumors. Since our first data release in 2016, the GDC has gradually adopted newer tools or new tool versions, and has shifted the focus of somatic variant calling from any single caller to a multi-caller ensemble.
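The idea of a multi-caller ensemble can be illustrated with a minimal sketch. It assumes each caller's output has already been reduced to a set of (chromosome, position, ref, alt) tuples, and it keeps a variant when at least `min_callers` callers report it. The caller labels, the example variants, and the threshold of two are illustrative assumptions, not the GDC's actual pipeline logic.

```python
from collections import Counter


def ensemble_calls(calls_by_caller, min_callers=2):
    """Return {variant: number_of_supporting_callers} for variants
    reported by at least `min_callers` callers.

    `calls_by_caller` maps a caller label to an iterable of hashable
    variant keys, here (chrom, pos, ref, alt) tuples.
    """
    support = Counter()
    for variants in calls_by_caller.values():
        # Deduplicate within one caller so each caller counts once per variant.
        support.update(set(variants))
    return {v: n for v, n in support.items() if n >= min_callers}


# Hypothetical caller labels and made-up variants, for illustration only.
calls = {
    "caller_a": [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")],
    "caller_b": [("chr1", 100, "A", "G")],
    "caller_c": [("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")],
}

consensus = ensemble_calls(calls, min_callers=2)
print(consensus)
```

Raising `min_callers` trades sensitivity for specificity, which is the general motivation for not relying on a single permissive caller.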
New data sets are now available, including RNA-Seq data from the TARGET acute myeloid leukemia project and single-nuclei RNA-Seq data from the CPTAC program. These changes are summarized below:
In Data Release 34, the GDC released new data from the Beat AML and Clinical Proteomic Tumor Analysis Consortium (CPTAC) programs:
Germline SNP calls are not available for exploration in the GDC Data Portal. Instead, alignments for germline data are available under controlled access. Users with appropriate access may use the alignments to generate germline variants.
Some somatic variant callers, such as MuTect2, also output somatic calls that carry some possibility of being germline, such as those labelled "germline_risk". Please note that these calls are by no means germline variants. They are somatic calls with a borderline probability of germline risk.
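As a rough sketch of how such labelled calls might be picked out, the snippet below reads the FILTER column (the seventh tab-separated field) of VCF data lines and checks for the "germline_risk" label. The helper names and the example lines are made up for illustration, and the sketch assumes the label appears in the FILTER column as a semicolon-separated entry, as the VCF format specifies for filter codes.

```python
def filter_labels(vcf_line):
    """Return the set of FILTER labels on one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    filter_field = fields[6]  # CHROM POS ID REF ALT QUAL FILTER ...
    if filter_field in (".", ""):
        return set()
    return set(filter_field.split(";"))


def has_germline_risk(vcf_line, label="germline_risk"):
    """True when the call carries the given FILTER label."""
    return label in filter_labels(vcf_line)


# Made-up data lines, for illustration only.
example_lines = [
    "chr1\t12345\t.\tA\tG\t.\tPASS\t.",
    "chr2\t67890\t.\tC\tT\t.\tgermline_risk\t.",
    "chr3\t13579\t.\tG\tA\t.\tgermline_risk;panel_of_normals\t.",
]

flagged = [line.split("\t")[0] for line in example_lines if has_germline_risk(line)]
print(flagged)
```

Flagging these calls separately keeps them distinguishable from confident somatic calls without treating them as germline variants.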
For the reference genome, the GDC has been using an augmented version of GRCh38.p2 (with additional decoy sequences and virus sequences) since inception. The GDC does not use alternative contigs, and only derives high-level data from the major chromosomes, so the same reference genome is used for both gene model GENCODE v22 (from Data Release 1 to 31) and GENCODE v36 (from Data Release 32). As future versions of the reference genome are released, e.g., GRCh39, the GDC will evaluate the benefits of updating data to utilize the new version.
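Restricting analysis to the major chromosomes can be sketched as a simple contig-name filter. The snippet assumes "major chromosomes" means chr1 through chr22, chrX, chrY, and chrM with UCSC-style "chr" prefixes; that membership and the example contig list are assumptions for illustration, not a statement of the GDC's exact definition.

```python
# Assumed set of major chromosomes (autosomes, sex chromosomes, mitochondrial).
MAJOR_CHROMOSOMES = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}


def is_major_chromosome(contig):
    """True when the contig name is in the assumed major-chromosome set."""
    return contig in MAJOR_CHROMOSOMES


# Illustrative contig names: primary chromosomes plus unplaced,
# unlocalized, and viral/decoy-style sequences.
contigs = [
    "chr1",
    "chr22",
    "chrX",
    "chr1_KI270706v1_random",
    "chrUn_KI270302v1",
    "chrEBV",
]

kept = [c for c in contigs if is_major_chromosome(c)]
print(kept)
```

Because the filter depends only on contig names, it is unaffected by which gene model version is used for annotation.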
In GDC Data Dictionary Release 2.5.0, the GDC added support for:
In Data Release 33, the GDC released data from two new projects at the GDC: