Exceptional Responders and MP2PRT: New NCI Projects at the GDC!
In Data Release 33, the GDC released data from two new projects:
Although GENCODE v22 data cannot be browsed in the GDC Data Portal, it can still be downloaded using the GDC Data Transfer Tool or API. You will need to either have a previous manifest or use known UUIDs to download v22 files.
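As a rough illustration of the manifest and UUID route described above, the following Python sketch reads file UUIDs from a manifest and builds the matching download requests. It assumes the manifest is a tab-delimited file with an `id` column, that files are served from the GDC API `data` endpoint at `https://api.gdc.cancer.gov/data/<UUID>`, and that the Data Transfer Tool is installed as `gdc-client`. The helper names and the UUIDs in the usage note are hypothetical placeholders, not real v22 files.

```python
import csv
import io
import shutil
import urllib.request

# Assumed base URL of the GDC API data endpoint.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def read_manifest_uuids(manifest_text):
    """Return the file UUIDs found in the 'id' column of a manifest."""
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    return [row["id"] for row in reader if row.get("id")]


def data_url(uuid):
    """Build the API download URL for a single file UUID."""
    return f"{GDC_DATA_ENDPOINT}/{uuid}"


def transfer_tool_command(manifest_path):
    """Build the Data Transfer Tool command line for a saved manifest."""
    return ["gdc-client", "download", "-m", manifest_path]


def download_open_access(uuid, dest_path):
    """Fetch one open-access file by UUID (controlled-access files need a token)."""
    with urllib.request.urlopen(data_url(uuid)) as response:
        with open(dest_path, "wb") as out:
            shutil.copyfileobj(response, out)
```

For example, with a two-line manifest whose `id` values are the placeholders `00000000-0000-0000-0000-000000000001` and `00000000-0000-0000-0000-000000000002`, `read_manifest_uuids` returns both UUIDs in order, and `data_url` turns each into a request against the `data` endpoint. The list from `transfer_tool_command("manifest.txt")` can be passed to `subprocess.run` to download everything in the manifest at once.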
The lower number of open-access mutations is primarily due to two strategies that improve quality: 1) TCGA now uses a 2-caller ensemble instead of a single caller; 2) variants outside the target capture region are removed, where previously the filter used a combined “target capture + GAF exonic region”. Additionally, TCGA was the original project for which GDC open-access variants were produced, and it used variant rescue steps that applied only to TCGA. To keep the TCGA variant-calling pipeline consistent across projects, the GDC no longer rescues MC3 and TCGA validation variants.
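The two filtering strategies above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GDC pipeline: it assumes "2-caller ensemble" means a variant is kept when at least two callers report it, it treats target capture regions as 1-based inclusive intervals, and the `Variant` type, function names, and caller labels are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    callers: frozenset  # names of the callers that reported this variant


def in_target(variant, target_regions):
    """True if the variant lies inside any (chrom, start, end) capture region."""
    return any(
        chrom == variant.chrom and start <= variant.pos <= end
        for chrom, start, end in target_regions
    )


def keep_open_access(variants, target_regions, min_callers=2):
    """Keep variants supported by >= min_callers callers and inside the target regions."""
    return [
        v
        for v in variants
        if len(v.callers) >= min_callers and in_target(v, target_regions)
    ]


# Hypothetical capture region and variants, for illustration only.
regions = [("chr1", 100, 200)]
variants = [
    Variant("chr1", 150, frozenset({"callerA", "callerB"})),  # kept
    Variant("chr1", 160, frozenset({"callerA"})),             # single caller: dropped
    Variant("chr1", 500, frozenset({"callerA", "callerB"})),  # off target: dropped
]
kept = keep_open_access(variants, regions)
```

Both conditions only ever remove variants, which is consistent with a stricter release containing fewer open-access mutations than one built from a single caller and a wider region.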
The gene model used as a reference across the GDC has been updated from GENCODE 22 to GENCODE 36. GENCODE gene sets are continuously updated to improve coverage and accuracy.
Highlights from the GDC's recent Data Release 30 include data from major studies exploring the effects of Chernobyl radiation, as well as methylation array data processed via a new GDC pipeline.
A recent data release offers new ways to access data from the Count Me In project.
The GDC supports the submission of clinical and biospecimen supplements. Supplemental files can be downloaded from the GDC by searching for the Data Type "Clinical Supplement" or "Biospecimen Supplement" in the facet search of the GDC Data Portal Repository. For TCGA data, the supplement data are provided as XML documents and tab-delimited files (biotabs). These files, in varying degrees, provide information on marker status (e.g.
In TCGA, “collection” refers to the collection of the sample for TCGA, whereas “procurement” refers to the removal of tissue from the patient.
The GDC harmonizes data across projects. This includes aligning the genomic data to a common reference genome (HG38) and generating higher-level data using GDC bioinformatics pipelines. Other repositories may process the data differently.