Data for New Projects Now Available in GDC Data Release 44
The GDC released Data Release 44 with data from new projects, new cases from existing projects, and more. Key highlights from this release include:
Why does TCGAbiolinks no longer work when retrieving diagnosis?
TCGA clinical data was expanded in GDC Data Releases 42 and 43. Previously, each TCGA case had exactly one diagnosis. With the expansion, a TCGA case can have multiple diagnoses, for example because of pre-enrollment diagnoses. To retrieve the diagnosis associated with the molecular data, restrict the query to records whose primary disease flag is set to true (i.e., diagnosis_is_primary_disease = true).
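The primary-disease filter described above can be sketched in a few lines of Python. This is a minimal illustration, not GDC or TCGAbiolinks code: the record layout (a case dictionary with a "diagnoses" list), the function name primary_diagnoses, and the example values are assumptions made for the example. Only the field name diagnosis_is_primary_disease comes from the text.

```python
from typing import Any


def primary_diagnoses(case: dict[str, Any]) -> list[dict[str, Any]]:
    """Return only the diagnoses flagged as the primary disease.

    Assumes (hypothetically) that a case is a dict with a "diagnoses" list,
    and that the flag may arrive as a boolean or as the string "true".
    """
    return [
        diagnosis
        for diagnosis in case.get("diagnoses", [])
        if str(diagnosis.get("diagnosis_is_primary_disease", "")).lower() == "true"
    ]


# Hypothetical case with one pre-enrollment diagnosis and one primary diagnosis.
example_case = {
    "submitter_id": "CASE-0001",  # made-up identifier
    "diagnoses": [
        {"primary_diagnosis": "prior diagnosis", "diagnosis_is_primary_disease": False},
        {"primary_diagnosis": "enrolled diagnosis", "diagnosis_is_primary_disease": True},
    ],
}

selected = primary_diagnoses(example_case)
print([d["primary_diagnosis"] for d in selected])
```

Code that assumes one diagnosis per case (for example, by taking the first element of the list) may pick up a pre-enrollment diagnosis after the expansion, so filtering on the flag first is the safer pattern.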
Why do CNVs of different genes in the GDC Data Portal differ from other genomic portals?
This discrepancy is due to differences in the data processing pipelines used by the GDC and other genomic portals. At the GDC, gene-level CNVs are derived from a mix of standardized pipelines. For TCGA projects, the CNV values are prioritized in the following order: SNP6 ABSOLUTE (LiftOver) > SNP6 ASCAT3 > WGS AscatNGS > SNP6 ASCAT2. All of these workflows produce absolute integer copy number values.
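The priority order above amounts to choosing the first workflow in a ranked list that has data for a given sample. The sketch below illustrates that selection rule only. The function name pick_cnv_workflow and the idea of passing in a set of available workflow labels are assumptions for the example, not a description of how the GDC implements it.

```python
from typing import Iterable, Optional

# Priority order for TCGA projects as stated in the text, highest first.
WORKFLOW_PRIORITY = [
    "SNP6 ABSOLUTE (LiftOver)",
    "SNP6 ASCAT3",
    "WGS AscatNGS",
    "SNP6 ASCAT2",
]


def pick_cnv_workflow(available: Iterable[str]) -> Optional[str]:
    """Return the highest-priority workflow present in `available`, or None."""
    available_set = set(available)
    for workflow in WORKFLOW_PRIORITY:
        if workflow in available_set:
            return workflow
    return None


# Hypothetical sample that has results from two of the four workflows.
chosen = pick_cnv_workflow(["SNP6 ASCAT2", "WGS AscatNGS"])
print(chosen)
```

Because different samples can fall back to different workflows, and other portals may use other pipelines entirely, gene-level values for the same gene need not match across sources.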
GDC Dictionary Updates in the Turing Release
The GDC Turing Data Dictionary Release 3.4 includes the following features: