
Can GDC data be used in commercial and academic tools?

Submitted by gaheens on

GDC open-access data can be used with proper attribution. Controlled-access data requires dbGaP authorization, and users must abide by the data use agreement (DUA) associated with the study. See GDC Data Access and Sharing Policies or contact the NCI Office of Data Sharing directly for further clarification: NCIOfficeofDataSharing@mail.nih.gov.

Why does TCGAbiolinks no longer work when retrieving diagnosis?

Submitted by gaheens on

TCGA clinical data was expanded in GDC Data Releases 42 and 43. Before the expansion, each TCGA case had exactly one diagnosis. After it, a TCGA case can have multiple diagnoses, for example because of pre-enrollment diagnoses. Tools that assume one diagnosis per case may therefore fail or return the wrong record. To retrieve the diagnosis associated with the molecular data, select the diagnosis whose primary disease flag is set to true (i.e., diagnosis_is_primary_disease = true).
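As a rough illustration, the selection can be done on a case record after it has been retrieved. This is a minimal sketch, not the TCGAbiolinks or GDC implementation: the record layout (a case dictionary holding a "diagnoses" list) and the helper name primary_diagnoses are assumptions, and the sample values are made up. Only the field name diagnosis_is_primary_disease comes from the answer above.

```python
def primary_diagnoses(case):
    """Return the diagnoses of one case that are flagged as the primary disease.

    `case` is assumed to be a dict with a "diagnoses" list, each entry carrying
    a diagnosis_is_primary_disease flag (hypothetical layout for illustration).
    """
    primary = []
    for dx in case.get("diagnoses", []):
        flag = dx.get("diagnosis_is_primary_disease")
        # Accept either a boolean True or a string such as "true" / "True".
        if str(flag).strip().lower() == "true":
            primary.append(dx)
    return primary


# Made-up case record with one pre-enrollment and one primary diagnosis.
example_case = {
    "submitter_id": "CASE-0001",
    "diagnoses": [
        {"submitter_id": "DX-A", "diagnosis_is_primary_disease": False},
        {"submitter_id": "DX-B", "diagnosis_is_primary_disease": "true"},
    ],
}

selected = primary_diagnoses(example_case)
print([dx["submitter_id"] for dx in selected])
```

Filtering on the flag rather than taking the first diagnosis in the list avoids depending on the order in which diagnoses are returned.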

Why do gene-level CNVs in the GDC Data Portal differ from those in other genomic portals?

Submitted by gaheens on

This discrepancy is due to differences in the data processing pipelines used by the GDC and other genomic portals. At the GDC, gene-level CNVs are derived from several standardized pipelines. For TCGA projects, the CNV values are prioritized in the following order: SNP6 ABSOLUTE (LiftOver) > SNP6 ASCAT3 > WGS AscatNGS > SNP6 ASCAT2. All of these workflows produce absolute integer copy number values.
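The priority order can be read as "use the first workflow in the list that has results". The sketch below shows that reading only; it is not the GDC's own code. The function name pick_cnv_workflow is hypothetical, and the labels are copied from the text above, so they may not match the exact workflow identifiers used in GDC metadata.

```python
# Priority order for TCGA gene-level CNVs as stated in the answer above,
# highest priority first. Labels are taken from the text, not from GDC metadata.
TCGA_CNV_PRIORITY = [
    "SNP6 ABSOLUTE (LiftOver)",
    "SNP6 ASCAT3",
    "WGS AscatNGS",
    "SNP6 ASCAT2",
]


def pick_cnv_workflow(available):
    """Return the highest-priority workflow present in `available`, or None."""
    for name in TCGA_CNV_PRIORITY:
        if name in available:
            return name
    return None


# A case with results from two workflows: the higher-priority one is chosen.
print(pick_cnv_workflow({"SNP6 ASCAT2", "WGS AscatNGS"}))
```

Under this reading, two cases in the same project can draw their CNV values from different workflows, depending on which results exist for each case.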
