About the Data

Cancer is fundamentally a disease of the genome, caused by changes in the DNA, RNA, and proteins of a cell that push cell growth into overdrive. Identifying the genomic alterations that arise in cancer can help researchers decode how cancer develops and improve upon the diagnosis and treatment of cancers based on their distinct molecular abnormalities.

Data made available through the GDC are for research purposes only. The GDC provides researchers with access to standardized clinical, proteomic, epigenomic, and genomic data from cancer studies. These data support exploratory analysis, and results of such analysis cannot be considered definitive for outcomes.

The GDC assists researchers in exploratory analysis by identifying changes in cancer cells that may play an important role in cancer development. Through the GDC knowledge base, researchers can use data maintained in the GDC to help identify both high- and low-frequency cancer drivers, drawing on data types such as:

Mutations - The GDC provides access to DNA sequence data and generates associated Variant Call Format (VCF) and Mutation Annotation Format (MAF) files that identify somatic mutations such as point mutations, missense mutations, nonsense mutations, and insertions and deletions (indels) of nucleotides in the DNA.
Copy Number Variants - The GDC provides access to Copy Number Variation (CNV) data to identify amplified and attenuated gene expression due to chromosomal duplications, losses, insertions, and deletions.
Expression Quantification - The GDC provides access to mRNA and miRNA sequence data and quantifies gene and miRNA expression using standardized software pipelines; expression values are provided in simple tab-separated value format.
Post-transcriptional Modifications - The GDC provides access to mRNA sequence data to assist in identifying post-transcriptional splice modifications that are manifested as splice junction and isoform variants.
Structural Variants - The GDC provides access to Structural Variant data to identify genomic rearrangement events, such as fusions, duplications, truncations, large deletions, and others.
DNA Methylation - The GDC provides access to DNA CpG Methylation data to identify epigenomic modifications on the DNA.
Protein Expression - The GDC provides access to Protein Expression data to identify changes in protein expression and/or post-translational modifications.
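Because expression values are described above as plain tab-separated text, a few lines of Python are enough to load them for exploratory work. The sketch below is illustrative only: the column names gene_id and expression, the '#' comment line, and the sample rows are assumptions made for this example, not the layout of any specific GDC file, so check the header of the file you actually download and adjust the column arguments.

```python
import csv
import io


def read_expression_table(handle, id_column="gene_id", value_column="expression"):
    """Return {identifier: float value} from a tab-separated table with a header row.

    The default column names are hypothetical; pass the names used in your file.
    """
    # Drop blank lines and '#'-prefixed comment lines before parsing.
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    values = {}
    for row in reader:
        values[row[id_column]] = float(row[value_column])
    return values


# Made-up sample content for illustration; these are not real GDC values.
SAMPLE = (
    "# illustrative file\n"
    "gene_id\texpression\n"
    "GENE_A\t12.5\n"
    "GENE_B\t0.0\n"
    "GENE_C\t3.25\n"
)

expression = read_expression_table(io.StringIO(SAMPLE))
top_gene = max(expression, key=expression.get)
print(top_gene, expression[top_gene])
```

To read a downloaded file instead of the in-memory sample, open it in text mode and pass the file object as handle. A plain dictionary keeps the sketch dependency-free; a data-frame library may be more convenient for larger analyses.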

Data and metadata are submitted to the GDC in standard data types and file formats through the GDC Data Submission Pipeline. Molecular data stored in the GDC are harmonized against a common reference genome.

National Cancer Institute at the National Institutes of Health