GDC Overview

The mission of the GDC is to provide the cancer research community with a repository and computational platform for understanding cancer, its clinical progression, and response to therapy.

The NCI Center for Cancer Genomics (CCG) was established to spearhead the NCI's efforts in generating crucial datasets for cataloging alterations seen in human tumors, coordinating data unification and sharing, and supporting the development of analytical tools and computational approaches to enhance the understanding of large-scale, multidimensional data. The CCG backs several major cancer genome research programs including The Cancer Genome Atlas (TCGA), the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) program, the Cancer Genome Characterization Initiative (CGCI), and the Human Cancer Models Initiative (HCMI).

Although CCG programs extensively characterized genomic changes in various human cancers, these characterizations were previously housed in separate repositories, in different formats, and under different data management systems. To unify them, the NCI launched the Genomic Data Commons (GDC) in June 2016, providing the cancer research community with a single data service for the receipt, quality control, integration, storage, and redistribution of standardized cancer genomic data sets from NCI programs. In June 2024, GDC 2.0 was introduced, featuring a “cohort-centric” approach that enables researchers to create custom sets of cases and conduct gene- and variant-level data analysis directly within the web-based GDC Data Portal.

Exploring NCI's GDC 2.0

NCI's GDC is an important resource for cancer research. It serves as a platform for sharing and analyzing genomic and clinical data. With its recent update to GDC 2.0, researchers can now conduct analyses more easily using their web browsers.

I worked on the first versions of the GDC portal, and I’m now looking at this completely redesigned one. It’s very nice.

Building Custom Cohorts

Creating custom groups, or cohorts, for analysis is simple with the GDC. Researchers can choose from various clinical, genomic, and other features to find the specific cases they want to study. Once a cohort is built, there are multiple options to save it throughout the data portal.
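
For readers who prefer to script this kind of selection, the sketch below shows one way a cohort definition could be expressed as a nested filter. It assumes the filter syntax of the public GDC API (an `op`/`content` structure sent to `https://api.gdc.cancer.gov/cases`); the endpoint, the field names, and the helper functions are illustrative assumptions and are not described on this page. The code only builds the query parameters and makes no network request.

```python
import json
from typing import Any


def in_filter(field: str, values: list[str]) -> dict[str, Any]:
    """Build one condition: `field` matches any of `values`."""
    return {"op": "in", "content": {"field": field, "value": values}}


def build_cohort_filter(criteria: dict[str, list[str]]) -> dict[str, Any]:
    """Combine one 'in' condition per field with a logical AND."""
    return {
        "op": "and",
        "content": [in_filter(field, values) for field, values in criteria.items()],
    }


def cases_query_params(criteria: dict[str, list[str]], size: int = 10) -> dict[str, str]:
    """Assemble query parameters for a cases search (assumed parameter names)."""
    return {
        "filters": json.dumps(build_cohort_filter(criteria)),
        "fields": "submitter_id,project.project_id",
        "format": "JSON",
        "size": str(size),
    }


# Hypothetical cohort: female cases from one project (field names are assumptions).
criteria = {
    "project.project_id": ["TCGA-BRCA"],
    "demographic.gender": ["female"],
}
params = cases_query_params(criteria)
print(params["filters"])
```

The resulting `params` dictionary could be passed to an HTTP client of your choice. In the Data Portal itself, the same selection is made interactively, with no code required.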

Interactive Analysis Tools

After forming a cohort, the GDC provides several interactive tools for deeper analysis. Some of these tools include:

  • Clinical Data Analysis: Researchers can create bar charts to explore clinical variables and compare survival rates based on these factors.
  • Gene Expression Clustering: This tool allows users to visualize gene expression patterns through heatmaps and cluster diagrams.
  • Mutation Frequency: Users can identify the most frequently mutated genes connected to specific somatic mutations.
  • OncoMatrix: Researchers can visualize combinations of mutations to discover common co-occurrences or mutual exclusivity. This feature includes stylish updates and many customization options.
  • ProteinPaint: This tool visually displays where mutations occur on proteins, how often they happen, and their potential effects.
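
To make the ideas behind Mutation Frequency and OncoMatrix concrete, here is a minimal sketch of the underlying arithmetic on a made-up cohort. The case IDs, gene assignments, and function names are hypothetical, and this is not the GDC's implementation: it simply counts the fraction of cases in which each gene is mutated and how often pairs of genes are mutated in the same case.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy cohort: case ID -> genes carrying a somatic mutation.
cohort = {
    "case-1": {"TP53", "KRAS"},
    "case-2": {"TP53"},
    "case-3": {"KRAS", "PIK3CA"},
    "case-4": {"TP53", "PIK3CA"},
}


def mutation_frequency(cohort: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of cases in which each gene is mutated."""
    counts = Counter(gene for genes in cohort.values() for gene in genes)
    n_cases = len(cohort)
    return {gene: count / n_cases for gene, count in counts.items()}


def co_occurrence(cohort: dict[str, set[str]]) -> Counter:
    """Number of cases in which each pair of genes is mutated together."""
    pairs: Counter = Counter()
    for genes in cohort.values():
        # Sort so each pair is always stored in the same (alphabetical) order.
        for gene_a, gene_b in combinations(sorted(genes), 2):
            pairs[(gene_a, gene_b)] += 1
    return pairs


freq = mutation_frequency(cohort)
pairs = co_occurrence(cohort)

# List genes from most to least frequently mutated.
for gene, fraction in sorted(freq.items(), key=lambda item: (-item[1], item[0])):
    print(f"{gene}: mutated in {fraction:.0%} of cases")
```

Pairs with a count near zero despite both genes being frequently mutated are candidates for mutual exclusivity. A real analysis would apply a statistical test rather than raw counts.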

Data Download and Tool Integration

Researchers can easily download the data they need, whether that is specific sequencing files or the mutations for a custom cohort. Thanks to an updated app-based framework, the GDC plans to integrate additional third-party genomics analysis tools in the future. Handling vast amounts of data (over eight petabytes) while keeping analysis techniques modern and protecting patient privacy is no small task. The GDC is committed to continuously improving data processing and delivery.

Growing Community Engagement

The GDC's user base has grown substantially: between January 2019 and March 2024, the number of unique visitors climbed from 51,000 to more than 90,000. This growth shows how vital the GDC has become to the cancer research community.

The NCI's Genomic Data Commons simplifies the path for researchers to access and analyze cancer genomics data. With intuitive tools and a growing amount of data, the GDC stands as a cornerstone resource for advancing cancer research. To learn more or to get started, visit portal.gdc.cancer.gov.

The GDC was developed through the collaboration of several organizations with valuable contributions from community bioinformatics leaders collectively known as the "GDC Team". For more information on the GDC and other CCG-supported programs, visit the CCG Programs Site.

A generative AI tool was used to create draft content for this page from a video. GDC staff reviewed and modified the content to ensure accuracy.