Access Data

Accessing Genomic Data

The GDC Data Portal is a groundbreaking tool that enables a better understanding of cancer biology by allowing researchers to:

  • Search and query genomic data
  • Download data directly from the web browser, or download large volumes of data using the high-performance GDC Data Transfer Tool
  • Analyze cancer data, including clinical information, genomic characterization data, and high-level sequence analysis of tumor genomes

Get Started by Exploring GDC Processes and Tools »

Data Access Tools

The GDC provides web-based tools and API endpoints for searching, viewing, and downloading data, as well as client tools for downloading large volumes of data.
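As a rough illustration of searching and downloading through API endpoints, the sketch below builds request URLs without sending anything over the network. The base URL, the `/files` and `/data` paths, the filter syntax, and the field names are assumptions based on the publicly documented GDC API and are not stated on this page; `build_files_query` and `build_download_url` are hypothetical helper names used only for this example.

```python
import json
from urllib.parse import urlencode

# Assumption: base URL of the public GDC API (not given on this page).
GDC_API_BASE = "https://api.gdc.cancer.gov"


def build_files_query(project_id, data_category, size=10):
    """Build a search URL for file records in one project and data category.

    The filter structure and field names are assumptions taken from the
    public GDC API documentation; verify them before relying on this.
    """
    filters = {
        "op": "and",
        "content": [
            {"op": "in",
             "content": {"field": "cases.project.project_id",
                         "value": [project_id]}},
            {"op": "in",
             "content": {"field": "data_category",
                         "value": [data_category]}},
        ],
    }
    params = {
        "filters": json.dumps(filters),   # filters travel as a JSON string
        "fields": "file_id,file_name,access",
        "format": "JSON",
        "size": str(size),                # maximum number of records returned
    }
    return f"{GDC_API_BASE}/files?{urlencode(params)}"


def build_download_url(file_id):
    """Build the assumed download URL for a single file UUID."""
    return f"{GDC_API_BASE}/data/{file_id}"


if __name__ == "__main__":
    # Print the URLs only; sending the requests is left to the reader.
    print(build_files_query("TCGA-BRCA", "Simple Nucleotide Variation", size=5))
    print(build_download_url("<file-uuid>"))
```

The URLs can be passed to any HTTP client. This sketch covers open-access requests only; controlled-access files would additionally require authorization, which is not shown here.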

Controlled Data Access Policy

Any user requesting access to GDC controlled data must apply for access through the database of Genotypes and Phenotypes (dbGaP).

High Quality Datasets

The GDC obtains datasets from NCI programs that maintain tissue collection strategies coupling quantity with quality. Data validation is performed on all data submitted to the GDC.

What’s New with the GDC and Cancer Research

Cancer Research Highlights and Publications:

From the GDC FAQ

Why is the data maintained in cBioPortal, Broad Firehose, or the Seven Bridges Cancer Genomics Cloud different from the GDC data?

The GDC harmonizes data across projects. This includes aligning the genomic data to a common reference genome (GRCh38, also known as hg38) and generating higher-level data using GDC bioinformatics pipelines. Other repositories may process the data differently.

For example, TCGA data in cBioPortal uses the original mutation data generated by the individual TCGA sequencing centers. The source of this data is the Broad Firehose (or the publication pages, for data that matches a specific manuscript). The mutation calls usually combine the output of two callers, typically a variant caller such as MuTect plus an indel caller, but the exact combination differs by center, and the sequencing centers have modified their mutation calling pipelines over time. TCGA data in the GDC, by contrast, is harmonized against the latest reference genome (GRCh38), and mutations are called using four variant callers: MuTect, VarScan2, MuSE, and Pindel.

Need Assistance?

Need help with data retrieval, download, or submission?

Visit the GDC Support Page