Access Data

Accessing Genomic Data

The GDC Data Portal is a groundbreaking tool that enables a better understanding of cancer biology by allowing researchers to:

  • Search and query genomic data
  • Download data directly from the web browser, or download large volumes of data using the high-performance GDC Data Transfer Tool
  • Analyze cancer data, including clinical information, genomic characterization data, and high-level sequence analysis of tumor genomes

Data Access Tools

The GDC provides web-based tools and API endpoints for searching, viewing, and downloading data, as well as client tools for downloading large volumes of data.
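As a rough illustration of searching through an API endpoint, the Python sketch below builds a file search request and posts it as JSON. The endpoint URL (`https://api.gdc.cancer.gov/files`), the filter field names, and the example project ID are assumptions that do not appear on this page, and the helper names `build_file_query` and `search_files` are hypothetical. Check them against the GDC API documentation before relying on them.

```python
import json
import urllib.request

# Assumed endpoint; not stated on this page.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_file_query(project_id, data_format="BAM", size=10):
    """Build a JSON payload that searches for files in one project.

    The field names used here are assumptions for illustration.
    """
    filters = {
        "op": "and",
        "content": [
            {"op": "in",
             "content": {"field": "cases.project.project_id",
                         "value": [project_id]}},
            {"op": "in",
             "content": {"field": "data_format",
                         "value": [data_format]}},
        ],
    }
    return {
        "filters": filters,
        "fields": "file_id,file_name,access",  # comma-separated field list
        "format": "JSON",
        "size": str(size),  # maximum number of hits to return
    }


def search_files(payload, endpoint=GDC_FILES_ENDPOINT):
    """POST the payload and return the list of matching file records."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]["hits"]
```

A call such as `search_files(build_file_query("TCGA-BRCA"))` would need network access. Each returned record carries an `access` value, which may help distinguish open files from controlled ones.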

Controlled Data Access Policy

Any user who needs access to GDC controlled data must apply for it through the database of Genotypes and Phenotypes (dbGaP).

High Quality Datasets

The GDC obtains datasets from NCI programs that maintain tissue collection strategies coupling quantity with quality. All data submitted to the GDC undergo validation.

What’s New with the GDC and Cancer Research

From the GDC FAQ

How can I access GDC sequencing data in FASTQ format?

Raw sequencing files submitted to the GDC are processed using GDC Genomic Data Alignment pipelines. The processed data are made available in the GDC Data Portal as BAM files containing aligned reads and unmapped reads (if available). No reads are hard-clipped, but reads that were flagged as "failed" during an Illumina sequencing run are discarded.

Third-party tools such as biobambam2 or samtools fastq can convert these BAM files to FASTQ. Note that DNA-Seq quality scores are modified during the score recalibration co-cleaning step, so the tool must be told to retrieve the original scores (biobambam2: tryoq=1; samtools fastq: -O). Because GDC harmonized BAM files may contain multiple read groups, the conversion should also retain read group IDs in the generated FASTQ files (biobambam2: outputperreadgroup=1; samtools: run samtools split first).
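The samtools route described above can be sketched in Python as two command builders: one that splits a BAM by read group, and one that converts a per-read-group BAM to FASTQ while restoring the original quality scores with -O. This is a minimal sketch, not a GDC-provided script. The function names, output naming scheme, and directory layout are hypothetical, and the commands are only constructed here, not executed.

```python
from pathlib import Path


def split_by_read_group(bam_path, out_dir):
    """Build a `samtools split` command that writes one BAM per read group.

    In the -f pattern, %* is the input basename, %# the read group index,
    and %. the extension for the output format.
    """
    pattern = f"{out_dir}/%*_%#.%."
    return ["samtools", "split", "-f", pattern, str(bam_path)]


def bam_to_fastq(rg_bam, out_dir):
    """Build a `samtools fastq` command for one per-read-group BAM.

    -O uses the original quality scores (OQ tag) when they are present.
    The R1/R2 file naming is an illustrative choice.
    """
    stem = Path(rg_bam).stem
    r1 = f"{out_dir}/{stem}_R1.fastq.gz"
    r2 = f"{out_dir}/{stem}_R2.fastq.gz"
    return [
        "samtools", "fastq",
        "-O",               # prefer original (pre-recalibration) qualities
        "-1", r1,           # first reads of each pair
        "-2", r2,           # second reads of each pair
        "-0", "/dev/null",  # discard reads flagged as both or neither
        "-s", "/dev/null",  # discard singletons
        "-n",               # do not append /1 and /2 to read names
        str(rg_bam),
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once samtools is installed. For paired-end data, the input to samtools fastq generally needs to be grouped by read name first (for example with samtools collate) so that mates land at matching positions in the two output files.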

Need Assistance?

Need help with data retrieval, download, or submission?

Visit the GDC Support Page