Access Data

Accessing Genomic Data

The GDC Data Portal is a groundbreaking tool that enables a better understanding of cancer biology by allowing researchers to:

  • Search and query genomic data
  • Download data directly from the web browser, or download large volumes of data using the high-performance GDC Data Transfer Tool
  • Analyze cancer data, including clinical information, genomic characterization data, and high-level sequence analysis of tumor genomes

Data Access Tools

The GDC provides web-based tools and API endpoints for searching, viewing, and downloading data, as well as client tools for downloading large volumes of data.

Controlled Data Access Policy

Any user requesting access to GDC controlled data must apply for access through the database of Genotypes and Phenotypes (dbGaP).

High Quality Datasets

The GDC obtains datasets from NCI programs that maintain tissue collection strategies coupling quantity with quality. Data validation is performed on all data submitted to the GDC.

From the GDC FAQ

Where can I find the target and bait/probe files (BED files) that describe the capture kit used in an exome sequencing experiment?

Capture kit information is provided by the GDC API at the read group level, where available. In some cases, additional information may be available in SRA XML files.

The relevant read_group properties returned by the GDC API are:

  1. target_capture_kit_name
  2. target_capture_kit_catalog_number
  3. target_capture_kit_vendor
  4. target_capture_kit_target_region

The target_capture_kit_target_region field provides a URL for the capture kit target file, distributed by the kit manufacturer or by the research program. Bait/probe files can sometimes be found at the same URL; otherwise, a URL to the bait/probe file may be available in the SRA XML file.
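As an illustration of how these properties might be requested and read, here is a minimal Python sketch. It rests on assumptions that are not stated in the text above: that the files endpoint lives at https://api.gdc.cancer.gov/files, that it accepts a comma-separated fields parameter, and that read groups are nested under analysis.metadata.read_groups in a file record. The helper names build_query_url and extract_capture_kits are hypothetical, and the sample response is hand-written, not real GDC output.

```python
from urllib.parse import urlencode

# Assumption: public GDC files endpoint; verify against the GDC API documentation.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"

# The four read_group properties listed in the FAQ answer.
KIT_PROPERTIES = [
    "target_capture_kit_name",
    "target_capture_kit_catalog_number",
    "target_capture_kit_vendor",
    "target_capture_kit_target_region",
]

# Assumption: read groups are nested under this path in a file record.
READ_GROUP_PATH = "analysis.metadata.read_groups"


def build_query_url(file_id):
    """Build a URL asking for the capture kit fields of one file (hypothetical helper)."""
    fields = ",".join(f"{READ_GROUP_PATH}.{name}" for name in KIT_PROPERTIES)
    return f"{GDC_FILES_ENDPOINT}/{file_id}?{urlencode({'fields': fields})}"


def extract_capture_kits(response):
    """Return one dict per read group; properties missing from the response become None."""
    read_groups = (
        response.get("data", {})
        .get("analysis", {})
        .get("metadata", {})
        .get("read_groups", [])
    )
    return [{name: rg.get(name) for name in KIT_PROPERTIES} for rg in read_groups]


if __name__ == "__main__":
    # Hand-written sample shaped like the assumed response; not real GDC data.
    sample = {
        "data": {
            "analysis": {
                "metadata": {
                    "read_groups": [
                        {
                            "target_capture_kit_name": "Example Kit",
                            "target_capture_kit_vendor": "Example Vendor",
                        }
                    ]
                }
            }
        }
    }
    print(build_query_url("example-file-uuid"))
    for kit in extract_capture_kits(sample):
        print(kit)
```

The parsing step is kept separate from any network call so that it can be checked against a saved response; fetching the URL (for example with urllib.request) is left to the reader.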

Note: Some BAM files contain multiple read groups, and these read groups are sometimes produced with different capture kits. Tools such as bamUtil can split a BAM file by read group.
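To check whether a BAM file holds more than one read group before splitting it, the @RG lines of its header can be inspected. The sketch below assumes the header has already been exported as plain text (for instance with samtools view -H) and that each @RG line follows the SAM convention of tab-separated TAG:VALUE fields. The function name parse_read_groups and the sample header are illustrative only.

```python
def parse_read_groups(header_text):
    """Return one dict of TAG -> VALUE for each @RG line in a SAM header."""
    read_groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue  # skip @HD, @SQ, @PG, and comment lines
        fields = {}
        for item in line.split("\t")[1:]:
            # Split on the first colon only, since values may contain colons.
            tag, _, value = item.partition(":")
            fields[tag] = value
        read_groups.append(fields)
    return read_groups


if __name__ == "__main__":
    # Made-up header text for illustration; not taken from a GDC file.
    header = (
        "@HD\tVN:1.6\tSO:coordinate\n"
        "@RG\tID:rg1\tSM:sample1\tPL:ILLUMINA\n"
        "@RG\tID:rg2\tSM:sample1\tPL:ILLUMINA\n"
    )
    groups = parse_read_groups(header)
    print(len(groups), [rg["ID"] for rg in groups])
```

If more than one read group is reported, the capture kit properties of each group can then be looked up separately.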

Note: Target and bait/probe files may use an older reference genome, so liftover may be required for certain applications.

Need Assistance?

Need help with data retrieval, download, or submission?

Visit the GDC Support Page