Data Access Processes and Tools

Data in the GDC can be accessed through the user-friendly, web-based GDC Data Portal, which enables browsing, querying, and downloading of data and metadata. In addition, the GDC provides a command-line tool, the GDC Data Transfer Tool, for downloading large volumes of data, and an application programming interface (API) for programmatic access to GDC functionality.
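As a rough illustration of programmatic access, the sketch below only builds a search URL for the GDC API's files endpoint. It does not send anything. The base URL https://api.gdc.cancer.gov, the /files endpoint, and the filters, fields, format, and size query parameters follow the public GDC API documentation as we understand it, so treat them as assumptions to verify. The helper name files_query_url and the project ID in the usage lines are illustrative only.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the public GDC API; verify against the GDC API documentation.
GDC_API = "https://api.gdc.cancer.gov"


def files_query_url(filters: dict, fields: list, size: int = 10) -> str:
    """Build a GET URL that searches the GDC /files endpoint (nothing is sent)."""
    params = {
        "filters": json.dumps(filters),  # filter tree serialized as JSON text
        "fields": ",".join(fields),      # comma-separated fields to return per hit
        "format": "JSON",
        "size": str(size),               # maximum number of hits in one page
    }
    return f"{GDC_API}/files?{urlencode(params)}"


if __name__ == "__main__":
    # Illustrative filter: files belonging to one project.
    by_project = {
        "op": "in",
        "content": {"field": "cases.project.project_id", "value": ["TCGA-BRCA"]},
    }
    print(files_query_url(by_project, ["file_id", "file_name", "access"], size=5))
```

If you pass the printed URL to urllib.request.urlopen or curl, the response should be a JSON page of matching file records.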

Open and Controlled Access Data

The NIH promotes broad and responsible sharing of genomic research data and respects the privacy and intentions of research participants.

Some data in the GDC is open access, which means that no authentication or authorization is needed to access it. Other data is controlled access, which means that eRA Commons authentication and dbGaP authorization are required. Whether a dataset is open or controlled is determined under the applicable Data Access Policies, through a process driven by the informed consent of research participants.
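The open/controlled distinction matters whenever a file is requested. The minimal sketch below assumes two things. First, each file's metadata carries an access value of "open" or "controlled". Second, the GDC API expects the user's token in an X-Auth-Token header. Both points come from the GDC API documentation as we recall it and are worth confirming. Open files get no credentials, and controlled files require a token. The function name request_headers is hypothetical.

```python
from typing import Optional


def request_headers(file_access: str, token: Optional[str] = None) -> dict:
    """Return the HTTP headers needed to request one GDC file.

    file_access is assumed to be the value of the file's "access" metadata
    field: "open" or "controlled".
    """
    if file_access == "open":
        return {}  # open access: no authentication or authorization needed
    if file_access == "controlled":
        if not token:
            # Controlled access needs eRA Commons login plus dbGaP authorization,
            # which the user carries as a token downloaded from the portal.
            raise PermissionError("controlled access data requires a GDC authentication token")
        # Token files often end with a newline, so strip surrounding whitespace.
        return {"X-Auth-Token": token.strip()}
    raise ValueError(f"unknown access level: {file_access!r}")
```

Raising an error for a missing token makes a request fail early with a clear message, instead of failing later with an authorization error from the server.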

Open access data generally includes high-level genomic data that is not individually identifiable, as well as most clinical and all biospecimen data elements.

Controlled access data generally includes individually identifiable data such as low-level genomic sequencing data, germline variants, SNP6 genotype data, and certain clinical data elements. Access to controlled data is granted by program-specific Data Access Committees. See Obtaining Access to Controlled Data for details.
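Users without dbGaP authorization may want queries that return only open access files. The sketch below builds a GDC-style filter tree that does this. It assumes the nested op/content filter syntax of the GDC API and the field names access and cases.project.project_id, all of which should be checked against the API documentation. The helper open_access_filter is hypothetical.

```python
import json


def open_access_filter(project_id: str) -> dict:
    """Filter matching files from one project whose access level is "open"."""
    return {
        "op": "and",  # both conditions must hold
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "=", "content": {"field": "access", "value": "open"}},
        ],
    }


if __name__ == "__main__":
    # The JSON text is what would be sent as the "filters" query parameter.
    print(json.dumps(open_access_filter("TCGA-BRCA"), indent=2))
```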

Open Access Data: No login required for access.
Controlled Access Data: Authorization required for access.

Data Access Process

The GDC Data Portal provides a web-based facility for users to browse, query, and download data. No login is required to access open access data; to download controlled access data, users must log in with eRA Commons and be authorized for the data through dbGaP. From the GDC Data Portal, users can query for data and add files to a cart for download. Low volumes of metadata and data can be downloaded directly from the GDC Data Portal.
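A cart download can be approximated programmatically, and the sketch below shows one way. It builds, but does not send, a single POST request asking for several files at once. The data endpoint https://api.gdc.cancer.gov/data, the JSON body of the form {"ids": [...]}, and the X-Auth-Token header are taken from our reading of the GDC API documentation and should be treated as assumptions. The function name cart_download_request is hypothetical.

```python
import json
from typing import Iterable, Optional
from urllib.request import Request

# Assumed GDC API data endpoint; verify against the GDC API documentation.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def cart_download_request(file_ids: Iterable[str], token: Optional[str] = None) -> Request:
    """Build (but do not send) one POST request that downloads several files."""
    ids = list(file_ids)
    if not ids:
        raise ValueError("the cart is empty")
    headers = {"Content-Type": "application/json"}
    if token:
        headers["X-Auth-Token"] = token  # only needed when the cart holds controlled files
    body = json.dumps({"ids": ids}).encode("utf-8")
    return Request(GDC_DATA_ENDPOINT, data=body, headers=headers, method="POST")
```

According to the GDC documentation, sending the request with urllib.request.urlopen should return one compressed archive, which can be streamed to disk with shutil.copyfileobj. As with the portal cart, this route suits modest volumes.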

For large volumes of data, users can download files with the GDC Data Transfer Tool, a client-based tool designed for efficient data transfer. To download multiple files at once with the Data Transfer Tool, users can create and download a manifest in the GDC Data Portal. To download controlled access data, users must also download an authentication token from the GDC Data Portal. The GDC API is also available for downloading data programmatically.
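The manifest-plus-token workflow can be sketched as follows. The code reads a manifest, totals its size, and assembles the Data Transfer Tool command. It assumes the manifest is tab-separated with a header row containing id, filename, md5, size, and state columns. It also assumes the tool is invoked as gdc-client download with -m for the manifest and -t for the token file, as in the Data Transfer Tool user guide. Verify both assumptions against your installed version. The helper names are hypothetical.

```python
import csv
import io
from typing import List, Optional


def parse_manifest(text: str) -> List[dict]:
    """Parse a GDC manifest (assumed tab-separated with a header row)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        row["size"] = int(row["size"])  # file size in bytes
        rows.append(row)
    return rows


def total_bytes(rows: List[dict]) -> int:
    """Total download volume, similar to the cart summary in the portal."""
    return sum(row["size"] for row in rows)


def gdc_client_command(manifest_path: str, token_path: Optional[str] = None) -> List[str]:
    """Argument list for the Data Transfer Tool (gdc-client) download command."""
    cmd = ["gdc-client", "download", "-m", manifest_path]
    if token_path:
        cmd += ["-t", token_path]  # token file, needed only for controlled access data
    return cmd
```

With gdc-client installed, you could start the transfer with subprocess.run(gdc_client_command("manifest.txt", "token.txt"), check=True). Both file names here are placeholders.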

Figure: Data Access Processes and Tools. Users start at the web-based GDC Data Portal; to access controlled data, they must log in with eRA Commons and obtain access to datasets through dbGaP. The steps are: (1) add files to the cart; (2) review the statistics and total download volume that the cart provides; (3) download low volumes of metadata and data directly; (4) generate a token and a manifest file to download controlled data through the GDC Data Transfer Tool and the GDC API.

Data Access Tool Comparison

Feature | GDC Data Portal (Web-Based) | GDC Data Transfer Tool (Client-Based) | GDC API (Programmatic)
------- | --------------------------- | ------------------------------------- | ----------------------
Search data using predetermined filters called "facets" | Yes | | Yes
Query data using the smart search advanced query language | Yes | |
Analyze advanced data visualizations | Yes | |
Requires dbGaP account to browse and download controlled access data | Yes | Yes | Yes
Download small volumes of data | Yes | Yes | Yes
Download large volumes of data | | Yes | Yes
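The comparison table can be read as a simple decision rule. The sketch below encodes it as a rough guide, not an official recommendation. The table gives no number separating "small" from "large" downloads, so the 5 GiB cut-off is purely an assumption to tune for your own network. The names LARGE_DOWNLOAD_BYTES and suggest_tools are hypothetical.

```python
from typing import Tuple

# Hypothetical cut-off between "small" and "large" downloads. The comparison
# table does not give a number, so this value is an assumption to tune.
LARGE_DOWNLOAD_BYTES = 5 * 1024**3  # 5 GiB


def suggest_tools(total_bytes: int, scripted: bool = False) -> Tuple[str, str]:
    """Return (tool for finding files, tool for downloading them)."""
    # Interactive users search in the portal (facets, smart search, visualizations);
    # scripted workflows query the API instead.
    find = "GDC API" if scripted else "GDC Data Portal"
    if scripted:
        download = "GDC API"  # the API handles both small and large volumes
    elif total_bytes >= LARGE_DOWNLOAD_BYTES:
        download = "GDC Data Transfer Tool"  # built for large, efficient transfers
    else:
        download = "GDC Data Portal"  # small volumes can come straight from the cart
    return find, download
```

Returning two tools reflects the workflow described earlier, where files are usually found in the portal and then fetched with the tool suited to their volume.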