Data Submission Processes and Tools

GDC data submission is supported by a web-based tool and a programmatic interface for submitting clinical and biospecimen data, as well as experiment metadata. Large, high-volume experiment files can be uploaded using a high-performance client-based tool.

Data Submission Processes

Organizations interested in submitting data to the GDC should first review the conditions for requesting data submission and then submit a request for GDC data submission via the GDC Data Submission Request Form. The GDC reviews all data submission requests and notifies the data submitter as to whether the study is approved for data submission into the GDC.

Once approved for data submission into the GDC, the data submitter works with the NCI Genomic Program Administrator (GPA) to register the study and subjects in dbGaP. Study registration includes working with the NCI GPA to have the GDC listed as a Trusted Partner, providing the institutional certification, and adding approved data submitters in the dbGaP study registration. Approved data submitters should be limited to users who upload data and metadata into the GDC. Upon completion of study registration, the data submitter submits the Subject IDs associated with the study in dbGaP.

Once the study has been registered in dbGaP, the GDC will set up the project in the GDC. Provided the Subject IDs are registered in dbGaP and data submitters have dbGaP submitter privileges, data can then be uploaded and validated within the GDC. After validation, data is submitted to the GDC for processing. Once the data is submitted, the GDC will process applicable data sets (see Data Harmonization for additional details). After data processing has been completed, the user can release their data to the GDC, which must occur six (6) months after GDC data processing, per GDC Data Submission Policies. Data is then made available through GDC Data Access Tools as open or controlled access per dbGaP authorization policies associated with the data set.
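The lifecycle described above can be sketched as a small state machine. This is only an illustrative model, not GDC software: the stage names, the Submission class, and its fields are hypothetical and simply mirror the wording of the paragraph (upload is gated on dbGaP subject registration and submitter privileges, and stages proceed in order up to release).

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """Hypothetical stage names taken from the paragraph above."""
    PROJECT_CREATED = 1
    UPLOADED = 2
    VALIDATED = 3
    SUBMITTED = 4
    PROCESSED = 5
    RELEASED = 6


@dataclass
class Submission:
    """Illustrative record of one study moving through the process."""
    subject_ids_in_dbgap: bool
    submitter_has_dbgap_privileges: bool
    stage: Stage = Stage.PROJECT_CREATED

    def may_upload(self) -> bool:
        # Both dbGaP preconditions must hold before any data is uploaded.
        return self.subject_ids_in_dbgap and self.submitter_has_dbgap_privileges

    def advance(self) -> Stage:
        """Move to the next stage, enforcing the upload preconditions."""
        if self.stage is Stage.RELEASED:
            raise ValueError("submission is already released")
        if self.stage is Stage.PROJECT_CREATED and not self.may_upload():
            raise PermissionError(
                "Subject IDs must be registered in dbGaP and the submitter "
                "must have dbGaP submitter privileges"
            )
        self.stage = Stage(self.stage.value + 1)
        return self.stage


if __name__ == "__main__":
    study = Submission(subject_ids_in_dbgap=True, submitter_has_dbgap_privileges=True)
    while study.stage is not Stage.RELEASED:
        print(study.advance().name)
```

The sketch does not model the release deadline or dbGaP access policies; it only shows the ordering of stages and the gate before upload.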

This process can be described in discrete steps as identified in the table below.

Step 1
  Data Submitter Step: Complete GDC Data Submission Request Form.
  Response: GDC pre-approves study according to GDC Guidelines and sends confirmation email.

Step 2
  Data Submitter Step: Create eRA Commons account for any data submitters that do not already have one.
  Response: eRA Commons accounts are created.

Step 3
  Data Submitter Step: Contact NCIOfficeofDataSharing@mail.nih.gov (NCI Office of Data Sharing) to identify appropriate GPA and register study. Provide GPA with:
    • GDC pre-approval confirmation email
    • List of data submitters, including their eRA logins
    • Institutional Certification(s)
    • Basic study information
      • Indicate GDC as data sharing platform
  Response: GPA inputs information into dbGaP system.

Step 4
  Data Submitter Step: Study PI and PI assistant/submitter receive email invitations to the dbGaP Submission Portal that must be accepted within 7 days. (Save email for future reference.) Submit to dbGaP Submission Portal to begin processing study:
    • Study Configuration File
    • Subject Sample Mapping File (includes subject IDs and consent)
    Note: Other files required in dbGaP submission package may be blank.
  Response: dbGaP processes information and produces PHS Accession Number. (This may take 4-6 weeks to process.)

Step 5
  Data Submitter Step: Contact support@nci-gdc.datacommons.io (GDC helpdesk) to create a project for submission.
  Response: GDC creates project within GDC submission tools.

Step 6
  Data Submitter Step: Upload, validate, and submit data to GDC for harmonization.
  Response: GDC harmonizes data.

Step 7
  Data Submitter Step: Review and release harmonized data.
  Response: GDC releases data.

Within dbGaP system (NLM/NCBI)

Within GDC (NCI)

This process can further be illustrated in the diagram below.

[Diagram: Data Submission Processes and Tools]

Data Submission Tools

The GDC provides web-based, client-based, and programmatic tools to guide users through the data submission process. Data submitters can use the web-based GDC Data Submission Portal to submit small volumes of data and metadata, and the client-based GDC Data Transfer Tool to submit large, high-volume experiment data. A GDC Application Programming Interface (API) is available to large organizations that want to submit data programmatically through GDC submission pipelines.

Data Submission Tools        | GDC Data Submission Portal (Web-Based) | GDC Data Transfer Tool (Client-Based) | GDC API (Programmatic)
Requires dbGaP Authorization | Yes                                    | Yes                                   | Yes
Submit Clinical Data         | Yes                                    | -                                     | Yes
Submit Biospecimen Data      | Yes                                    | -                                     | Yes
Submit Experiment Metadata   | Yes                                    | -                                     | Yes
Submit Experiment Files      | -                                      | Yes                                   | -
Upload Small Volumes of Data | Yes                                    | Yes                                   | Yes
Upload Large Volumes of Data | -                                      | Yes                                   | -
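
As a rough illustration of how the tool comparison above might be used to pick a tool, the following sketch encodes the table as a lookup. The task keys, the CAPABILITIES dictionary, and the tools_for function are hypothetical names introduced here, and the entries reflect one reading of the checkmarks in the table (in particular, that only the GDC Data Transfer Tool is marked for experiment files and large volumes). All three tools require dbGaP authorization, so that row is not modeled.

```python
PORTAL = "GDC Data Submission Portal"
TRANSFER_TOOL = "GDC Data Transfer Tool"
API = "GDC API"

# Hypothetical task keys mapped to the tools marked in the comparison table.
CAPABILITIES = {
    "submit_clinical_data": (PORTAL, API),
    "submit_biospecimen_data": (PORTAL, API),
    "submit_experiment_metadata": (PORTAL, API),
    "submit_experiment_files": (TRANSFER_TOOL,),
    "upload_small_volumes": (PORTAL, TRANSFER_TOOL, API),
    "upload_large_volumes": (TRANSFER_TOOL,),
}


def tools_for(*tasks):
    """Return the tools that support every requested task, in table order."""
    unknown = [task for task in tasks if task not in CAPABILITIES]
    if unknown:
        raise KeyError(f"unknown task(s): {unknown}")
    return [
        tool
        for tool in (PORTAL, TRANSFER_TOOL, API)
        if all(tool in CAPABILITIES[task] for task in tasks)
    ]


if __name__ == "__main__":
    print(tools_for("submit_clinical_data", "upload_small_volumes"))
    print(tools_for("submit_experiment_files", "upload_large_volumes"))
```

An empty result means no single tool covers the combination, which matches the document's description of using the portal or API for metadata and the Data Transfer Tool for large experiment files.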