GDC FAQs
-
How do I submit data into the GDC?
Information on the data submission processes and tools is available on the GDC Data Submission Processes and Tools page. Detailed instructions for submitting data into the GDC are provided in the GDC Data Submission Portal User's Guide. Per GDC Policy, organizations interested in submitting data into the GDC must first apply for data submitter access through the NIH database of Genotypes and Phenotypes (dbGaP).
-
What data types and file formats does the GDC support?
Please visit the GDC Data Types and File Formats page for a list of the standard data types supported by the GDC.
-
What reference genome is the GDC harmonized against?
The GDC is harmonized against GRCh38. Please see GDC Data Harmonization for additional information on the GDC pipelines for re-aligning genomic data.
-
How does the GDC generate high level data?
The GDC generates high level data for germline and somatic genotyping, RNA-Seq quantification and structural analysis, SNP array genotyping and CNV calls, and variant annotations. Please visit GDC Data Harmonization for additional information on the GDC high level data generation pipelines.
-
How do I obtain an account to log in to the GDC?
Generally, browsing indexed GDC metadata (such as information about the cases and files contained in the GDC Data Portal) does not require a login.
eRA Commons authentication and dbGaP authorization are required before accessing controlled data, which generally includes individually identifiable information such as low level genomic sequencing data and germline variants.
Controlled-access data users log in to the GDC using their eRA Commons accounts. The GDC then verifies that the user has authorization in dbGaP to access specific controlled datasets.
See Obtaining Access to GDC Data and Resources for more information on data download, and Obtaining Access to Submit Data for information on data submission.
-
Where do I go to report an issue or submit an inquiry about the GDC?
The GDC provides helpdesk support for data submission and other issues. For information on the GDC helpdesk, please visit GDC Support.
-
How do I register my project with the GDC?
Once the project has been registered through dbGaP, please contact the GDC Helpdesk for assistance with setting up a new project.
-
What is the recommended tool and protocol for transferring large volumes of data to or from the GDC?
The GDC Data Transfer Tool is recommended for transferring large datasets to or from the GDC. For additional details, please visit the GDC Data Transfer Tool User’s Guide.
-
When using the GDC Data Transfer Tool, is it possible to set a bandwidth limit?
The GDC Data Transfer Tool does not offer a setting to limit the bandwidth it uses.
-
Does the GDC Data Transfer Tool use random or sequential read/write? Does the choice of protocol make a difference?
The GDC Data Transfer Tool uses sequential read/write for each file segment that is being transferred. By default, the tool executes multipart transfers, which results in multiple parallel, sequential read or write operations. To turn off multipart transfers, users can set the number of processes to 1.
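To illustrate the idea behind multipart transfers, here is a minimal Python sketch. It is not the Data Transfer Tool's actual implementation: a file is split into contiguous byte ranges, each range is copied sequentially, and several ranges are handled in parallel. The function names `split_into_segments` and `copy_segments` are hypothetical, and threads stand in for the tool's processes. With one worker, the segments are handled one after another, which corresponds to setting the number of processes to 1.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(file_size: int, segment_size: int) -> list[tuple[int, int]]:
    """Return (start, end) byte ranges, end exclusive, that cover the whole file."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [
        (start, min(start + segment_size, file_size))
        for start in range(0, file_size, segment_size)
    ]


def copy_segments(src: bytes, segment_size: int, n_workers: int) -> bytes:
    """Copy src segment by segment.

    Each segment is read and written sequentially; up to n_workers
    segments are processed concurrently (hypothetical helper).
    """
    segments = split_into_segments(len(src), segment_size)
    dst = bytearray(len(src))

    def copy_one(segment: tuple[int, int]) -> None:
        start, end = segment
        dst[start:end] = src[start:end]  # sequential read/write within one segment

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(copy_one, segments))
    return bytes(dst)
```

For example, `split_into_segments(10, 4)` yields the ranges `(0, 4)`, `(4, 8)`, and `(8, 10)`. The result is the same whether `n_workers` is 1 or greater; only the degree of parallelism changes.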
-
How long do GDC authentication tokens remain valid?
GDC authentication tokens remain valid for 30 days.
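To avoid failed requests caused by an expired token, a script can track the token's age. The following is a minimal sketch that assumes you recorded the time you downloaded the token. The helper names are hypothetical. It sends the token in the `X-Auth-Token` request header, which is the header the GDC API reads for controlled-access requests.

```python
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Optional

# Validity period stated in this FAQ.
TOKEN_LIFETIME = timedelta(days=30)


def token_expires_at(issued_at: datetime) -> datetime:
    """Expiry time, assuming issued_at is when the token was downloaded."""
    return issued_at + TOKEN_LIFETIME


def token_is_valid(issued_at: datetime, now: Optional[datetime] = None) -> bool:
    """True while the token is younger than 30 days."""
    now = now or datetime.now(timezone.utc)
    return now < token_expires_at(issued_at)


def build_authenticated_request(url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that carries the token."""
    return urllib.request.Request(url, headers={"X-Auth-Token": token})
```

For example, a token downloaded on 1 January expires on 31 January. Pass the request to `urllib.request.urlopen` only after `token_is_valid` returns `True`; otherwise, download a new token first.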
-
What steps must be taken in dbGaP before data can be submitted to the GDC?
The study and subject IDs must be registered in dbGaP. For additional details, please visit: Obtaining Access to Submit Data.
-
How is validation performed on genomic data (BAM files) submitted to the GDC?
The GDC validates genomic data (BAM files) using FastQC and Picard. For additional details, please visit: GDC Data Harmonization.
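Submitters who want to check a BAM file locally before uploading can run the same two tools themselves. The sketch below only builds the command lines. It uses a generic FastQC invocation and Picard's `ValidateSamFile` in summary mode. These options are an assumption for illustration, not the settings the GDC pipelines use, and `picard.jar` is a placeholder path.

```python
def build_validation_commands(bam_path: str, picard_jar: str = "picard.jar") -> list[list[str]]:
    """Return (without running them) a FastQC command and a Picard
    ValidateSamFile command for one BAM file (hypothetical helper)."""
    # FastQC: force BAM input format instead of relying on auto-detection.
    fastqc = ["fastqc", "--format", "bam", bam_path]
    # Picard: report a summary of structural problems found in the file.
    picard = [
        "java", "-jar", picard_jar, "ValidateSamFile",
        f"I={bam_path}", "MODE=SUMMARY",
    ]
    return [fastqc, picard]
```

Each command can then be executed with `subprocess.run(cmd, check=True)` on a machine where FastQC, Java, and Picard are installed.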
-
What is the process for uploading, submitting, and releasing data in the GDC?
Uploaded and validated data is put in a workspace until the user formally submits the data to the GDC. This allows users to interact with the data before submitting. Once the data is submitted, the GDC will process applicable datasets (e.g. harmonize molecular data and generate high level data). After processing has been completed, the data is made publicly available according to GDC Data Sharing Policies. The data becomes accessible through GDC tools (GDC Data Portal, GDC APIs) on an open- or controlled-access basis according to the dbGaP authorization policies associated with the dataset. For additional information, please visit: GDC Data Submission Processes and Tools.
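The lifecycle described above can be modeled as a one-way sequence of stages. The following Python sketch uses illustrative stage names that are not necessarily the GDC's internal state names. It enforces that data moves forward one stage at a time and that only pre-submission stages belong to the workspace.

```python
from enum import Enum


class Stage(Enum):
    UPLOADED = "uploaded"
    VALIDATED = "validated"
    SUBMITTED = "submitted"
    PROCESSED = "processed"  # e.g. harmonized, high level data generated
    RELEASED = "released"


# Each stage may only advance to the next one (illustrative).
NEXT_STAGE = {
    Stage.UPLOADED: Stage.VALIDATED,
    Stage.VALIDATED: Stage.SUBMITTED,
    Stage.SUBMITTED: Stage.PROCESSED,
    Stage.PROCESSED: Stage.RELEASED,
}


def advance(stage: Stage, target: Stage) -> Stage:
    """Move to target only if it is the immediate next stage."""
    if NEXT_STAGE.get(stage) is not target:
        raise ValueError(f"cannot move from {stage.value} to {target.value}")
    return target


def in_workspace(stage: Stage) -> bool:
    """Data stays in the workspace until the user formally submits it."""
    return stage in (Stage.UPLOADED, Stage.VALIDATED)
```

Under this model, attempting to jump from `UPLOADED` directly to `RELEASED` raises a `ValueError`. This mirrors the rule that data must be validated, submitted, and processed before it is released.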
-
Why does the GDC have data releases and how often do they happen?
Regular data releases allow the GDC to version the data and allow users to reference the GDC version number in publications. The GDC currently generates releases as needed, with a goal of one release every 2-3 months.
-
Where can I find more information about the GDC data model?
The GDC employs a hierarchical data model which requires metadata and files to be attached only at particular nodes or points in the hierarchy. If you have questions, please review the GDC Data Model or contact GDC Support.
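To show what "attached only at particular nodes" means in practice, here is a toy Python model. It is heavily simplified and hypothetical: the real GDC Data Dictionary defines many more node types and links, including intermediate nodes omitted here. Each node type has a single parent, and a file type may be attached only to one specific node type.

```python
from typing import Optional

# Simplified, illustrative parent links (not the full GDC data model).
PARENT: dict[str, Optional[str]] = {
    "project": None,
    "case": "project",
    "sample": "case",
    "aliquot": "sample",
    "read_group": "aliquot",
}

# File node type -> the only node type it may be attached to (illustrative).
FILE_ATTACHMENT = {"submitted_unaligned_reads": "read_group"}


def path_to_root(node_type: str) -> list[str]:
    """Walk parent links from node_type up to the project."""
    path = []
    current: Optional[str] = node_type
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def can_attach(file_type: str, parent_type: str) -> bool:
    """True only if the file type is allowed at this point in the hierarchy."""
    return FILE_ATTACHMENT.get(file_type) == parent_type
```

In this toy model, `path_to_root("read_group")` returns `["read_group", "aliquot", "sample", "case", "project"]`. Attaching reads directly to a `case` is rejected.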
-
How many bytes are there in a megabyte or gigabyte?
There has been long-standing debate about prefixes for multiples of bytes. We have chosen to use the standard supported by the International System of Units (SI), where 1 gigabyte (GB) = 10⁹ bytes and 1 megabyte (MB) = 10⁶ bytes. This convention is also supported by the IEEE, EU, NIST, and the International System of Quantities. Where appropriate, we use the IEEE 1541 recommendations for binary representation, where 1024³ bytes = 1 gibibyte (GiB) and 1024² bytes = 1 mebibyte (MiB).
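Because the two conventions diverge as sizes grow, the same file can appear to have two different sizes. The short Python sketch below converts a byte count using each convention. The `convert` helper is hypothetical.

```python
# Bytes per unit: SI decimal prefixes and IEEE 1541 binary prefixes.
UNITS = {
    "MB": 10**6,     # SI megabyte
    "GB": 10**9,     # SI gigabyte
    "MiB": 1024**2,  # IEEE 1541 mebibyte
    "GiB": 1024**3,  # IEEE 1541 gibibyte
}


def convert(n_bytes: int, unit: str) -> float:
    """Express a byte count in the given unit."""
    return n_bytes / UNITS[unit]
```

For example, a 5,000,000,000-byte file is exactly 5 GB but only about 4.66 GiB. Tools that report sizes in binary units will therefore show a smaller number than the GDC's SI-based figures.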
-
What web browsers are supported by the GDC?
The following web browsers are supported for use with the GDC Data Portal, Submission Portal, Website, and Documentation site.
- Most recent stable version of Microsoft Edge
- Most recent stable version of Google Chrome
- Most recent stable version of Mozilla Firefox
-
Why do the metadata files I am trying to submit fail to validate?
The GDC Data Submission Portal checks XML, JSON, and TSV metadata files for validity at the time they are submitted. If your files fail to validate, please check the error report and review the GDC Data Dictionary for troubleshooting these errors. Additional information on supported files and formats can be found on the GDC Data Model and File Formats pages, and in the GDC Data Submission Portal User's Guide.
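Some common failures can be caught before uploading a TSV file. The Python sketch below checks that the required columns exist and are non-empty. The required column set (`type`, `submitter_id`) is an assumption for illustration. The actual required properties for each node type are defined in the GDC Data Dictionary, and this check does not replace the Portal's validation.

```python
import csv
import io

# Hypothetical required columns; consult the GDC Data Dictionary for the
# real requirements of each node type.
REQUIRED_COLUMNS = {"type", "submitter_id"}


def find_tsv_errors(tsv_text: str, required: set[str] = REQUIRED_COLUMNS) -> list[str]:
    """Return human-readable problems found in a tab-separated metadata file."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = set(reader.fieldnames or [])
    errors = [f"missing column: {c}" for c in sorted(required - header)]
    if errors:
        return errors  # row checks are meaningless without the columns
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column in sorted(required):
            # Short rows yield None for missing fields, so treat None as empty.
            if not (row.get(column) or "").strip():
                errors.append(f"row {row_number}: empty value for {column}")
    return errors
```

An empty list means the file passed these basic checks. Any reported row numbers count the header as row 1, which makes them easy to match against a spreadsheet view.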