Data Submission Policies
dbGaP Data Submission Policies
Organizations interested in submitting data to the GDC must first apply for data submission authorization through dbGaP. Data submission through dbGaP requires institutional certification under NIH’s Genomic Data Sharing Policy. For information on dbGaP policies associated with data submission, please refer to dbGaP Data Submission Procedures.
To maintain HIPAA compliance, the GDC will not accept any data for patients aged 90 or older, including any follow-up events that occur after a patient turns 90.
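Submitters may want to screen records for this age restriction before upload. The following is a minimal sketch only, not a GDC tool: the `CaseRecord` type, its field names, and the 365.25-days-per-year approximation are all assumptions made for illustration and do not reflect the GDC data model.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: ages are tracked in days, and a year is approximated as 365.25 days.
DAYS_PER_YEAR = 365.25
AGE_LIMIT_DAYS = 90 * DAYS_PER_YEAR


@dataclass
class CaseRecord:
    """Hypothetical pre-submission record; not the GDC data model."""
    case_id: str
    age_at_index_days: int
    # Days from the index date to each follow-up event.
    follow_up_offsets_days: List[int] = field(default_factory=list)


def age_policy_violations(record: CaseRecord) -> List[str]:
    """Return a description of each way the record conflicts with the age policy."""
    problems = []
    if record.age_at_index_days >= AGE_LIMIT_DAYS:
        problems.append("patient is 90 or older at the index date")
    for offset in record.follow_up_offsets_days:
        if record.age_at_index_days + offset >= AGE_LIMIT_DAYS:
            problems.append(
                f"follow-up event at day {offset} occurs at age 90 or older"
            )
    return problems
```

An empty list means the record passed this local check; it does not replace validation performed by the GDC during submission.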
GDC Data Sharing Policies
The GDC promotes data sharing in support of precision medicine in accordance with NIH and NCI policies. GDC policies towards data sharing are as follows:
- Data Sharing Requirement – By submitting data to the GDC, a submitter indicates understanding and agreement that data will be made available to the scientific community at large, according to the data submitter's Genomic Data Sharing Plan, as required. The GDC makes somatic mutations of the coding regions open access. Controlled access data will be made available to members of the community who have the appropriate dbGaP Data Use Certification. The GDC will also produce harmonized data (raw and derived) based on the originally submitted data, including open access somatic variants. NOTE: The GDC will not preserve an exact copy of the originally submitted data; however, the GDC will preserve the original reads and quality scores. After data is released, either by submitter request to the GDC or by approval of the Center for Cancer Genomics, harmonized raw data and GDC-generated derived data will be made available to the public via the GDC Data Portal and GDC data access tools. Whenever possible, GDC-generated open access variants are made available for download and analysis through GDC data analysis, visualization, and exploration tools, except where informed consent restrictions apply.
- Data Pre-Processing Period – For each project, the GDC will afford a pre-processing period of exclusive data access to submitters and their named collaborators. The pre-processing period allows submitters to perform data cleaning and quality control on initial data, and to make data revisions, prior to data submission and public release into the GDC. The pre-processing period may last up to six months from the date of data upload and is followed by data submission into the GDC.
- Data Submission Period and Release – Once submitted, data will be processed and validated by the GDC, including the generation of derived data for applicable data sets. Submitted data will be released no later than six months after GDC data processing has been completed. Submitted data will be made available for research in a manner consistent with the dataset’s “data use limitations”.
- Data Redaction – The GDC in general will not remove valid data from community access in response to submitter requests. The GDC will remove data access in the following events:
  - Data Management Incident – If any available GDC data is discovered to contain protected health information (PHI) or personally identifiable information (PII), the affected data will be made unavailable as quickly as possible after the GDC becomes aware of the issue, and reported according to the GDC DMI standard operating procedure. Affected submitters will be notified as soon as possible as part of this procedure. Appropriately corrected data may be submitted to replace the affected data.
  - Human Subjects Compliance Issue – If any available GDC data is found to be out of compliance with conditions for data sharing established by the relevant dbGaP Data Access Committee (DAC), the affected data will be suppressed as quickly as possible after the GDC becomes aware of the issue. Affected submitters will be notified as soon as possible. Data that comes back into compliance, as determined by the relevant dbGaP DAC, will be re-released in a subsequent GDC data release.
  - Erroneous Data – If any available GDC data is discovered to be incorrect, the GDC will in general work with the submitter to revise and release a corrected version. In unusual situations, if it is discovered that genomic data is incorrectly mapped to case or biospecimen data in a way that cannot be resolved by remapping, all affected data may be made indefinitely unavailable. The GDC will attempt to work with the submitter to resolve such issues without removing data if possible.
Data Transfer Rate Limitations
The GDC aims to provide the research community at large with access to GDC data. To ensure that all users have sufficient access to GDC data, the GDC must impose limits on concurrent connections. Rate limiting will be applied to any IP address that opens 250 or more concurrent connections. The GDC reserves the right to lower the rate limit at its discretion and without notice to better serve the research community. Individuals who require 250 or more concurrent connections per IP address should contact GDC Support.
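Client scripts that download many files in parallel can stay well under this threshold by capping their own concurrency. The sketch below is one way to do that with a bounded thread pool from the Python standard library. The function name `fetch_all`, the `fetch_one` callback, and the default cap of 16 are illustrative assumptions, not part of any GDC tool or API.

```python
from concurrent.futures import ThreadPoolExecutor

# Threshold stated in the policy above; the GDC may lower it without notice.
GDC_CONNECTION_THRESHOLD = 250
# Assumption: a conservative default chosen for illustration only.
DEFAULT_MAX_CONNECTIONS = 16


def fetch_all(items, fetch_one, max_connections=DEFAULT_MAX_CONNECTIONS):
    """Apply fetch_one to every item using at most max_connections worker threads.

    fetch_one is a caller-supplied function (hypothetical here) that performs
    a single request and returns its result. Results keep the input order.
    """
    if not 1 <= max_connections < GDC_CONNECTION_THRESHOLD:
        raise ValueError(
            f"max_connections must be between 1 and {GDC_CONNECTION_THRESHOLD - 1}"
        )
    # The pool never runs more than max_connections calls at the same time.
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(fetch_one, items))
```

This cap applies to a single process only. Because the limit is enforced per IP address, several processes or machines sharing one address count together, so each should use a proportionally smaller cap.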