GDC Policies

GDC policies for open and controlled data access adhere to the National Institutes of Health (NIH) Genomic Data Sharing (GDS) Policy as well as the NCI GDS Policy. The GDC requires that users obtain authorization from the National Center for Biotechnology Information (NCBI) Database of Genotypes and Phenotypes (dbGaP) to access controlled data.

Data Access Policies 

NIH Genomic Data Sharing (GDS) Policy

Any user accessing GDC open data must adhere to the NIH Genomic Data Sharing (GDS) Policy, which states that investigators who download unrestricted-access data from NIH-designated data repositories should:
  • Not attempt to identify individual human research participants from whom the data were obtained.
  • Acknowledge in all oral or written presentations, disclosures, or publications the specific dataset(s) or applicable accession number(s) and the NIH-designated data repositories through which the investigator accessed any data.

Any user requesting access to GDC controlled data must apply for dbGaP authorization, which is granted by an NIH Data Access Committee (DAC). DACs review and approve or disapprove all requests from the research community for data access. Decisions to grant access are based on whether the request conforms to the specifications within the NIH Genomic Data Sharing Policy and any program-specific requirements or procedures. In particular, all uses proposed for controlled-access data must be consistent with the data use limitations for the data set as indicated by the submitting institution and identified on the public website for dbGaP. DACs also review and approve or disapprove all requests for access to dbGaP data for programmatic oversight by NIH employees.

Data Transfer Rate Limitations

The GDC aims to provide the research community at large with access to GDC data. To ensure that all users have sufficient access to GDC data, the GDC must impose connection limits on concurrent usage. Rate limiting will be applied to any IP address that opens 250 or more concurrent connections. The GDC reserves the right to lower the rate limit at our discretion and without notice to better serve the research community. Users who require 250 or more concurrent connections per IP address should contact GDC Support.
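A client that issues many requests in parallel can stay under the connection threshold by capping its own concurrency. The following is a minimal sketch, not a GDC-provided tool: the function name `run_capped` and the cap of 8 are hypothetical choices, and each task stands in for whatever download call a client actually uses.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical client-side cap, chosen to sit far below the 250-connection threshold.
MAX_CONCURRENT = 8


def run_capped(tasks, max_concurrent=MAX_CONCURRENT):
    """Run zero-argument callables with at most max_concurrent in flight.

    Returns a tuple: (results in task order, peak number of tasks running at once).
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def tracked(task):
        nonlocal active, peak
        # Record that one more task is running and update the observed peak.
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return task()
        finally:
            with lock:
                active -= 1

    # The pool size is the hard limit on simultaneous tasks (and connections).
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(tracked, tasks))
    return results, peak
```

Because the pool never runs more than `max_concurrent` workers, the returned peak can be logged to confirm that a batch job stayed within the intended limit.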

Additional Information

For more information on data access policies, refer to the following links:

GDC policies for data submission adhere to the National Institutes of Health (NIH) Genomic Data Sharing (GDS) Policy as well as the NCI GDS Policy. The GDC requires that organizations submitting data into the GDC obtain authorization from the National Center for Biotechnology Information (NCBI) Database of Genotypes and Phenotypes (dbGaP).

Data Submission Policies

dbGaP Data Submission Policies

Organizations interested in submitting data to the GDC must first apply for data submission authorization through dbGaP. Data submission through dbGaP requires institutional certification under NIH’s Genomic Data Sharing Policy. For information on dbGaP policies associated with data submission, please refer to dbGaP Data Submission Procedures.

GDC Data Sharing Policies

The GDC promotes data sharing in support of precision medicine in accordance with NIH and NCI policies. GDC policies towards data sharing are as follows:

  • Data Sharing Requirement – By submitting data to the GDC, a submitter indicates understanding and agreement that data will be made available to the scientific community at large, according to the data submitter's Genomic Data Sharing Plan as required. Controlled access data will be made available to members of the community having the appropriate dbGaP Data Use Certification. The GDC will also produce harmonized data (raw and derived) based on the originally submitted data. NOTE: The GDC will not preserve an exact copy of the originally submitted data; however, the GDC will preserve the original reads and quality scores. After data is released, either by submitter request to GDC, or by approval of the Center for Cancer Genomics, harmonized raw and GDC-generated derived data will be made available to the public via the GDC Data Portal and GDC data access tools.
  • Data Pre-Processing Period – For each project, the GDC will afford a pre-processing period of exclusive data access to submitters and their named collaborators. The pre-processing period allows submitters to perform data cleaning and quality control on initial data, and to make data revisions, prior to data submission and public release into the GDC. The pre-processing period generally lasts up to six months from the date of data upload and is followed by data submission into the GDC.
  • Data Submission Period and Release – Once submitted, data will be processed and validated by the GDC, including the generation of derived data for applicable data sets. Submitted data will be released no later than six months after GDC data processing has been completed. Submitted data will be made available for research in a manner consistent with the dataset’s “data use limitations.”
  • Data Redaction – The GDC in general will not remove valid data from community access in response to submitter requests. GDC will remove data access in the following events:
    • Data Management Incident – If any available GDC data is discovered to contain protected health information (PHI) or personally identifiable information (PII), the affected data will be made unavailable as quickly as possible after the GDC becomes aware of the issue, and reported according to the GDC DMI standard operating procedure. Affected submitters will be notified as soon as possible as part of this procedure. Appropriately corrected data may be submitted to replace the affected data.
    • Human Subjects Compliance Issue – If any available GDC data is found to be out of compliance with conditions for data sharing established by the relevant dbGaP Data Access Committee (DAC), the affected data will be suppressed as quickly as possible after the GDC becomes aware of the issue. Affected submitters will be notified as soon as possible. Data that comes back into compliance, as determined by the relevant dbGaP DAC, will be rereleased in a subsequent GDC data release.
    • Erroneous Data – If any available GDC data is discovered to be incorrect, the GDC will in general work with the submitter to revise and release a corrected version. In unusual situations, if it is discovered that genomic data is incorrectly mapped to case or biospecimen data in a way that cannot be resolved by remapping, all affected data may be made indefinitely unavailable. The GDC will attempt to work with the submitter to resolve such issues without removing data if possible.

GDC Website Privacy Policy

Protecting your privacy is very important to us. Our GDC Web site links to other National Institutes of Health (NIH) sites, federal agency sites and occasionally, to private organizations. Once you leave the primary GDC Web site, you are subject to the privacy policy for the site(s) you are visiting. We do not collect any personally identifiable information (PII) about you during your visit to the GDC Web site unless you choose to provide it to us. We do, however, collect some data about your visit to our GDC Web site to help us better understand how the public uses the site and how to make it more helpful. We collect information from visitors who read, browse, and/or download information from our Web site. The GDC never collects information for commercial marketing or any purpose unrelated to the NIH mission and goals.

When visitors send a support request containing personal information to the GDC Support email at support@nci-gdc.datacommons.io, the GDC maintains the request in the GDC Help Desk System. Only designated GDC Team Members requiring access to the support requests in order to assist visitors may view this information.

Types of Information Collected

Information Collected when Browsing the GDC

When you browse through any Web site, certain information about your visit can be collected. We automatically collect and temporarily store the following types of information about your visit:

  • Domain from which you access the Internet
  • IP address (an IP address is a number that is automatically assigned to a computer when surfing the Web)
  • Operating system and information about the browser used when visiting the site
  • Date and time of your visit
  • Pages you visited
  • Address of the Web site that connected you to the GDC (such as google.com or bing.com)
  • Demographic and interest data
  • eRA Commons ID

We use this information to measure the number of visitors to our site and its various sections and to help make our site more useful to visitors. This information cannot be used to identify you as an individual.

Information Collected when Submitting a Support Request

When you submit a support request through the GDC Support email at support@nci-gdc.datacommons.io, we collect the following types of information:

  • Name
  • E-mail address
  • Phone number (if provided)
  • Inquiry type
  • Inquiry description
  • Any files uploaded in support of the inquiry
  • Date and time the inquiry was submitted

How the GDC Collects Information

The GDC uses several tools to collect the information listed above in Information Collected when Browsing the GDC. No Personally Identifiable Information (PII) is collected. This data is used to monitor the health and growth of the system and comply with security and auditing best practices. The GDC Team conducts analyses and generates reports with this information, which are shared only with GDC Team Members, NIH Senior Staff, and members of the NIH Communications Team who require this information to perform their duties.

The GDC uses the GDC Support email at support@nci-gdc.datacommons.io to collect the information in the bulleted list in the Information Collected when Submitting a Support Request section. Information collected is maintained in the GDC Help Desk System. The GDC uses the information to provide users with assistance and improve the GDC. These support facilities require the collection of PII so that GDC Team Members can correspond directly with users to provide assistance.

The GDC retains the data from web analytics reporting tools and support requests as long as needed to support the mission of the GDC.

How the GDC Uses Cookies

The Office of Management and Budget Memo M-10-22, Guidance for Online Use of Web Measurement and Customization Technologies, allows Federal agencies to use session and persistent cookies.

When you visit any Web site, its server may generate a piece of text known as a "cookie" to place on your computer. The cookie allows the server to "remember" specific information about your visit while you are connected.

The cookie makes it easier for you to use the dynamic features of Web pages. Cookies from GDC Web pages only collect information about your browser’s visit to the site; they do not collect personal information about you.

There are two types of cookies, single session (temporary), and multi-session (persistent). Session cookies last only as long as your Web browser is open. Once you close your browser, the cookie disappears. Persistent cookies are stored on your computer for longer periods.
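The difference between the two cookie types shows up in the cookie's attributes: a persistent cookie carries a lifetime (`Max-Age` or `Expires`), while a session cookie carries neither. The sketch below uses Python's standard `http.cookies` module to illustrate this; the helper names, cookie names, and values are hypothetical and do not describe the cookies the GDC actually sets.

```python
from http.cookies import SimpleCookie


def build_cookie(name, value, max_age=None):
    """Return a Set-Cookie header value.

    With max_age=None the cookie has no lifetime attribute, so a browser
    treats it as a session cookie and discards it when the browser closes.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    if max_age is not None:
        # A lifetime in seconds makes the cookie persistent.
        cookie[name]["max-age"] = max_age
    return cookie[name].OutputString()


def is_persistent(header_value):
    """Return True if any cookie in the header carries Max-Age or Expires."""
    parsed = SimpleCookie()
    parsed.load(header_value)
    return any(bool(m["max-age"] or m["expires"]) for m in parsed.values())
```

For example, `is_persistent(build_cookie("visit", "abc"))` is false, while passing a `max_age` of one day (86400 seconds) makes it true.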

Session Cookies

We use session cookies for technical purposes such as to enable better navigation through our site. These cookies let our server know that you are continuing a visit to our site. The OMB Memo M-10-22 Guidance defines our use of session cookies as "Usage Tier 1 – Single Session." The policy says, "This tier encompasses any use of single session web measurement and customization technologies."

Persistent Cookies

We use persistent cookies to enable web analytics reporting tools to differentiate between new and returning GDC visitors. Persistent cookies remain on your computer between visits to the GDC until they expire. The OMB Memo M-10-22 Guidance defines our use of persistent cookies as "Usage Tier 2 – Multi-session without Personally Identifiable Information (PII)." The policy says, "This tier encompasses any use of multi-session Web measurement and customization technologies when no PII is collected."

How to Opt Out to Disable Cookies

If you do not wish to have session or persistent cookies placed on your computer, you can disable them using your Web browser. If you opt out of cookies, you will still have access to all information and resources at the GDC. Instructions for disabling or opting out of cookies in the most popular browsers are located at http://www.usa.gov/optout_instructions.shtml. Please note that by following the instructions to opt out of cookies, you will disable cookies from all sources, not just those from the GDC.

How Personal Information is Protected

You do not have to give us personal information to visit the GDC. However, if you choose to submit support requests, we collect your email address to allow us to respond to your request. If you choose to provide us with personally identifiable information, that is, information that is personal in nature and which may be used to identify you, through an e-mail message or electronic form, we will maintain the information you provide only as long as needed. If we store your personal information in a record system designed to retrieve information about you by personal identifier (name, personal email address, personal or mobile phone number, etc.), so that we may contact you, we will safeguard the information you provide to us in accordance with the Privacy Act of 1974, as amended (5 U.S.C. Section 552a). If the GDC operates a record system designed to retrieve information about you in order to accomplish its mission, a Privacy Act Notification Statement should be prominently and conspicuously displayed on the public-facing website or form which asks you to provide personally identifiable information. The notice must address the following five criteria:

  • Legal authorization to collect information about you
  • Purpose of the information collection
  • Routine uses for disclosure of information outside of the GDC
  • Whether the request made of you is voluntary or mandatory under law
  • Effects of non-disclosure if you choose to not provide the requested information

For further information about the GDC privacy policy, please contact GDC Support at support@nci-gdc.datacommons.io.

Data Safeguarding and Privacy

The GDC uses web measurement and customization technologies to help our Web sites function better for visitors and to better understand how the public uses the online resources we provide. All uses of web-based technologies comply with existing policies with respect to privacy and data safeguarding standards. Information Technology (IT) systems owned and operated by the GDC are assessed using Privacy Impact Assessments (PIAs) posted for public view on the Department of Health and Human Services (DHHS) Web site at http://www.hhs.gov/pia/. NIH conducts and publishes a PIA for each use of a third-party website and application (TPWA) as they may have a different functionality or practice. TPWA PIAs are posted for public view on the DHHS Web site at http://www.hhs.gov/pia/#Third-Party.

Groups of records that contain information about an individual and are designed to be retrieved by the individual’s name or other personal identifier linked to the individual are covered by the Privacy Act of 1974, as amended (5 U.S.C. Section 552a). For these records, NIH Systems of Record Notices are published in the Federal Register and posted on the NIH Senior Official for Privacy Website. When you visit the NIH Institute/Center sites, please look for the Privacy Notice posted on the main pages. When web measurement and customization technologies are used, the Privacy Policy/Notice must provide:

  • Purpose of the web measurement and/or customization technology
  • Usage tier, session type, and technology used
  • Nature of the information collected
  • Purpose and use of the information
  • Whether and to whom the information will be disclosed
  • Privacy safeguards applied to the information
  • Data retention policy for the information
  • Whether the technology is enabled by default or not and why
  • How to opt-out of the web measurement/customization technology
  • Statement that opting-out still permits users to access comparable information or services
  • Identities of all third-party vendors involved in the measurement and customization process

Data Retention and Access Limits

The GDC will retain data collected using the following technologies long enough to achieve the specified objective for which they were collected. The data generated from these activities falls under the National Archives and Records Administration (NARA) General Records Schedule (GRS) 20-item IC 'Electronic Records', and will be handled per the requirements of that schedule.

How the GDC uses Third-Party Web sites and Applications

As part of the OMB Memo M-10-06, Open Government Directive, the GDC uses a variety of new technologies and social media options to communicate and interact with citizens. These sites and applications include popular social networking and media sites, open source software communities, and more. TPWAs are Web-based technologies that are not exclusively operated or controlled by the GDC, such as applications not hosted on a .gov domain or those that are embedded on GDC Web pages. Users of TPWAs often share information with the general public, user community, and/or the third party operating the Web site. These actors may use this information in a variety of ways. TPWAs could cause PII to become available or accessible to the GDC and the public, regardless of whether the information is explicitly solicited or collected by NIH.

The following list includes some of the TPWAs we use and their purpose. The GDC sometimes collects and uses PII made available through third-party Web sites. However, we do not share PII made available through third-party Web sites. Your activity on the third-party Web sites we use is governed by the security and privacy policy of those sites, which we have linked below. You should review the third-party privacy policies before using the sites and ensure that you understand how your information may be used. If you have an account with a third-party Web site, and choose to follow, like, friend, or comment, certain PII associated with your account may be made available to the GDC based on the privacy policy of the third-party Web site and your privacy settings within that third-party Web site. Therefore, you should also adjust privacy settings on your account to match your preferences.

For any GDC TPWA that collects PII, the list below also includes details on the information the GDC collects and how we will protect your private information.

Third-Party Web Sites and Applications

Bit.ly - The GDC uses Bit.ly to shorten long URLs for use in email messages, Twitter feeds, and on Facebook pages. Bit.ly collects and provides data on how often you, as an email recipient or Facebook/Twitter user, click on the shortened URLs distributed by NIH staff. Bit.ly analytics show how many people clicked on the URLs posted by NIH, compared to the total number of clicks on the shortened URLs. Bit.ly analytics do not provide any PII about the visitors who open the shortened links. The Bit.ly Privacy Policy is available at http://bit.ly/pages/privacy.

Twitter - The GDC uses Twitter to send short messages or ‘Tweets’ (up to 140 characters) to share information about the GDC with you. While you may read the GDC Twitter feeds without subscribing to them, if you want to subscribe to (or follow) GDC Twitter feeds, you must create a Twitter account at www.twitter.com. To create an account, you must provide some personal information, such as your name, user name, password, and email address. You have the option to provide additional personal information including a short biography, location, or a picture. Most information you provide for a Twitter account is available to the public, but you can modify how much of your information is visible by changing your privacy settings at the Twitter.com Web site. GDC Team Members monitor the number of subscribers and view comments and queries via Twitter, but the team members never take possession of the personal information belonging to you as a Twitter follower. However, as a practice, comment moderator policy requires the removal from the GDC Twitter pages of any comments that contain spam or are improper, inflammatory, or offensive. The information is then saved on a password-protected shared drive accessible to GDC Managers, System Owners, Communications Staff, Web Teams, and other designated staff who require this information to perform their duties. The Twitter Privacy Policy is available at http://twitter.com/privacy.

YouTube - The GDC uses YouTube to host informational videos on the GDC. NIH conducts and publishes a Privacy Impact Assessment (PIA) for each use of a third-party website as they may have a different functionality or practice. To learn more, visit the published PIAs at http://www.hhs.gov/pia/#Third-Party.

For more information on the uses of social and new media for which GSA has negotiated a federally-friendly Terms of Service Agreement, visit DigitalGov at http://www.digitalgov.gov/resources/negotiated-terms-of-service-agreements/.

For further information about the GDC privacy policy, please contact GDC Support at support@nci-gdc.datacommons.io. For information about the NIH privacy policy, please contact the NIH Senior Official for Privacy at privacy@mail.nih.gov, call 301-451-3426, or visit http://oma.od.nih.gov/ms/privacy.