Data Standards

The GDC develops and uses community standards for data elements, data types, and file formats. GDC team members participate in community genomics standards groups such as GA4GH and NIH Commons, which are developing standard programmatic interfaces for managing, describing, and annotating genomic data.

  • Data Elements - The GDC develops and uses data elements in coordination with the community. Data elements are maintained in the NCI's cancer Data Standards Registry and Repository (caDSR) and are accessible through the NCI CDE Browser. GDC data elements are described in the GDC Data Dictionary.
  • Data Types and File Formats - The GDC specifies data types and file formats for clinical, biospecimen, and molecular data, as described in GDC Data Types and File Formats. GDC clinical and biospecimen data can be submitted in GDC-specific XML, TSV, or JSON formats. The GDC uses industry-standard data formats for molecular sequencing data (e.g., BAM, FASTQ) and variant calls (VCF).
  • Programmatic Interfaces - The GDC Application Programming Interface (API) provides developers with a programmatic interface to query and download GDC data and to submit data to the GDC. The GDC API also provides a utility for BAM slicing. Through participation in community genomics standards groups such as GA4GH and NIH Commons, the GDC aims to adopt applicable standards as they evolve. The GDC architecture also provides programmatic interfaces to internal data systems, such as a metadata service and a digital identifier service for creating and accessing Universally Unique Identifiers (UUIDs).
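As a rough illustration of the programmatic interface described above, the following Python sketch builds (but does not send) a query URL for files in given data formats and checks whether a string is a well-formed UUID. Nothing here is taken from the text above: the base URL https://api.gdc.cancer.gov, the /files endpoint, the filter layout, and the field names file_id, file_name, and data_format are assumptions based on the public GDC API, and the helper names build_files_query and is_uuid are hypothetical. Check the GDC API documentation before relying on any of them.

```python
import json
import uuid
from urllib.parse import urlencode

# Assumption: public GDC API base URL; it is not stated in the text above.
GDC_API_BASE = "https://api.gdc.cancer.gov"


def build_files_query(data_formats, size=10):
    """Build a URL for the (assumed) /files endpoint, filtered by data format.

    The URL is only constructed here; no network request is made.
    """
    # Assumed filter layout: an "in" operator over the data_format field.
    filters = {
        "op": "in",
        "content": {"field": "data_format", "value": list(data_formats)},
    }
    params = {
        "filters": json.dumps(filters),
        # Assumed field names for the returned file records.
        "fields": "file_id,file_name,data_format",
        "size": str(size),
    }
    return f"{GDC_API_BASE}/files?{urlencode(params)}"


def is_uuid(text):
    """Return True if text is a UUID in canonical hyphenated form."""
    try:
        return str(uuid.UUID(text)) == text.lower()
    except ValueError:
        return False
```

The resulting URL could be passed to any HTTP client. Keeping URL construction separate from the request makes the query easy to inspect before any data is transferred.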