The GDC develops and uses community standards for data elements, data types, and file formats. GDC team members participate in community genomics standards groups such as GA4GH and NIH Commons, which are developing standard programmatic interfaces for managing, describing, and annotating genomic data.
- Data Elements - The GDC develops and uses data elements in coordination with the community. Each data element property is assigned a common data element (CDE), which is created and maintained in the NCI's Cancer Data Standards Registry and Repository (caDSR) and is accessible through the NCI CDE Browser. The CDE describes the data element's property as well as its data type. The enumerated values associated with GDC data elements are curated in coordination with NCI's Enterprise Vocabulary Services (EVS) team and maintained in the NCI Thesaurus (NCIt). GDC data elements are described in the GDC Data Dictionary and can be queried with the GDC Data Dictionary Search tool. Additionally, the GDC uses external data standards such as the International Classification of Diseases for Oncology (ICD-O) and the American Joint Committee on Cancer (AJCC) staging classifications.
- Data Types and File Formats - The GDC specifies data types and file formats for clinical, biospecimen, and molecular data, as described in GDC Data Types and File Formats. GDC clinical and biospecimen data can be submitted in GDC-specific XML, TSV, or JSON formats. The GDC uses industry-standard data formats for molecular sequencing data (e.g., BAM, FASTQ) and variant calls (VCF).
- Programmatic Interfaces - The GDC Application Programming Interface (API) provides developers with a programmatic interface to query and download GDC data and to submit data to the GDC. The GDC API also provides a utility to perform BAM slicing. Through participation in community genomics standards groups such as GA4GH and NIH Commons, the GDC aims to leverage applicable standards as they evolve. The GDC architecture also provides programmatic interfaces to internal data systems, such as a digital identifier service for creating and accessing Universally Unique Identifiers (UUIDs), as well as a metadata service.
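
To illustrate the kind of query the programmatic interface supports, the following Python sketch builds a request URL that asks for files of a given format. It only constructs the URL and does not contact any server. The base URL, the `/files` endpoint, the `filters`, `fields`, `format`, and `size` parameters, and the `files.data_format` field name are assumptions drawn from the publicly documented GDC API rather than from the text above, and the helper names `build_files_query` and `is_uuid` are hypothetical. Check them against the current GDC API documentation before relying on them.

```python
import json
import uuid
from typing import List
from urllib.parse import urlencode

# Assumed base URL of the public GDC API; not stated in the text above.
GDC_API_BASE = "https://api.gdc.cancer.gov"


def build_files_query(data_format: str, fields: List[str], size: int = 10) -> str:
    """Return a URL asking the (assumed) /files endpoint for files of one format."""
    # Filter expression in the JSON style the GDC API is assumed to accept.
    filters = {
        "op": "in",
        "content": {"field": "files.data_format", "value": [data_format]},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_API_BASE}/files?{urlencode(params)}"


def is_uuid(value: str) -> bool:
    """Check whether a string parses as a UUID, the identifier style described above."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # Build (but do not send) a query for five BAM files.
    print(build_files_query("BAM", ["file_id", "file_name"], size=5))
```

Separating URL construction from the network call keeps the sketch testable offline. In practice the resulting URL would be passed to an HTTP client, and each returned file identifier could be checked with `is_uuid` before use.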