GDC Data Model

The GDC Data Model is the central means of organizing all data artifacts ingested by the GDC.

The data model is designed to maintain data and metadata consistency, integrity, and availability while accommodating:

  • Biospecimen, clinical, and cancer genomic data and metadata
  • Multiple, disparate ongoing NCI projects
  • Entirely new projects that have not yet been conceived
  • Ongoing changes and technological progress
  • Frequent and complex queries from both external users and internal administrators

To meet these requirements, the design and implementation of the data model leverage:

  • Flexible but robust graph-oriented data stores
  • Indexed document stores for API and front end performance
  • Ontology-based concept and data element definition
  • Schema-based entity and relationship validation on loading
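
As a rough illustration of the last point, schema-based validation on loading can be sketched as follows. This is a minimal sketch, not the GDC implementation: the node types, property names, relationship label, and all function names (validate_node, validate_edge, load_graph) are assumptions chosen for the example, and the schemas are far simpler than a real data dictionary.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified schemas; NOT the actual GDC data dictionary.
# Each node type maps to the set of properties it must carry.
NODE_SCHEMAS: Dict[str, Set[str]] = {
    "case": {"submitter_id"},
    "sample": {"submitter_id", "sample_type"},
}

# Allowed relationships as (source type, label, destination type).
EDGE_SCHEMAS: Set[Tuple[str, str, str]] = {
    ("sample", "derived_from", "case"),
}


def validate_node(node: dict) -> List[str]:
    """Return error messages for one node (empty list if it is valid)."""
    required = NODE_SCHEMAS.get(node["type"])
    if required is None:
        return [f"{node['id']}: unknown node type {node['type']!r}"]
    missing = required - set(node.get("properties", {}))
    return [f"{node['id']}: missing property {name!r}" for name in sorted(missing)]


def validate_edge(edge: dict, nodes_by_id: Dict[str, dict]) -> List[str]:
    """Return error messages for one edge (empty list if it is valid)."""
    src = nodes_by_id.get(edge["src"])
    dst = nodes_by_id.get(edge["dst"])
    if src is None or dst is None:
        return [f"edge {edge['src']} -> {edge['dst']}: endpoint not found"]
    triple = (src["type"], edge["label"], dst["type"])
    if triple not in EDGE_SCHEMAS:
        return [f"edge {edge['src']} -> {edge['dst']}: relationship {triple} not allowed"]
    return []


def load_graph(nodes: List[dict], edges: List[dict]) -> dict:
    """Validate every node and edge first; build the graph only if all pass."""
    nodes_by_id = {node["id"]: node for node in nodes}
    errors: List[str] = []
    for node in nodes:
        errors.extend(validate_node(node))
    for edge in edges:
        errors.extend(validate_edge(edge, nodes_by_id))
    if errors:
        raise ValueError("; ".join(errors))
    return {"nodes": nodes_by_id, "edges": list(edges)}
```

In this sketch, loading a "sample" node that lacks its required properties, or an edge that points from a "case" to a "sample", raises a ValueError before anything is stored, so invalid entities and relationships never enter the graph.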

GDC Data Model components are further described in subsequent chapters.