GDC Data Model Management

The GDC designed its data model to accommodate changes in structure and content with minimal changes to the underlying datastore configuration or to existing content. In particular, a graph-oriented design allows items, relationships, and properties to be added while leaving preexisting queries in the codebase and in external workflows largely unaffected. At the same time, however, a data management system that is easily modified can also more easily grow unsystematically, creating a risk that the existence of certain items in the database, and the means for finding them, are known only to those who made the modifications. Modifications to a graph may also unintentionally remove or change links to items, creating "orphans" that can no longer be retrieved through existing queries. The risk of creating unmanageable datastores can be mitigated by establishing processes for technical communication and change control. Three main groups at the GDC serve this purpose.
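To make the orphan risk described above concrete, the following is a minimal sketch of how unreachable items might be detected after a graph modification. It is not GDC code: the function name find_orphans, the node names, and the choice to treat edges as undirected links walked from a single root are all assumptions made for illustration.

```python
from collections import deque


def find_orphans(nodes, edges, roots):
    """Return the set of nodes not reachable from any root.

    Hypothetical helper: edges are (source, target) pairs and are treated
    as undirected, which is a simplifying assumption of this sketch.
    """
    neighbors = {n: set() for n in nodes}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    # Breadth-first walk outward from the roots.
    seen = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for nxt in neighbors.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return set(nodes) - seen


# Illustrative node names only; not taken from the actual GDC graph.
nodes = ["program", "project", "case", "sample", "aliquot"]
edges_before = [
    ("project", "program"),
    ("case", "project"),
    ("sample", "case"),
    ("aliquot", "sample"),
]
# A modification that accidentally drops the sample-to-case link.
edges_after = [e for e in edges_before if e != ("sample", "case")]

orphans_before = find_orphans(nodes, edges_before, ["program"])
orphans_after = find_orphans(nodes, edges_after, ["program"])
```

In this sketch, removing a single link leaves both the sample and everything attached below it unreachable from the root, which is the kind of side effect that a change review is meant to catch.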

Data Model Working Group

The Data Model Working Group is a small group of GDC and Leidos bioinformaticists and engineers with PO representation. It meets weekly to work through design and import questions and issues. During the design phases, this group has actively made design and implementation decisions, which are recorded in the GDC Data Model Working Group meeting notes.

Data Model Change Control Board (CCB)

The Data Model Change Control Board (CCB) meets as needed to formally review requests to change key data model configuration items and to establish consensus among GDC stakeholders on whether to approve or deny these requests. The CCB is responsible for changes to the following configuration items:

  • The addition, modification, or deletion of data model node and edge classes, and their allowable relationships and semantic content;

  • The graph RDBMS schema;

  • The property graph schema (instantiated in gdcdatamodel); and

  • GDC dictionary JSON documents.
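As an illustration of what a change to these configuration items can look like in practice, the sketch below compares two versions of a set of class definitions and reports additions, deletions, and modifications. The function name diff_classes and the dictionary layout are hypothetical; they do not reflect the actual structure of gdcdatamodel or of the GDC dictionary JSON documents.

```python
def diff_classes(old, new):
    """Summarize how a set of class definitions changed between versions.

    Hypothetical helper: `old` and `new` map a class name to a plain
    dictionary describing that class.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(
        name for name in old.keys() & new.keys() if old[name] != new[name]
    )
    return {"added": added, "removed": removed, "modified": modified}


# Illustrative definitions only; property and link names are assumptions.
old_classes = {
    "case": {"properties": ["submitter_id"], "links": ["project"]},
    "sample": {"properties": ["sample_type"], "links": ["case"]},
}
new_classes = {
    "case": {"properties": ["submitter_id", "disease_type"], "links": ["project"]},
    "aliquot": {"properties": ["concentration"], "links": ["sample"]},
}

summary = diff_classes(old_classes, new_classes)
```

A summary of this kind could accompany a change request so that reviewers see at a glance which classes are affected.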

Technical implementation details of design changes approved by the CCB are generally outside the CCB's scope of discussion. Implementation of data model changes is handled through the GDC SDLC process. Change requests may be submitted by any stakeholder.

Data Model Advisory Group

The Data Model Advisory Group includes external NCI and other CCG-invited experts in cancer clinical and genomic data organization and management, high-level CCG representatives, and GDC and Leidos management, bioinformaticists, and engineers. This group meets as needed to provide an external perspective on the ongoing development of the data model, and advises on strategic design issues that will enable the GDC data model to:

  • Expand to support new NCI genomics projects;

  • Develop and increase interoperability among peer-level databases and key GDC collaborators; and

  • Maintain semantic consistency to maximize data usability in clinical and epidemiological studies.