Biospecimen Data Harmonization

Biospecimen data refers to information associated with the physical sample taken from a participant and its processing down to the aliquot level for sequencing experiments. This data falls into several key categories:

  • Standard Identifiers: project-unique identifiers and universally unique identifiers (UUIDs) that enable cases and samples to be referenced and linked to associated clinical and analytical data
  • Provenance: metadata that indicates the upstream sources of the sample (research program, research project, and donor individual) as well as the downstream products of sample processing (e.g., extracted DNA or RNA analyte)
  • Quality Control: metadata that expresses the results of quality control tests performed on biospecimens and analyzed products (e.g., percent tumor nuclei, RIN values, A260/A280 ratios)

For major NCI CCG programs, biospecimen data is provided by a Biospecimen Core Resource (BCR) under contract to the NCI. Data is submitted in an established, schema-valid XML format. This data includes program and project identifiers, UUIDs, and the relationships between case, sample, and aliquot. UUIDs submitted by BCRs are typically adopted by the GDC.
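
The UUID handling described above can be sketched with Python's standard uuid module: keep a submitted identifier when it parses as a UUID, otherwise mint a new one. The resolve_uuid helper, the sample identifier, and the fallback behavior are assumptions for illustration; this page only states that BCR-submitted UUIDs are typically adopted.

```python
import uuid
from typing import Optional


def resolve_uuid(submitted: Optional[str]) -> str:
    """Return the submitted UUID in canonical form, or a new random UUID."""
    if submitted:
        try:
            # uuid.UUID raises ValueError on a malformed string.
            return str(uuid.UUID(submitted))
        except ValueError:
            pass  # not a well-formed UUID; fall through and mint one
    return str(uuid.uuid4())


# Hypothetical identifiers, used only to exercise the helper.
adopted = resolve_uuid("3F2504E0-4F89-41D3-9A0C-0305E82C3301")
minted = resolve_uuid(None)
```

Canonicalizing to lowercase hyphenated form keeps identifiers comparable as plain strings when linking records.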

For other submitters, data in the BCR XML format is accepted. However, the GDC also provides a simpler means of submitting a minimal set of biospecimen data, in which the data may be formatted as a JSON or tab-delimited (TSV) text file and submitted to the GDC Submission Portal.
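
As a rough illustration of the simpler submission route, the sketch below writes the same minimal sample record as JSON and as TSV using only the Python standard library. The property names (type, submitter_id, cases.submitter_id, sample_type, tissue_type), the identifiers, and the values are assumptions chosen for illustration; the dictionary entry and downloadable template for each entity define the actual fields.

```python
import csv
import io
import json

# Hypothetical minimal sample record; field names and values are assumptions.
sample = {
    "type": "sample",
    "submitter_id": "EXAMPLE-SAMPLE-01",
    "cases.submitter_id": "EXAMPLE-CASE-01",
    "sample_type": "Primary Tumor",
    "tissue_type": "Tumor",
}


def to_json(record):
    """Serialize a flat record to JSON, nesting 'parent.field' keys."""
    nested = {}
    for key, value in record.items():
        if "." in key:
            parent, name = key.split(".", 1)
            nested.setdefault(parent, {})[name] = value
        else:
            nested[key] = value
    return json.dumps(nested, indent=2)


def to_tsv(records):
    """Serialize flat records to tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0]),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


json_text = to_json(sample)
tsv_text = to_tsv([sample])
```

In this sketch the link to the parent case is a dotted column name in the TSV and a nested object in the JSON, so one flat record can feed both formats.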

The GDC Data Model uses a graph representation that places no technical limits on adjusting entities and relationships. However, changes may affect quality control, reporting, accounting, and the user interface and experience. Therefore, major changes to the model needed to support new biospecimen information will undergo review by the GDC Data Model Change Control Board.
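
To make the graph idea concrete, here is a minimal sketch of biospecimen entities as nodes with child-to-parent edges, plus a walk from an aliquot back to its case. The entity chain (case, sample, portion, analyte, aliquot) follows the categories used on this page; the Node class, the lineage helper, and all identifiers are hypothetical and are not the GDC implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One entity in the graph; `parent` is the edge to its upstream entity."""
    node_type: str
    submitter_id: str
    parent: Optional[str] = None
    properties: Dict[str, object] = field(default_factory=dict)


def lineage(graph: Dict[str, Node], start_id: str) -> List[str]:
    """Follow parent edges from start_id to the root, returning node types."""
    path = []
    seen = set()
    current = start_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        node = graph[current]
        path.append(node.node_type)
        current = node.parent
    return path


# Hypothetical identifiers for a single processing chain.
nodes = [
    Node("case", "C1"),
    Node("sample", "S1", parent="C1"),
    Node("portion", "P1", parent="S1"),
    Node("analyte", "AN1", parent="P1"),
    Node("aliquot", "AL1", parent="AN1"),
]
graph = {n.submitter_id: n for n in nodes}

print(lineage(graph, "AL1"))
# ['aliquot', 'analyte', 'portion', 'sample', 'case']
```

Adding an entity type in this sketch only means adding nodes and edges; the cost of a change shows up in whatever reporting or interface code consumes the graph.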

Submitting Biospecimen Entities

Links to the dictionary entry for each biospecimen entity are listed below. Each entry contains information about each field and a downloadable template for submission.

Biospecimen Entity Field Information

Term | Category | CDE | Required / Preferred / Optional
A260 A280 Ratio | Analyte | 5432595 | Optional
Aliquot Quantity | Aliquot | --- | Optional
Aliquot Volume | Aliquot | --- | Optional
Analyte Quantity | Analyte | --- | Optional
Analyte Type ID | Aliquot | 5432508 | Optional
Analyte Type ID | Analyte | 5432508 | Optional
Analyte Type | Aliquot | 2513915 | Optional
Analyte Type | Analyte | 2513915 | Required
Analyte Volume | Analyte | --- | Optional
Biospecimen Anatomic Site | Sample | 4742851 | Optional
Biospecimen Laterality | Sample | 2007875 | Optional
Catalog Reference | Sample | --- | Optional
Composition | Sample | 5432591 | Optional
Concentration | Aliquot | 5432594 | Optional
Concentration | Analyte | 5432594 | Optional
Current Weight | Sample | 5432606 | Optional
Days to Collection | Sample | 3008340 | Optional
Days to Lost to Followup | Case | 6154721 | Optional
Days to Sequencing | Read Group | --- | Optional
Diagnosis Pathologically Confirmed | Sample | --- | Optional
Disease Type | Case | 6161017 | Optional
Distance Normal to Tumor | Sample | 3088708 | Optional
Distributor Reference | Sample | --- | Optional
Fragment Maximum Length | Read Group | --- | Optional
Fragment Mean Length | Read Group | --- | Optional
Fragment Minimum Length | Read Group | --- | Optional
Fragment Standard Deviation Length | Read Group | --- | Optional
Freezing Method | Sample | 5432607 | Optional
Growth Rate | Sample | --- | Optional
Index Date | Case | 6154722 | Optional
Initial Weight | Sample | 5432605 | Optional
Is FFPE | Portion | 4170557 | Optional
Is FFPE | Sample | 4170557 | Optional
Lane Number | Read Group | --- | Optional
Lost to Followup | Case | 6161018 | Optional
Method of Sample Procurement | Sample | --- | Optional
Multiplex Barcode | Read Group | --- | Optional
No Matched Normal Low Pass WGS | Aliquot | --- | Optional
No Matched Normal Targeted Sequencing | Aliquot | --- | Optional
No Matched Normal WXS | Aliquot | --- | Optional
No Matched Normal WGS | Aliquot | --- | Optional
Normal Tumor Genotype SNP Match | Analyte | 4588156 | Optional
OCT Embedded | Sample | 5432538 | Optional
Passage Count | Sample | --- | Optional
Portion Number | Portion | 5432711 | Optional
Preservation Method | Sample | 5432521 | Optional
Primary Site | Case | 6161019 | Optional
RIN | Read Group | 5278775 | Optional
Ribosomal RNA 28S 16S Ratio | Analyte | --- | Optional
Sample Type | Sample | 3111302 | Required
Shortest Dimension | Sample | 5432603 | Optional
Spectrophotometer Method | Analyte | 3008378 | Optional
Time Between Clamping and Freezing | Sample | 5432611 | Optional
Time Between Excision and Freezing | Sample | 5432612 | Optional
Tissue Type | Sample | 5432687 | Required
Tumor Descriptor | Sample | 3288124 | Optional
Weight | Portion | 5432593 | Optional
Well Number | Analyte | 5432613 | Optional
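
The Required column can be turned into a simple pre-submission check. The sketch below encodes only the three rows marked Required in the table (Analyte Type for Analyte, Sample Type and Tissue Type for Sample); the snake_case property names, the example values, and the missing_required helper are assumptions for illustration, not GDC tooling.

```python
# Required fields per entity, taken from the rows marked Required in the table.
# The snake_case property names are assumed for illustration.
REQUIRED_FIELDS = {
    "analyte": ["analyte_type"],
    "sample": ["sample_type", "tissue_type"],
}


def missing_required(entity_type, record):
    """Return the required fields that are absent or empty in record."""
    required = REQUIRED_FIELDS.get(entity_type, [])
    return [name for name in required if record.get(name) in (None, "")]


# Hypothetical records used to exercise the check.
complete = {"sample_type": "Primary Tumor", "tissue_type": "Tumor"}
incomplete = {"sample_type": "Primary Tumor"}

print(missing_required("sample", complete))    # []
print(missing_required("sample", incomplete))  # ['tissue_type']
```

Entities with no Required rows in the table, such as Aliquot or Portion, pass this check with any record, since every field listed for them is Optional.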