Biospecimen Data Harmonization

Biospecimen data refers to information associated with the physical sample taken from a participant and its processing down to the aliquot level for sequencing experiments. This data falls into several key categories:

  • Standard Identifiers: project-unique identifiers and universally unique identifiers (UUIDs) that enable cases and samples to be referenced and linked to associated clinical and analytical data
  • Provenance: metadata that indicates the upstream sources of the sample (research program, research project, and donor individual) as well as the downstream products of sample processing (e.g., extracted DNA or RNA analyte)
  • Quality Control: metadata that expresses the results of quality control tests performed on biospecimens and their derived products (e.g., percent tumor nuclei, RIN values, A260/A280 values)

For major NCI CCG programs, biospecimen data is provided by a Biospecimen Core Resource (BCR) under contract to the NCI. Data is submitted in an established, schema-valid XML format. This data includes program and project identifiers, UUIDs, and the relationships between case, sample, and aliquot. UUIDs submitted by BCRs are typically adopted by the GDC.
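As an illustration of how a submitted UUID might be kept rather than replaced, here is a minimal Python sketch. The function name `resolve_uuid` and the fallback of minting a new random UUID are assumptions made for this example; they do not describe the GDC's actual ingestion logic.

```python
import uuid


def resolve_uuid(submitted=None):
    """Return a canonical UUID string, keeping the submitted one when it is valid.

    Hypothetical helper: the fallback behaviour is an assumption for illustration.
    """
    if submitted is not None:
        try:
            # uuid.UUID raises ValueError when the string is not a well-formed UUID.
            return str(uuid.UUID(submitted))
        except ValueError:
            pass
    # No usable identifier was supplied, so generate a random (version 4) UUID.
    return str(uuid.uuid4())
```

A well-formed submitted identifier comes back unchanged apart from normalization to lowercase, hyphenated form, so existing references to it remain valid.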

For other submitters, data in the BCR XML format is accepted. However, the GDC also provides a simpler means of submitting a minimal set of biospecimen data: the data may be formatted as a JSON or tab-delimited (TSV) text file and submitted through the GDC Submission Portal.
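To make the two text formats concrete, the following Python sketch serializes one minimal sample record as JSON and as TSV. The record contents and key names (`submitter_id`, `case_submitter_id`, and so on) are hypothetical placeholders; the actual field and column names should be taken from the downloadable template for each entity.

```python
import csv
import io
import json

# Hypothetical minimal sample record; key names and values are assumptions,
# not the authoritative template columns.
record = {
    "type": "sample",
    "submitter_id": "EXAMPLE-SAMPLE-01",
    "case_submitter_id": "EXAMPLE-CASE-01",
    "sample_type": "Primary Tumor",
    "tissue_type": "Tumor",
}


def to_json(records):
    """Serialize a list of records as a JSON array."""
    return json.dumps(records, indent=2)


def to_tsv(records):
    """Serialize a list of records as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0]),  # column order follows the first record
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_json([record]))
    print(to_tsv([record]))
```

Both functions take a list, so several entities of the same type can be written to one file; the TSV form assumes every record shares the keys of the first.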

The GDC Data Model uses a graph representation that places no technical limits on adjusting the entities and relationships. However, such adjustments may affect quality control, reporting, accounting, and the user interface and experience. Therefore, major changes to the model needed to support new biospecimen information will undergo review by the GDC Data Model Change Control Board.
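The idea of a graph representation can be sketched in a few lines of Python: entities are nodes, relationships are parent-child edges, and a new entity type needs no change to the structure itself. The class `BiospecimenGraph`, its methods, and the direct sample-to-aliquot edge below are assumptions for illustration only and do not reflect the GDC's implementation or its full entity hierarchy.

```python
from collections import defaultdict


class BiospecimenGraph:
    """Toy graph of biospecimen entities (hypothetical, for illustration)."""

    def __init__(self):
        self.nodes = {}                     # node id -> properties, including "type"
        self.parent = {}                    # child id -> parent id
        self.children = defaultdict(list)   # parent id -> list of child ids

    def add_node(self, node_id, entity_type, parent_id=None, **properties):
        """Add an entity and, optionally, link it to an existing parent."""
        if parent_id is not None and parent_id not in self.nodes:
            raise KeyError(f"unknown parent: {parent_id}")
        self.nodes[node_id] = {"type": entity_type, **properties}
        if parent_id is not None:
            self.parent[node_id] = parent_id
            self.children[parent_id].append(node_id)

    def lineage(self, node_id):
        """Walk upstream from a node to its root, e.g. aliquot -> sample -> case."""
        path = [node_id]
        while path[-1] in self.parent:
            path.append(self.parent[path[-1]])
        return path


if __name__ == "__main__":
    graph = BiospecimenGraph()
    graph.add_node("case-1", "case")
    graph.add_node("sample-1", "sample", parent_id="case-1", sample_type="Primary Tumor")
    graph.add_node("aliquot-1", "aliquot", parent_id="sample-1")
    print(graph.lineage("aliquot-1"))
```

Because entity types are plain strings here, nothing technical prevents adding a new one, which is why the downstream effects noted above are handled by review instead of by the data structure.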

Submitting Biospecimen Entities

Links to the dictionary entry for each biospecimen entity are listed below. Each entry contains information about each field and a downloadable template for submission.

Biospecimen Entity Field Information

Term | Category | CDE | Required?
A260/A280 Ratio | Analyte | 5432595 | No
Aliquot Quantity | Aliquot | --- | No
Aliquot Volume | Aliquot | --- | No
Amount | Aliquot | --- | No
Amount | Analyte | --- | No
Analyte Quantity | Analyte | --- | No
Analyte Type ID | Aliquot | 5432508 | No
Analyte Type ID | Analyte | 5432508 | No
Analyte Type | Aliquot | 2513915 | No
Analyte Type | Analyte | 2513915 | Yes
Analyte Volume | Analyte | --- | No
Biospecimen Anatomic Site | Sample | 4742851 | No
Biospecimen Laterality | Sample | 2007875 | No
Bone Marrow Malignant Cells | Slide | --- | No
Catalog Reference | Sample | --- | No
Centers | Aliquot | --- | No
Centers | Portion | --- | No
Composition | Sample | 5432591 | No
Concentration | Aliquot | 5432594 | No
Concentration | Analyte | 5432594 | No
Consent Type | Case | --- | No
Creation Datetime | Portion | 5432592 | No
Current Weight | Sample | 5432606 | No
Days to Collection | Sample | 3008340 | No
Days to Consent | Case | --- | No
Days to Lost to Followup | Case | 6154721 | No
Days to Sample Procurement | Sample | --- | No
Diagnosis Pathologically Confirmed | Sample | --- | No
Disease Type | Case | 6161017 | No
Distance Normal to Tumor | Sample | 3088708 | No
Distributor Reference | Sample | --- | No
Freezing Method | Sample | 5432607 | No
Growth Rate | Sample | --- | No
Index Date | Case | 6154722 | No
Initial Weight | Sample | 5432605 | No
Intermediate Dimension | Sample | --- | No
Is FFPE | Sample | 4170557 | No
Is FFPE | Portion | 4170557 | No
Longest Dimension | Sample | 5432602 | No
Lost to Followup | Case | 6161018 | No
Method of Sample Procurement | Sample | --- | No
No Matched Normal Low Pass WGS | Aliquot | --- | No
No Matched Normal Targeted Sequencing | Aliquot | --- | No
No Matched Normal WGS | Aliquot | --- | No
No Matched Normal WXS | Aliquot | --- | No
Normal Tumor Genotype SNP Match | Analyte | 4588156 | No
Number Proliferating Cells | Slide | 5432636 | No
OCT Embedded | Sample | 5432538 | No
Parent Samples | Sample | --- | No
Passage Count | Sample | --- | No
Pathology Report UUID | Sample | --- | No
Percent Eosinophil Infiltration | Slide | 2897700 | No
Percent Follicular Component | Slide | --- | No
Percent Granulocyte Infiltration | Slide | 2897705 | No
Percent Inflam Infiltration | Slide | 2897695 | No
Percent Lymphocyte Infiltration | Slide | 2897710 | No
Percent Monocyte Infiltration | Slide | 5455535 | No
Percent Necrosis | Slide | 2841237 | No
Percent Neutrophil Infiltration | Slide | 2841267 | No
Percent Normal Cells | Slide | 2841233 | No
Percent Rhabdoid Features | Slide | 6790120 | No
Percent Sarcomatoid Features | Slide | 2429786 | No
Percent Stromal Cells | Slide | 2841241 | No
Percent Tumor Cells | Slide | 5432686 | No
Percent Tumor Nuclei | Slide | 2841225 | No
Portion Number | Portion | 5432711 | No
Preservation Method | Sample | 5432521 | No
Primary Site | Case | 6161019 | No
Prostatic Chips Positive Count | Slide | --- | No
Prostatic Chips Total Count | Slide | --- | No
Prostatic Involvement Percent | Slide | --- | No
Ribosomal RNA 28S 16S Ratio | Analyte | --- | No
Sample Type ID | Sample | --- | No
Sample Type | Sample | 3111302 | Yes
Section Location | Slide | --- | Yes
Selected Normal Low Pass WGS | Aliquot | --- | No
Selected Normal Targeted Sequencing | Aliquot | --- | No
Selected Normal WGS | Aliquot | --- | No
Selected Normal WXS | Aliquot | --- | No
Shortest Dimension | Sample | 5432603 | No
Source Center | Aliquot | --- | No
Spectrophotometer Method | Analyte | 3008378 | No
Time Between Clamping And Freezing | Sample | 5432611 | No
Time Between Excision And Freezing | Sample | 5432612 | No
Tissue Collection Type | Sample | --- | No
Tissue Microarray Coordinates | Slide | --- | No
Tissue Type | Sample | 5432687 | Yes
Tumor Code ID | Sample | --- | No
Tumor Code | Sample | --- | No
Tumor Descriptor | Sample | 3288124 | No
Weight | Portion | 651 | No
Well Number | Analyte | 5432613 | No
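
The "Required?" column above can be turned into a simple pre-submission check. The Python sketch below lists the four terms marked "Yes" in the table and reports which of them are missing from a record. The snake_case keys are assumed spellings of those terms and `missing_required` is a hypothetical helper; it covers only the fields in this table and is not a substitute for validation against the data dictionary.

```python
# Terms marked "Yes" in the "Required?" column, grouped by entity category.
# The snake_case keys are assumed spellings of the table's terms.
REQUIRED_FIELDS = {
    "analyte": ["analyte_type"],
    "sample": ["sample_type", "tissue_type"],
    "slide": ["section_location"],
}


def missing_required(entity_type, record):
    """Return the required fields that are absent or empty in a record."""
    required = REQUIRED_FIELDS.get(entity_type, [])
    return [field for field in required if record.get(field) in (None, "")]


if __name__ == "__main__":
    incomplete_sample = {"sample_type": "Primary Tumor"}
    print(missing_required("sample", incomplete_sample))
```

Entity types with no required terms in the table, such as aliquot, portion, and case, always return an empty list in this sketch.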