Biospecimen Data Standardization

Biospecimen data refers to information associated with the physical sample taken from a participant and its processing down to the aliquot level for sequencing experiments. This data falls into several key categories:

  • Standard Identifiers: project-unique identifiers and universally unique identifiers (UUIDs) that enable cases and samples to be referenced and linked to associated clinical and analytical data
  • Provenance: metadata that indicates the upstream sources of the sample (research program, research project, and donor individual) as well as the downstream products of sample processing (e.g., extracted DNA or RNA analyte)
  • Quality Control: metadata that express the values of quality control tests performed on biospecimens and analyzed products (e.g., percent tumor nuclei, RIN values, A260/A280 values)

For major NCI CCG programs, biospecimen data is provided by a Biospecimen Core Resource (BCR) under contract to the NCI. Data is submitted in an established, schema-valid XML format. This data includes program and project identifiers, UUIDs, and the relationships between case, sample, and aliquot. UUIDs submitted by BCRs are typically adopted by the GDC.
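The identifiers described above are UUIDs in the standard 8-4-4-4-12 hexadecimal form. As a minimal sketch, the Python below checks whether a submitted identifier is already a canonical lowercase UUID and, if it is not, mints a random version 4 UUID. The `id` key and the `assign_uuid` helper are hypothetical; the GDC's actual rules for adopting or issuing UUIDs are not described in this section.

```python
import uuid


def is_canonical_uuid(value) -> bool:
    """True if value parses as a UUID and is already in lowercase hyphenated form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, TypeError, AttributeError):
        return False
    # str(UUID) always yields the lowercase 8-4-4-4-12 form.
    return str(parsed) == value


def assign_uuid(record: dict) -> str:
    """Hypothetical helper: keep a well-formed submitted UUID, else mint a new one."""
    submitted = record.get("id")  # assumption: the UUID travels under an "id" key
    if isinstance(submitted, str) and is_canonical_uuid(submitted):
        return submitted
    return str(uuid.uuid4())


example = {
    "type": "aliquot",
    "submitter_id": "ALQ-001",
    "id": "6e1d6f4c-2d3b-4b8a-9f0e-1a2b3c4d5e6f",
}
print(assign_uuid(example))  # the submitted UUID is kept unchanged
```

Requiring the exact canonical string (rather than anything `uuid.UUID` can parse, such as braces or uppercase hex) keeps identifiers byte-identical across systems, which matters when they are used as join keys.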

For other submitters, data in the BCR XML format is accepted. However, the GDC also provides a simpler means of submitting a minimal set of biospecimen data, in which the data are formatted as a JSON or tab-delimited (TSV) text file and submitted through the GDC Submission Portal.
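The sketch below serializes one hypothetical sample record both as JSON and as tab-delimited text, assuming GDC-style conventions: a `type` and `submitter_id` on every record, a nested link object pointing to the parent case in JSON, and a dotted `cases.submitter_id` column in TSV. The field names are the required Sample fields from the table in this section, converted to snake_case as an assumption; the values are illustrative only and should be checked against the permissible values in the dictionary template.

```python
import csv
import io
import json

# Hypothetical minimal sample record; verify names and values against the
# dictionary template for the sample entity before submitting.
sample = {
    "type": "sample",
    "submitter_id": "PROJ-0001-SAMPLE-T1",
    "cases": {"submitter_id": "PROJ-0001"},  # assumed link to the parent case
    "tissue_type": "Tumor",
    "tumor_descriptor": "Primary",
    "specimen_type": "Solid Tissue",
    "preservation_method": "Frozen",
}


def to_json(record: dict) -> str:
    return json.dumps(record, indent=2)


def flatten(record: dict, prefix: str = "") -> dict:
    """Turn nested link objects into dotted column names,
    e.g. {"cases": {"submitter_id": X}} -> {"cases.submitter_id": X}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_tsv(records: list) -> str:
    rows = [flatten(r) for r in records]
    # Union of all columns, in first-seen order; missing cells are written empty.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_json(sample))
print(to_tsv([sample]))
```

The JSON form keeps the link as a nested object, while the TSV form needs one flat column per value, which is why the link is flattened into a dotted header.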

The GDC Data Model uses a graph representation that imposes no technical limit on adding or changing entities and relationships. However, such changes can affect quality control, reporting, accounting, and the user interface/experience. Therefore, major changes to the model needed to support new biospecimen information undergo review by the GDC Data Model Change Control Board.
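To illustrate why a graph model makes structural changes cheap, the sketch below stores a simplified, hypothetical set of parent links between biospecimen entity types as plain data and walks from any entity type up to the case with a breadth-first search. Adding an entity type only extends the edge map; the traversal code is unchanged. The real GDC model contains more entities and links than shown here, and the review concerns named above (quality control, reporting, user interface) are outside what this sketch models.

```python
from collections import deque

# Simplified, hypothetical child -> parent entity-type links, covering only the
# biospecimen categories used in the field table of this section.
PARENTS = {
    "sample": ["case"],
    "portion": ["sample"],
    "slide": ["portion"],
    "analyte": ["portion"],
    "aliquot": ["analyte"],
    "read_group": ["aliquot"],
}


def lineage(entity_type, parents=PARENTS, root="case"):
    """Breadth-first search up the parent links; returns the shortest chain of
    entity types from entity_type to root, or None if root is unreachable."""
    queue = deque([[entity_type]])
    seen = {entity_type}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == root:
            return path
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


def add_entity_type(parents, name, parent_types):
    """Return a copy of the edge map with one more entity type."""
    updated = dict(parents)
    updated[name] = list(parent_types)
    return updated


print(lineage("read_group"))
extended = add_entity_type(PARENTS, "hypothetical_subaliquot", ["aliquot"])
print(lineage("hypothetical_subaliquot", extended))
```

The same walk is what provenance queries rely on: from any downstream product (an aliquot or read group), the chain of links leads back to the sample and the donor case.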

Submitting Biospecimen Entities

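Each biospecimen entity (case, sample, portion, slide, analyte, aliquot, and read group) has an entry in the GDC Data Dictionary. Each entry describes every field of that entity and provides a downloadable template for submission. The table below lists each field, the entity (category) it belongs to, its Common Data Element (CDE) identifier where one exists, and whether it is required.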

Biospecimen Entity Field Information

| Term | Category | CDE ID | Required? |
|---|---|---|---|
| a260 a280 ratio | Analyte | 5432595 | Optional |
| adapter name | Read group | --- | Optional |
| adapter sequence | Read group | --- | Optional |
| aliquot quantity | Aliquot | --- | Optional |
| aliquot volume | Aliquot | --- | Optional |
| amount | Aliquot | --- | Optional |
| amount | Analyte | --- | Optional |
| analyte quantity | Analyte | --- | Optional |
| analyte type id | Aliquot | 5432508 | Optional |
| analyte type id | Analyte | 5432508 | Optional |
| analyte type | Aliquot | 2513915 | Optional |
| analyte type | Analyte | 2513915 | Required |
| analyte volume | Analyte | --- | Optional |
| base caller name | Read group | --- | Optional |
| base caller version | Read group | --- | Optional |
| biospecimen anatomic site | Sample | 4742851 | Optional |
| biospecimen laterality | Sample | 2007875 | Optional |
| bone marrow malignant cells | Slide | --- | Optional |
| catalog reference | Sample | --- | Optional |
| chipseq antibody | Read group | --- | Optional |
| chipseq target | Read group | --- | Optional |
| composition | Sample | 5432591 | Optional |
| concentration | Aliquot | 5432594 | Optional |
| concentration | Analyte | 5432594 | Optional |
| consent type | Case | --- | Optional |
| creation datetime | Portion | 5432592 | Optional |
| current weight | Sample | 5432606 | Optional |
| days to collection | Sample | 3008340 | Optional |
| days to consent | Case | --- | Optional |
| days to lost to followup | Case | 6154721 | Optional |
| days to sample procurement | Sample | --- | Optional |
| days to sequencing | Read group | --- | Optional |
| diagnosis pathologically confirmed | Sample | --- | Optional |
| disease type | Case | 6161017 | Optional |
| distance normal to tumor | Sample | 3088708 | Optional |
| distributor reference | Sample | --- | Optional |
| experiment name | Read group | --- | Required |
| experimental protocol type | Analyte | --- | Optional |
| flow cell barcode | Read group | --- | Optional |
| fragment maximum length | Read group | --- | Optional |
| fragment mean length | Read group | --- | Optional |
| fragment minimum length | Read group | --- | Optional |
| fragment standard deviation length | Read group | --- | Optional |
| fragmentation enzyme | Read group | --- | Optional |
| freezing method | Sample | 5432607 | Optional |
| growth rate | Sample | --- | Optional |
| includes spike ins | Read group | --- | Optional |
| index date | Case | 6154722 | Optional |
| initial weight | Sample | 5432605 | Optional |
| instrument model | Read group | 5432604 | Optional |
| intermediate dimension | Sample | --- | Optional |
| is ffpe | Portion | 4170557 | Optional |
| is ffpe | Sample | 4170557 | Optional |
| is paired end | Read group | --- | Required |
| lane number | Read group | --- | Optional |
| library name | Read group | --- | Required |
| library preparation kit catalog number | Read group | --- | Optional |
| library preparation kit name | Read group | --- | Optional |
| library preparation kit vendor | Read group | --- | Optional |
| library preparation kit version | Read group | --- | Optional |
| library selection | Read group | --- | Required |
| library strand | Read group | --- | Optional |
| library strategy | Read group | --- | Required |
| longest dimension | Sample | 5432602 | Optional |
| lost to followup | Case | 6161018 | Optional |
| method of sample procurement | Sample | --- | Optional |
| multiplex barcode | Read group | --- | Optional |
| no matched normal low pass wgs | Aliquot | --- | Optional |
| no matched normal targeted sequencing | Aliquot | --- | Optional |
| no matched normal wgs | Aliquot | --- | Optional |
| no matched normal wxs | Aliquot | --- | Optional |
| normal tumor genotype snp match | Analyte | 4588156 | Optional |
| number expect cells | Read group | --- | Optional |
| number proliferating cells | Slide | 5432636 | Optional |
| oct embedded | Sample | 5432538 | Optional |
| passage count | Sample | --- | Optional |
| pathology report uuid | Sample | --- | Optional |
| percent eosinophil infiltration | Slide | 2897700 | Optional |
| percent follicular component | Slide | --- | Optional |
| percent granulocyte infiltration | Slide | 2897705 | Optional |
| percent inflam infiltration | Slide | 2897695 | Optional |
| percent lymphocyte infiltration | Slide | 2897710 | Optional |
| percent monocyte infiltration | Slide | 5455535 | Optional |
| percent necrosis | Slide | 2841237 | Optional |
| percent neutrophil infiltration | Slide | 2841267 | Optional |
| percent normal cells | Slide | 2841233 | Optional |
| percent rhabdoid features | Slide | 6790120 | Optional |
| percent sarcomatoid features | Slide | 2429786 | Optional |
| percent stromal cells | Slide | 2841241 | Optional |
| percent tumor cells | Slide | 5432686 | Optional |
| percent tumor nuclei | Slide | 2841225 | Optional |
| platform | Read group | --- | Required |
| portion number | Portion | 5432711 | Optional |
| preservation method | Sample | 5432521 | Required |
| primary site | Case | 6161019 | Optional |
| prostatic chips positive count | Slide | --- | Optional |
| prostatic chips total count | Slide | --- | Optional |
| prostatic involvement percent | Slide | --- | Optional |
| read group name | Read group | --- | Required |
| read length | Read group | --- | Required |
| ribosomal rna 28s 16s ratio | Analyte | --- | Optional |
| rin | Read group | 5278775 | Optional |
| rna integrity number | Analyte | --- | Optional |
| sample ordinal | Sample | --- | Optional |
| sample type id | Sample | --- | Optional |
| sample type | Sample | 3111302 | Optional |
| section location | Slide | --- | Required |
| selected normal low pass wgs | Aliquot | --- | Optional |
| selected normal targeted sequencing | Aliquot | --- | Optional |
| selected normal wgs | Aliquot | --- | Optional |
| selected normal wxs | Aliquot | --- | Optional |
| sequencing center | Read group | --- | Required |
| sequencing date | Read group | --- | Optional |
| shortest dimension | Sample | 5432603 | Optional |
| single cell library | Read group | --- | Optional |
| size selection range | Read group | --- | Optional |
| source center | Aliquot | --- | Optional |
| specimen type | Sample | C70713 | Required |
| spectrophotometer method | Analyte | 3008378 | Optional |
| spike ins concentration | Read group | --- | Optional |
| spike ins fasta | Read group | --- | Optional |
| target capture kit catalog number | Read group | --- | Optional |
| target capture kit name | Read group | --- | Optional |
| target capture kit target region | Read group | --- | Optional |
| target capture kit vendor | Read group | --- | Optional |
| target capture kit version | Read group | --- | Optional |
| target capture kit | Read group | --- | Required |
| time between clamping and freezing | Sample | 5432611 | Optional |
| time between excision and freezing | Sample | 5432612 | Optional |
| tissue collection type | Sample | --- | Optional |
| tissue microarray coordinates | Slide | --- | Optional |
| tissue type | Sample | 5432687 | Required |
| to trim adapter sequence | Read group | --- | Optional |
| tumor code id | Sample | --- | Optional |
| tumor code | Sample | --- | Optional |
| tumor descriptor | Sample | 3288124 | Required |
| weight | Portion | 5432593 | Optional |
| well number | Analyte | 5432613 | Optional |
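The Required column can be applied as a simple pre-submission check. The sketch below transcribes the 16 required fields from the table above into a per-entity map, assuming the dictionary uses snake_case names (for example, "tissue type" becomes `tissue_type`), and reports which of them are missing or empty in a record. It covers only the fields listed in this table; system properties such as `submitter_id` and links to parent entities are not checked.

```python
# Required fields transcribed from the "Required?" column of the table above.
# Assumption: dictionary property names are the table terms in snake_case.
REQUIRED_FIELDS = {
    "analyte": {"analyte_type"},
    "read_group": {
        "experiment_name", "is_paired_end", "library_name",
        "library_selection", "library_strategy", "platform",
        "read_group_name", "read_length", "sequencing_center",
        "target_capture_kit",
    },
    "sample": {
        "preservation_method", "specimen_type", "tissue_type",
        "tumor_descriptor",
    },
    "slide": {"section_location"},
}


def missing_required(category: str, record: dict) -> list:
    """Return the sorted required fields that are absent or empty in record.
    Categories with no required fields in the table (e.g. aliquot) yield []."""
    required = REQUIRED_FIELDS.get(category, set())
    return sorted(f for f in required if record.get(f) in (None, ""))


draft = {"tissue_type": "Tumor", "specimen_type": "Solid Tissue"}
print(missing_required("sample", draft))
```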