Biospecimen Data Standardization

Biospecimen data refers to information associated with the physical sample taken from a participant and its processing down to the aliquot level for sequencing experiments. This data falls into several key categories:

Standard Identifiers: project-unique identifiers and universally unique identifiers (UUIDs) that enable cases and samples to be referenced and linked to associated clinical and analytical data
Provenance: metadata that indicates the upstream sources of the sample (research program, research project, and donor individual) as well as the downstream products of sample processing (e.g., extracted DNA or RNA analyte)
Quality Control: metadata that expresses the results of quality control tests performed on biospecimens and analyzed products (e.g., percent tumor nuclei, RIN values, A260/A280 ratios)

For major NCI CCG programs, biospecimen data is provided by a Biospecimen Core Resource (BCR) under contract to the NCI. Data is submitted in an established, schema-valid XML format. This data includes program and project identifiers, UUIDs, and the relationships between case, sample, and aliquot. UUIDs submitted by BCRs are typically adopted by the GDC.

For other submitters, data in the BCR XML format is accepted. However, the GDC also provides a simpler means of submitting a minimal set of biospecimen data, in which the data may be formatted as a JSON or tab-delimited (TSV) text file and submitted through the GDC Submission Portal.
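
As a minimal sketch of the two simpler formats, the Python below serializes one flat sample record as JSON and as tab-delimited text. The `type` and `submitter_id` keys, the underscore-style property names, and all values are illustrative assumptions, not a confirmed template; the dictionary entry for each entity is the authority on actual property names and permitted values.

```python
import csv
import io
import json

# Hypothetical minimal sample record. Key names and values are assumptions
# made for illustration; check the dictionary entry before submitting.
sample = {
    "type": "sample",
    "submitter_id": "EXAMPLE-SAMPLE-01",
    "preservation_method": "Frozen",
    "specimen_type": "Solid Tissue",
    "tissue_type": "Tumor",
    "tumor_descriptor": "Primary",
}


def to_json(records):
    """Serialize a list of flat records as a JSON array."""
    return json.dumps(records, indent=2)


def to_tsv(records):
    """Serialize flat records as tab-delimited text with one header row."""
    fieldnames = list(records[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


json_text = to_json([sample])
tsv_text = to_tsv([sample])
```

Both outputs carry the same fields; the TSV form places property names in the header row and one record per following line.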

The GDC Data Model uses a graph representation that places no technical limits on adjusting the entities and relationships. However, such changes may affect quality control, reporting, accounting, and the user interface and experience. Therefore, major changes to the model needed to support new biospecimen information will undergo review by the GDC Data Model Change Control Board.
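
To illustrate why a graph representation is easy to adjust, here is a small sketch in which entities are nodes and relationships are parent links. The `Node` and `EntityGraph` classes are hypothetical and are not the GDC implementation; the case, sample, aliquot chain is a simplification of the relationships described above.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entity in the graph, identified by a UUID string."""
    entity_type: str
    node_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    properties: dict = field(default_factory=dict)


class EntityGraph:
    """Hypothetical in-memory graph: nodes plus child-to-parent links."""

    def __init__(self):
        self.nodes = {}
        self.parent_of = {}

    def add_node(self, node, parent=None):
        if parent is not None and parent.node_id not in self.nodes:
            raise KeyError(f"unknown parent: {parent.node_id}")
        self.nodes[node.node_id] = node
        if parent is not None:
            self.parent_of[node.node_id] = parent.node_id
        return node

    def lineage(self, node):
        """Return entity types from the given node up to its root."""
        path = []
        current = node.node_id
        while current is not None:
            path.append(self.nodes[current].entity_type)
            current = self.parent_of.get(current)
        return path


graph = EntityGraph()
case = graph.add_node(Node("case"))
sample = graph.add_node(Node("sample"), parent=case)
aliquot = graph.add_node(Node("aliquot"), parent=sample)

# A new entity type needs no structural change, only a new node and link.
slide = graph.add_node(Node("slide"), parent=sample)
```

Adding the `slide` node requires no change to the graph classes, which mirrors the point that the limits on model changes are procedural rather than technical.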

Submitting Biospecimen Entities

Links to the dictionary entry for each biospecimen entity are listed below. Each entry contains information about each field and a downloadable template for submission.

Biospecimen Entity Field Information

Term	Category	CDE	Required?
consent type	Case	---	Optional
days to consent	Case	---	Optional
days to lost to followup	Case	6154721	Optional
disease type	Case	6161017	Required
index date	Case	6154722	Optional
lost to followup	Case	6161018	Optional
primary site	Case	6161019	Required
biospecimen anatomic site	Sample	4742851	Optional
biospecimen laterality	Sample	8028966	Optional
catalog reference	Sample	---	Optional
current weight	Sample	5432606	Optional
days to collection	Sample	3008340	Optional
days to sample procurement	Sample	---	Optional
diagnosis pathologically confirmed	Sample	15115494	Optional
distance normal to tumor	Sample	3088708	Optional
distributor reference	Sample	---	Optional
freezing method	Sample	5432607	Optional
growth rate	Sample	---	Optional
initial weight	Sample	5432605	Optional
intermediate dimension	Sample	---	Optional
longest dimension	Sample	5432602	Optional
method of sample procurement	Sample	---	Optional
passage count	Sample	---	Optional
pathology report uuid	Sample	---	Optional
preservation method	Sample	5432521	Required
sample ordinal	Sample	---	Optional
sample type	Sample	3111302	Optional
shortest dimension	Sample	5432603	Optional
specimen type	Sample	C70713	Required
time between clamping and freezing	Sample	5432611	Optional
time between excision and freezing	Sample	5432612	Optional
tissue collection type	Sample	---	Optional
tissue type	Sample	5432687	Required
tumor code id	Sample	---	Optional
tumor descriptor	Sample	3288124	Required
creation datetime	Portion	5432592	Optional
is ffpe	Portion	4170557	Optional
portion number	Portion	5432711	Optional
weight	Portion	5432593	Optional
a260 a280 ratio	Analyte	15113234	Optional
amount	Analyte	---	Optional
analyte quantity	Analyte	---	Optional
analyte type	Analyte	15063661	Required
analyte volume	Analyte	15248383	Optional
concentration	Analyte	5432594	Optional
dna integrity number	Analyte	C183240	Optional
experimental protocol type	Analyte	---	Optional
normal tumor genotype snp match	Analyte	4588156	Optional
ribosomal rna 28s 16s ratio	Analyte	---	Optional
ribosomal rna 28s 18s ratio	Analyte	---	Optional
rna integrity number	Analyte	C63637	Optional
spectrophotometer method	Analyte	3008378	Optional
well number	Analyte	5432613	Optional
aliquot quantity	Aliquot	15745091	Optional
aliquot volume	Aliquot	15745088	Optional
amount	Aliquot	---	Optional
analyte type	Aliquot	15063661	Optional
concentration	Aliquot	5432594	Optional
no matched normal low pass wgs	Aliquot	---	Optional
no matched normal targeted sequencing	Aliquot	---	Optional
no matched normal wgs	Aliquot	---	Optional
no matched normal wxs	Aliquot	---	Optional
selected normal low pass wgs	Aliquot	---	Optional
selected normal targeted sequencing	Aliquot	---	Optional
selected normal wgs	Aliquot	---	Optional
selected normal wxs	Aliquot	---	Optional
source center	Aliquot	---	Optional
adapter name	Read group	---	Optional
adapter sequence	Read group	---	Optional
base caller name	Read group	---	Optional
base caller version	Read group	---	Optional
chipseq antibody	Read group	---	Optional
chipseq target	Read group	---	Optional
days to sequencing	Read group	---	Optional
experiment name	Read group	---	Required
flow cell barcode	Read group	---	Optional
fragment maximum length	Read group	---	Optional
fragment mean length	Read group	---	Optional
fragment minimum length	Read group	---	Optional
fragment standard deviation length	Read group	---	Optional
fragmentation enzyme	Read group	---	Optional
includes spike ins	Read group	---	Optional
instrument model	Read group	5432604	Optional
is paired end	Read group	---	Required
lane number	Read group	---	Optional
library name	Read group	---	Required
library preparation kit catalog number	Read group	---	Optional
library preparation kit name	Read group	---	Optional
library preparation kit vendor	Read group	---	Optional
library preparation kit version	Read group	---	Optional
library selection	Read group	---	Required
library strand	Read group	---	Optional
library strategy	Read group	---	Required
multiplex barcode	Read group	---	Optional
number expect cells	Read group	---	Optional
platform	Read group	---	Required
read group name	Read group	---	Required
read length	Read group	---	Required
rin	Read group	5278775	Optional
sequencing center	Read group	---	Required
sequencing date	Read group	---	Optional
single cell library	Read group	---	Optional
size selection range	Read group	---	Optional
spike ins concentration	Read group	---	Optional
spike ins fasta	Read group	---	Optional
target capture kit	Read group	---	Required
target capture kit catalog number	Read group	---	Optional
target capture kit name	Read group	---	Optional
target capture kit target region	Read group	---	Optional
target capture kit vendor	Read group	---	Optional
target capture kit version	Read group	---	Optional
to trim adapter sequence	Read group	---	Optional
bone marrow malignant cells	Slide	---	Optional
number proliferating cells	Slide	5432636	Optional
percent eosinophil infiltration	Slide	2897700	Optional
percent follicular component	Slide	---	Optional
percent granulocyte infiltration	Slide	2897705	Optional
percent inflam infiltration	Slide	2897695	Optional
percent lymphocyte infiltration	Slide	2897710	Optional
percent monocyte infiltration	Slide	5455535	Optional
percent necrosis	Slide	2841237	Optional
percent neutrophil infiltration	Slide	2841267	Optional
percent normal cells	Slide	2841233	Optional
percent rhabdoid features	Slide	6790120	Optional
percent sarcomatoid features	Slide	2429786	Optional
percent stromal cells	Slide	2841241	Optional
percent tumor cells	Slide	5432686	Optional
percent tumor nuclei	Slide	2841225	Optional
prostatic chips positive count	Slide	---	Optional
prostatic chips total count	Slide	---	Optional
prostatic involvement percent	Slide	---	Optional
section location	Slide	---	Required
tissue microarray coordinates	Slide	---	Optional
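
The Required? column above can be turned into a simple pre-submission check. The sketch below transcribes the required terms for four of the categories and reports which are missing from a record. The underscore-style property names and the `missing_required` helper are assumptions for illustration, and the check covers presence only, not permitted values.

```python
# Required terms transcribed from the table for four categories.
# Underscore-style names are an assumption about the property spelling.
REQUIRED_FIELDS = {
    "case": ["disease_type", "primary_site"],
    "sample": [
        "preservation_method",
        "specimen_type",
        "tissue_type",
        "tumor_descriptor",
    ],
    "analyte": ["analyte_type"],
    "slide": ["section_location"],
}


def missing_required(entity_type, record):
    """Return required field names that are absent or empty in the record."""
    required = REQUIRED_FIELDS.get(entity_type, [])
    return [name for name in required if record.get(name) in (None, "")]


incomplete_sample = {"tissue_type": "Tumor", "specimen_type": ""}
problems = missing_required("sample", incomplete_sample)
```

Entity types with no entry in `REQUIRED_FIELDS` return an empty list, so the mapping can be extended one category at a time.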

National Cancer Institute

at the National Institutes of Health