Submitted Data Types and File Formats

The following table and figure list the entities and data that can be submitted to the GDC. Links to the data dictionary and templates are also provided. Not all programs, projects, or cases will have data available for all types.

| Entity Category | Entity Name | File Format | File Metadata Template |
| --- | --- | --- | --- |
| Administrative | Case | -- | TSV, JSON |
| Biospecimen | Sample | -- | TSV, JSON |
|  | Portion | -- | TSV, JSON |
|  | Analyte | -- | TSV, JSON |
|  | Aliquot | -- | TSV, JSON |
|  | Read Group | -- | TSV, JSON |
|  | Slide | -- | TSV, JSON |
| Clinical | Demographic | -- | TSV, JSON |
|  | Diagnosis | -- | TSV, JSON |
|  | Exposure | -- | TSV, JSON |
|  | Family History | -- | TSV, JSON |
|  | Follow Up | -- | TSV, JSON |
|  | Molecular Test | -- | TSV, JSON |
|  | Treatment | -- | TSV, JSON |
| Data File | Analysis Metadata | SRA XML, MAGE-TAB (SDRF, IDF) | TSV, JSON |
|  | Biospecimen Supplement | BCR XML, GDC-approved spreadsheet | TSV, JSON |
|  | Clinical Supplement | BCR XML, GDC-approved spreadsheet | TSV, JSON |
|  | Experiment Metadata | SRA XML | TSV, JSON |
|  | Pathology Report | PDF | TSV, JSON |
|  | Run Metadata | SRA XML | TSV, JSON |
|  | Slide Image | JPEG, SVS, TIFF | TSV, JSON |
|  | Submitted Unaligned Reads (Illumina Platform) | FASTQ, BAM | TSV, JSON |
|  | Submitted Aligned Reads (Illumina Platform) | BAM | TSV, JSON |
|  | Submitted Genomic Profile | -- | TSV, JSON |

GDC Data Model for Submission

The following figure displays the relationships between the submittable data model entities. Arrows point to the parent entity, which must be specified before the child entity. The complete GDC Data Model can be viewed here.

GDC Data Model - v1.12 - Submission
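
The parent-before-child rule amounts to a topological ordering of the entities. The sketch below is only an illustration of that idea: the `PARENTS` map is a simplified, hypothetical subset of the relationships and is not the authoritative GDC Data Model, and `submission_order` is a helper name invented for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical child -> parents map, simplified for illustration.
# The GDC Data Model figure is the authoritative source of relationships.
PARENTS = {
    "case": set(),
    "sample": {"case"},
    "portion": {"sample"},
    "analyte": {"portion"},
    "aliquot": {"analyte"},
    "read_group": {"aliquot"},
    "demographic": {"case"},
    "diagnosis": {"case"},
}


def submission_order(parents):
    """Return entity names ordered so every parent precedes its children."""
    # TopologicalSorter expects node -> predecessors, which matches
    # the child -> parents layout used here.
    return list(TopologicalSorter(parents).static_order())


order = submission_order(PARENTS)
print(order)
```

Any order that satisfies the constraints is acceptable; the only guarantee is that a parent such as `case` appears before entities that depend on it.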

Format Details

XML

Biospecimen and clinical data submitted in XML format must be valid with respect to the latest Biospecimen Core Resource (BCR) XML Schema. XML submission of biospecimen and clinical data is only supported through the GDC API. Molecular sequence metadata submitted in XML format must be valid with respect to NCBI SRA XML Schema version 1.5.
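
Before validating against a schema, it can help to confirm that a file is well-formed XML at all. The sketch below does only that, using the Python standard library; it does not perform BCR or SRA schema validation, which requires a schema-aware validator. The element names in the sample strings are placeholders, not real BCR or SRA elements.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML; this is not schema validation."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Placeholder documents for illustration only.
good = "<record><id>example-001</id></record>"
bad = "<record><id>example-001</record>"  # <id> is never closed

print(is_well_formed(good), is_well_formed(bad))
```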

TSV

Tab-separated value (TSV) files are typically submitted via the GDC Data Submission Portal user interface. These may be created in any text editor, or exported from MS Excel by using "Save As" from the File menu and selecting the format "Tab-delimited Text".

A TSV file contains data that correspond to a given entity defined in the GDC Data Dictionary. The file must contain a column for each required property of that entity; for example, see the Demographic entry. Each record in the TSV represents a submittable item described by the entity; for example, each line in a demographic TSV file contains metadata for a single case.
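
As a rough sketch of the layout described above, the following writes a small demographic TSV with one record per line and checks that the required columns are present. The column names, the `REQUIRED_COLUMNS` list, and the sample values are assumptions made for illustration; the GDC Data Dictionary defines the actual required properties for each entity.

```python
import csv
import io

# Assumed required columns for illustration; check the GDC Data Dictionary
# for the real required properties of the entity being submitted.
REQUIRED_COLUMNS = ["type", "submitter_id", "cases.submitter_id"]


def write_tsv(records, columns, required=REQUIRED_COLUMNS):
    """Serialize a list of dicts to TSV text: a header row, then one record per line."""
    missing = [name for name in required if name not in columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Made-up records; each one describes the demographic data of a single case.
records = [
    {"type": "demographic", "submitter_id": "demo-001",
     "cases.submitter_id": "case-001", "gender": "female"},
    {"type": "demographic", "submitter_id": "demo-002",
     "cases.submitter_id": "case-002", "gender": "male"},
]
columns = ["type", "submitter_id", "cases.submitter_id", "gender"]
tsv_text = write_tsv(records, columns)
print(tsv_text)
```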

JSON

JSON data submitted as files to the GDC must have a structure that is valid with respect to the GDC Submission API specification. JSON files can be submitted via the GDC Data Submission Portal user interface.
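
The sketch below shows one way a flat record might be turned into a JSON document. It assumes, for illustration only, that a dotted column name such as `cases.submitter_id` corresponds to a nested object in JSON; the GDC Submission API specification is the authoritative description of the expected structure. The helper name `to_json_record` and the sample values are hypothetical.

```python
import json


def to_json_record(flat):
    """Expand dotted keys such as 'cases.submitter_id' into nested objects."""
    record = {}
    for key, value in flat.items():
        parts = key.split(".")
        target = record
        # Walk or create the intermediate objects for every part but the last.
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return record


# Made-up record for illustration.
flat = {
    "type": "demographic",
    "submitter_id": "demo-001",
    "cases.submitter_id": "case-001",
    "gender": "female",
}
payload = json.dumps([to_json_record(flat)], indent=2)
print(payload)
```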