GDC Reference Files

The reference files used by the GDC data harmonization and generation pipelines are listed below, with MD5 checksums for verifying file integrity after download. Additional files are included so that GDC pipeline analyses can be reproduced.

GRCh38.d1.vd1 Reference Sequence

GRCh38.d1.vd1.fa.tar.gz

  • md5: 3ffbcfe2d05d43206f57f81ebb251dc9
  • file size: 875.3 MB
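
The published checksum can be compared against one computed locally after download. The following is a minimal Python sketch, assuming the archive was saved to the current working directory; the helper names `md5_of_file` and `verify_md5` and the chunk size are illustrative choices, not part of any GDC tooling.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Return True when the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    archive = Path("GRCh38.d1.vd1.fa.tar.gz")  # assumed local download location
    expected = "3ffbcfe2d05d43206f57f81ebb251dc9"  # md5 listed above
    if archive.exists():
        print("OK" if verify_md5(archive, expected) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same helper can be reused for any other file on this page by substituting its file name and listed md5 value.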

This reference genome is used by the GDC for all sequencing and array-based analyses. This file is composed of the following sequences:


Index Files

Index files are built from the GDC reference genome and are used with the software listed below.

GDC.h38.d1.vd1 BWA Index Files

GDC.h38.d1.vd1 GATK Index Files

GDC.h38.d1.vd1 STAR2 Index Files (v36)

GDC.h38.d1.vd1 STAR2 Index Files (v22)


Annotation Files

Annotation files contain information about the position and identity of regions in the reference genome. They allow software to calculate expression values.
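
To illustrate how software reads position and identity from an annotation file, here is a minimal Python sketch that parses one record in the standard nine-column, tab-separated GTF layout. The function name `parse_gtf_line` is hypothetical, the sample record is an illustrative GENCODE-style line used only for demonstration, and the attribute parsing is simplified (it assumes attribute values contain no semicolons).

```python
def parse_gtf_line(line):
    """Parse one GTF record into a dict with typed coordinates and attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields

    attributes = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        # Keep the first value when a key repeats (e.g. several "tag" entries).
        attributes.setdefault(key, value.strip().strip('"'))

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based and inclusive
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


if __name__ == "__main__":
    # Illustrative record for demonstration only.
    sample = (
        "chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\t"
        'gene_id "ENSG00000223972.5"; gene_name "DDX11L1";'
    )
    record = parse_gtf_line(sample)
    length = record["end"] - record["start"] + 1
    print(record["attributes"]["gene_name"], record["seqname"], length)
```

Real GTF files also contain comment lines starting with `#`, which a caller would skip before passing lines to this function.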

GDC.h38 miRNA database files

GDC.h38 GENCODE v36 GTF

GDC.h38 GENCODE v22 GTF

GDC.h38 GENCODE TSV (v22)


Miscellaneous Files

Methylation Array Gene Annotation File (v36)

Antibody Description Files for TCGA RPPA Data (v36)

Antibody Description Files for TCGA RPPA Data (v22)

Genome Annotation Files for Legacy TCGA Data

SNP6 GRCh38 Remapped Probeset File for Copy Number Variation Analysis

If you are using Masked Copy Number Segment files for GISTIC analysis, keep only probesets with freqcnv = FALSE.
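
The freqcnv filter could be sketched in Python as follows. This assumes the probeset file is tab-delimited with a header row that includes a `freqcnv` column holding the strings TRUE or FALSE; the function names and the other column names in the sample (`probeid`, `chr`, `pos`) are hypothetical and should be checked against the actual file.

```python
import csv
import io


def keep_non_freqcnv(rows):
    """Yield only probeset rows whose freqcnv field is FALSE."""
    for row in rows:
        if row["freqcnv"].strip().upper() == "FALSE":
            yield row


def filter_probesets(in_handle, out_handle):
    """Copy a tab-delimited probeset table, keeping freqcnv = FALSE rows.

    Returns the number of rows written (excluding the header).
    """
    reader = csv.DictReader(in_handle, delimiter="\t")
    writer = csv.DictWriter(
        out_handle,
        fieldnames=reader.fieldnames,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    kept = 0
    for row in keep_non_freqcnv(reader):
        writer.writerow(row)
        kept += 1
    return kept


if __name__ == "__main__":
    # Tiny in-memory table standing in for the real probeset file.
    sample = "probeid\tchr\tpos\tfreqcnv\nP1\t1\t100\tFALSE\nP2\t1\t200\tTRUE\n"
    out = io.StringIO()
    kept = filter_probesets(io.StringIO(sample), out)
    print(f"kept {kept} probesets")
    print(out.getvalue(), end="")
```

For a real file, the two handles would be opened with `open(path, newline="")` (or `gzip.open(path, "rt", newline="")` if the file is gzip-compressed) instead of `io.StringIO`.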

SNP6 GRCh38 Liftover Probeset File for Copy Number Variation Analysis

GDC VEP Cache File

GDC Panel of Normal (PON) Files used for Variant Calling

These files are controlled-access and require dbGaP authorization to download. They must be downloaded with the gdc-client.

For Tumor-Only Variant Calling Pipeline

gatk4_mutect2_4136_pon.vcf.tar

  • uuid: 6c4c4a48-3589-4fc0-b1fd-ce56e88c06e4
  • md5: 725d891e02ca93edaabac8b09322439e
  • file size: 92 MB

For Tumor / Normal Variant Calling Pipeline

MuTect2.PON.4136.vcf.tar

  • uuid: 6b45b9f7-893e-4947-83b6-db0402471e23
  • md5: d13a138dcf4e9f1ec8a69ac3a4f64ca9
  • file size: 121 MB

MuTect2.PON.5210.vcf.tar

  • uuid: 726e24c0-d2f2-41a8-9435-f85f22e1c832
  • md5: 5b5c1c3e208aa9a403cc4a8ff39e7f1f
  • file size: 146 MB
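
As a sketch of how the controlled PON files above might be fetched, the following Python wrapper builds and runs a gdc-client download command for a given UUID. It assumes gdc-client is installed and on the PATH, that an authentication token has been saved to a local file (the name `gdc-token.txt` is hypothetical), and that the `-t` (token file) and `-d` (output directory) options behave as in current gdc-client releases; confirm them with `gdc-client download --help` before relying on this.

```python
import shutil
import subprocess

# UUIDs copied from the listings above.
PON_UUIDS = {
    "gatk4_mutect2_4136_pon.vcf.tar": "6c4c4a48-3589-4fc0-b1fd-ce56e88c06e4",
    "MuTect2.PON.4136.vcf.tar": "6b45b9f7-893e-4947-83b6-db0402471e23",
    "MuTect2.PON.5210.vcf.tar": "726e24c0-d2f2-41a8-9435-f85f22e1c832",
}


def build_download_command(uuid, token_file, out_dir="."):
    """Return the gdc-client argument list for downloading one file by UUID."""
    return ["gdc-client", "download", uuid, "-t", token_file, "-d", out_dir]


def download(uuid, token_file, out_dir="."):
    """Run gdc-client for one UUID; raises if the client is missing or fails."""
    if shutil.which("gdc-client") is None:
        raise FileNotFoundError("gdc-client not found on PATH")
    subprocess.run(build_download_command(uuid, token_file, out_dir), check=True)


if __name__ == "__main__":
    token = "gdc-token.txt"  # hypothetical path to a saved token file
    for name, uuid in PON_UUIDS.items():
        # Print the commands rather than running them; call download() to execute.
        print(name, "->", " ".join(build_download_command(uuid, token)))
```

After a download completes, the md5 values listed above can be used to confirm that each archive arrived intact.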