Reference files used by the GDC data harmonization and generation pipelines are provided below. MD5 checksums are provided for verifying file integrity after download. Additional files are also included to allow for reproduction of GDC pipeline analyses.
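As a minimal sketch of the integrity check mentioned above, the following Python computes a file's MD5 digest in chunks and compares it with a published checksum. The function names, the chunk size, and any path or checksum passed in are illustrative assumptions, not part of the GDC tooling.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that large reference files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the expected checksum.
    The comparison ignores case and surrounding whitespace."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A call such as `verify_download(local_path, published_md5)` returns `False` when the download is truncated or corrupted, in which case the file should be downloaded again.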
This reference genome is used by the GDC for all sequencing and array-based analyses. This file is composed of the following sequences:
Index files are built from the GDC reference genome and are used with the software listed below.
GDC.h38.d1.vd1 BWA Index Files
GDC.h38.d1.vd1 GATK Index Files
GDC.h38.d1.vd1 STAR2 Index Files (v36)
GDC.h38.d1.vd1 STAR2 Index Files (v22)
Annotation files describe the position and identity of regions in the reference genome. Software uses them, for example, to calculate expression values.
GDC.h38 miRNA database files
GDC.h38 GENCODE v36 GTF
GDC.h38 GENCODE v22 GTF
GDC.h38 GENCODE TSV (v22)
Methylation Array Gene Annotation File (v36)
EPIC.hg38.manifest.gencode.v36.tsv.gz
HM27.hg38.manifest.gencode.v36.tsv.gz
HM450.hg38.manifest.gencode.v36.tsv.gz
Antibody Description Files for TCGA RPPA Data (v36)
Antibody Description Files for TCGA RPPA Data (v22)
Genome Annotation Files for Legacy TCGA Data
SNP6 GRCh38 Remapped Probeset File for Copy Number Variation Analysis
If you are using Masked Copy Number Segment files for GISTIC analysis, keep only probesets with freqcnv = FALSE.
SNP6 GRCh38 Liftover Probeset File for Copy Number Variation Analysis
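The probeset filter recommended for GISTIC analysis (keep only probesets with freqcnv = FALSE) could be sketched in Python as follows. This assumes the probeset file is tab-delimited with a header row containing a `freqcnv` column holding `TRUE`/`FALSE` values. The `probeset_id` column name and the example rows are hypothetical and used only for illustration.

```python
import csv
import io


def keep_non_freqcnv(handle, delimiter="\t"):
    """Return the rows whose freqcnv value is FALSE.
    Assumes a header row with a 'freqcnv' column (an assumption
    about the file layout, not confirmed by the source)."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [row for row in reader
            if row["freqcnv"].strip().upper() == "FALSE"]


# Hypothetical in-memory example; real probeset files have more columns.
example = io.StringIO(
    "probeset_id\tfreqcnv\n"
    "probe_A\tFALSE\n"
    "probe_B\tTRUE\n"
    "probe_C\tFALSE\n"
)
kept = keep_non_freqcnv(example)
kept_ids = [row["probeset_id"] for row in kept]
print(kept_ids)
```

For a real file, the same function can be given an open text handle, for example one returned by `open(path, newline="")`.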
GDC VEP Cache File
gatk4_mutect2_4136_pon.vcf.tar
MuTect2.PON.4136.vcf.tar
MuTect2.PON.5210.vcf.tar