Cell Systems, Volume 9, Issue 1, pp. 24-34, 24 July 2019. DOI: 10.1016/j.cels.2019.06.006
We present the first systematic analysis of the effects of synchronizing a large-scale, deeply characterized, multi-omic dataset to the current human reference genome, using updated software, pipelines, and annotations. For each of 5 molecular data platforms in The Cancer Genome Atlas (TCGA)—mRNA and miRNA expression, single nucleotide variants, DNA methylation and copy number alterations—comprehensive sample, gene and probe-level studies were performed, towards quantifying the degree of similarity between the ‘legacy’ GRCh37 (hg19) TCGA data and its GRCh38 (hg38) version as ‘harmonized’ by the Genomic Data Commons. We offer gene lists to elucidate differences that remained after controlling for confounders, and strategies to mitigate their impact on biological interpretation. Our results demonstrate that the hg19 and hg38 TCGA datasets are very highly concordant, promote informed use of either legacy or harmonized omics data, and provide a rubric that encourages similar comparisons as new data emerge and reference data evolve.
Data in the GDC
- GDC Manifests
- Open Access Data - Download Manifest (18 Files)
Supplemental Data Files
- SuppData-1.1 miRNA-hg19-file-manifest.tsv [tsv]
- SuppData-1.2 miRNA-hg38-file-manifest.tsv [tsv]
- SuppData-2.1 CN-hg19-file-manifest.txt [txt]
- SuppData-2.2 CN-hg38-file-manifest.txt [txt]
- SuppData-2.3 CN-TotalAverageRawGeneValues.tsv [tsv]
- SuppData-2.4 CN-DeviantGenes_ByCount.tsv [tsv]
- SuppData-2.5 CN-VennStats_focalpeaks.tsv [tsv]
- SuppData-2.6 CN-DocumentedDriverDifferences.xlsx [xlsx]
- SuppData-3.1 dnaMethylation-HM27-hg19-file-manifest-20180410.tsv [tsv]
- SuppData-3.2 dnaMethylation-HM450-hg19-file-manifest-20180410.tsv [tsv]
- SuppData-3.3 dnaMethylation-HM27-hg38-file-manifest-20180410.tsv [tsv]
- SuppData-3.4 dnaMethylation-HM450-hg38-file-manifest-20180410.tsv [tsv]
- SuppData-3.5 dnaMethylation-WGBS-hg19-file-manifest-47_TCGA-20181217.tsv [tsv]
- SuppData-4.1 RNAseq-file-manifest.xlsx [xlsx]
- SuppData-4.2 RNAseq-absDiffBySubtype.xlsx [xlsx]
- SuppData-5.1 somaticMutation-hg19-file-manifest.txt [txt]
- SuppData-5.2 somaticMutation-hg38-file-manifest.txt [txt]
- SuppData-5.3 somaticMutation-discordant-public-mutations.xlsx [xlsx]
Additional Resources
- GDC Encyclopedia
- Descriptions of TCGA data are provided in the TCGA Barcode Encyclopedia Page
- Genomic Data Commons Portal
Instructions for Data Download
Open Access Data
- Download the appropriate manifest file from the publication page
- Use the manifest file to download data using the GDC Data Transfer Tool (DTT) or the GDC API
- GDC DTT (Download, User’s Guide)
- GDC API (User’s Guide)
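The open-access steps above can be sketched in Python as follows. This is a minimal illustration, not the GDC's own client: it assumes the manifest is a tab-separated file with `id`, `filename` and `md5` columns (the usual GDC manifest layout; the supplemental manifests listed on this page may use different columns), and it assumes the public data endpoint `https://api.gdc.cancer.gov/data/<file id>`. The helper names `read_manifest`, `data_url`, `md5_matches` and `download_open_file` are hypothetical.

```python
import csv
import hashlib
import io
import os
import urllib.request

# Assumed public GDC API endpoint for file downloads.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def read_manifest(text):
    """Parse tab-separated manifest text into a list of row dicts.

    Assumes a header row that includes at least an 'id' column.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row for row in reader if row.get("id")]


def data_url(file_id):
    """Build the download URL for a single file UUID."""
    return f"{GDC_DATA_ENDPOINT}/{file_id}"


def md5_matches(payload, expected_md5):
    """Return True if the MD5 of the downloaded bytes equals the manifest value."""
    return hashlib.md5(payload).hexdigest() == expected_md5.strip().lower()


def download_open_file(row, out_dir=".", timeout=60):
    """Fetch one open-access file and check it against the manifest checksum.

    Reads the whole file into memory, so it is only suitable for small files.
    """
    with urllib.request.urlopen(data_url(row["id"]), timeout=timeout) as resp:
        payload = resp.read()
    if row.get("md5") and not md5_matches(payload, row["md5"]):
        raise ValueError(f"MD5 mismatch for {row['id']}")
    out_path = os.path.join(out_dir, os.path.basename(row["filename"]))
    with open(out_path, "wb") as out:
        out.write(payload)
    return out_path
```

A typical use would be to read the manifest file into a string, pass it to `read_manifest`, and call `download_open_file` on each row. For large downloads the GDC Data Transfer Tool is likely the better choice; it accepts a manifest directly (`gdc-client download -m <manifest>`).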
Controlled Access Data
- Download the appropriate manifest file from the publication page
- Download a token from the GDC Data Portal
- GDC Data Portal (Launch, User’s Guide)
- Use the manifest file and token to download data using the GDC DTT or the GDC API
- GDC DTT (Download, User’s Guide)
- GDC API (User’s Guide)
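For controlled-access files, the same API request needs the token downloaded from the GDC Data Portal. The sketch below shows one way this might look in Python. It assumes the token is sent in an `X-Auth-Token` header and that file UUIDs taken from the manifest are posted as a JSON `ids` list to `https://api.gdc.cancer.gov/data`. The function names and the token file path are hypothetical.

```python
import json
import shutil
import urllib.request

# Assumed GDC API endpoint for file downloads.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def read_token(path):
    """Read the token text saved from the GDC Data Portal."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip()


def build_controlled_request(file_ids, token):
    """Build (but do not send) a POST request for one or more file UUIDs."""
    body = json.dumps({"ids": list(file_ids)}).encode("utf-8")
    return urllib.request.Request(
        GDC_DATA_ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumed header name for the GDC token
        },
        method="POST",
    )


def download_controlled(file_ids, token, out_path, timeout=300):
    """Send the request and stream the response body to out_path.

    When several IDs are requested, the response is expected to be an
    archive rather than a single data file; treat that as an assumption
    and inspect the result before unpacking it.
    """
    request = build_controlled_request(file_ids, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        with open(out_path, "wb") as out:
            shutil.copyfileobj(resp, out)
    return out_path
```

Tokens expire and grant access to protected data, so they should be kept out of scripts and version control; reading the token from a file at run time, as `read_token` does, is one simple way to do that.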
For assistance, please contact the GDC Help Desk: support@nci-gdc.datacommons.io.