Cell Systems, Volume 6, Issue 3, p271-281.e7, 28 March 2018. DOI: 10.1016/j.cels.2018.03.002
Abstract
The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration from a number of institutes and demonstrates how team science drives extremely large genomics projects.
Data in the GDC
- GDC Manifests
- Open-Access Data - Download Manifest (6 Files)
- Controlled-Access Data - Download Manifest (52 Files)
Supplemental Data
- MAF Files
- MC3 Public MAF - mc3.v0.2.8.PUBLIC.maf.gz
- MC3 Controlled MAF - mc3.v0.2.8.CONTROLLED.maf.gz
- mc3.v0.2.8.BROAD_VALIDATIONv2.maf.gz
- mc3.v0.2.9.CONTROLLED_lt3_b.maf.gz
- TCGA Cohort Variant Calls
- ACC.tar.gz
- BLCA.tar.gz
- BLCA2.tar.gz
- BRCA.tar.gz
- CESC.tar.gz
- CHOL.tar.gz
- COAD.tar.gz
- DLBC.tar.gz
- ESCA.tar.gz
- GBM.tar.gz
- HNSC.tar.gz
- KICH.tar.gz
- KIRC.tar.gz
- KIRP.tar.gz
- LAML.tar.gz
- LGG.tar.gz
- LIHC.tar.gz
- LUAD.tar.gz
- LUSC.tar.gz
- MESO.tar.gz
- OV.tar.gz
- PAAD.tar.gz
- PCPG.tar.gz
- PRAD.tar.gz
- READ.tar.gz
- SARC.tar.gz
- SKCM.tar.gz
- STAD.tar.gz
- TGCT.tar.gz
- THCA.tar.gz
- THYM.tar.gz
- UCEC.tar.gz
- UCS.tar.gz
- UVM.tar.gz
- broad_mc3_vcfs.tar.gz
- Reference Files
- Reference Data - ref_data_for_oxog_all_permissions.tar.gz
- Reference Data - 2 - ref_data_for_oxog.tar
- Target Region BED File - gencode.v19.basic.exome.bed
- gaf_20111020Plusbroad_wex_1.1_hg19.bed
- Filtered Variants
- ConTest Filtered Variants - contestkeys.txt
- Bitgt Filtered Variants - mark_bitgt.txt
- Non-Exonic Filtered Variants - mark_nonexonic.txt
- NDP Filtered Variants - ndp.mark.txt
- Non-Preferred Pair Filtered Variants - nonpreferredpair_maf_keys.txt
- OxoG Filtered Variants - oxog.annotation
- Strandbias Filtered Variants - strandBias.filter_v2.txt.gz
- WGA Filtered Variants - wga_maf_keys.txt
- pcadontusekeys.txt
- pancan.merged.v0.2.exac_pon_tagged.txt.gz
- pancan.merged.v0.2.pfiltered.broad_pon_tagged_v2.txt.gz
- Miscellaneous Files
- README - readme.txt
- QC Scores - mc3_qc_scores_2016-04-26a.txt
- final_summed_tokens.hist.bin
- oxoG_docker_with_patch_and_minibam.tar
- vep_supporting_files.tar.gz
Additional Resources
- ISB-CGC PanCancer Atlas BigQuery Tables (Institute for Systems Biology)
- Broad Institute FireCloud (The Broad Institute)
- cBioPortal for Cancer Genomics (Memorial Sloan-Kettering Cancer Center)
- Next-Generation Clustered Heat Maps (MD Anderson Cancer Center)
- PanCanAtlas Additional Files
Instructions for Data Download
Open Access Data
- Download the appropriate manifest file from the publication page
- Use the manifest file to download data using the GDC Data Transfer Tool (DTT) or the GDC API
- GDC DTT (Download, User's Guide)
- GDC API (User's Guide)
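The open-access steps above can be sketched in Python. This is a minimal illustration, not an official client: it assumes the manifest is the usual tab-separated GDC manifest with a header row containing an `id` column, that the public data endpoint is `https://api.gdc.cancer.gov/data/<file id>`, and that the Data Transfer Tool is installed as `gdc-client`. The sample manifest rows, UUIDs, and file names are hypothetical placeholders.

```python
import csv
import io

# Assumed public GDC API data endpoint; a file is addressed by its UUID.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"

# Hypothetical manifest content for illustration only (not real file IDs).
SAMPLE_MANIFEST = "\n".join([
    "\t".join(["id", "filename", "md5", "size", "state"]),
    "\t".join(["00000000-0000-0000-0000-000000000001", "example_a.maf.gz",
               "0" * 32, "1024", "released"]),
    "\t".join(["00000000-0000-0000-0000-000000000002", "example_b.tar.gz",
               "f" * 32, "2048", "released"]),
]) + "\n"


def read_manifest(text):
    """Parse tab-separated manifest text into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # Skip blank or malformed rows that carry no file ID.
    return [row for row in reader if row.get("id")]


def download_urls(rows):
    """Build one API download URL per manifest row."""
    return [f"{GDC_DATA_ENDPOINT}/{row['id']}" for row in rows]


def dtt_command(manifest_path):
    """Return the argument list for a DTT download driven by a manifest."""
    return ["gdc-client", "download", "-m", manifest_path]


if __name__ == "__main__":
    for url in download_urls(read_manifest(SAMPLE_MANIFEST)):
        print(url)
    print(" ".join(dtt_command("gdc_manifest.txt")))
```

The argument list returned by `dtt_command` can be passed to `subprocess.run`, and each URL can be fetched with any HTTP client. Nothing in the sketch touches the network on its own.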
Controlled Access Data
- Download the appropriate manifest file from the publication page
- Download a token from the GDC Data Portal
- GDC Data Portal (Launch, User's Guide)
- Use the manifest file and token to download data using the GDC DTT or the GDC API
- GDC DTT (Download, User's Guide)
- GDC API (User's Guide)
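For controlled-access files, the only change to the flow is that the token accompanies every request. The sketch below is a hedged illustration under these assumptions: the token downloaded from the GDC Data Portal is saved as a plain text file, the API accepts it in an `X-Auth-Token` request header, and the DTT accepts it through the `-t` option. The function names and file paths are hypothetical.

```python
import urllib.request

# Assumed GDC API data endpoint; a file is addressed by its UUID.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def read_token(token_path):
    """Read the token text saved from the GDC Data Portal."""
    with open(token_path, encoding="utf-8") as handle:
        return handle.read().strip()


def build_request(file_id, token):
    """Prepare (but do not send) an authenticated download request."""
    return urllib.request.Request(
        f"{GDC_DATA_ENDPOINT}/{file_id}",
        headers={"X-Auth-Token": token},
    )


def dtt_command(manifest_path, token_path):
    """Return the argument list for a DTT download using manifest and token."""
    return ["gdc-client", "download", "-m", manifest_path, "-t", token_path]


def fetch(file_id, token, out_path, chunk_size=1 << 20):
    """Stream one file to disk; requires network access and a valid token."""
    request = build_request(file_id, token)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
```

Tokens grant access to protected data, so keep the token file out of version control and shared directories, and read it at run time rather than embedding it in scripts.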
For assistance, please contact the GDC Help Desk: support@nci-gdc.datacommons.io.