Science, Volume 362, Issue 6413, eaav1898, 25 October 2018. DOI: 10.1126/science.aav1898
We present the genome-wide chromatin accessibility profiles of 410 tumor samples spanning 23 cancer types from The Cancer Genome Atlas. We identify 562,709 transposase-accessible DNA elements that substantially extend the compendium of known cis-regulatory elements. Integration of ATAC-seq with TCGA multi-omic data identifies a large number of putative distal enhancers that distinguish molecular subtypes of cancers, uncovers specific driving transcription factors via protein-DNA footprints, and nominates long-range gene-regulatory interactions in cancer. These data reveal genetic risk loci of cancer predisposition as active DNA regulatory elements in cancer, identify gene-regulatory interactions underlying cancer immune evasion, and pinpoint noncoding mutations that drive enhancer activation and may impact patient survival. These results suggest a systematic approach to understand the noncoding genome in cancer to advance diagnosis and therapy.
Data in the GDC
- GDC Manifests
- Open Access Data - Download Manifest (47 Files)
- Controlled Access Data - Download Manifest (1 File)
Views of the Data
The ATAC-seq peak accessibility and computed peak-to-gene linkage predictions are publicly available for interactive visualization and exploration at the UCSC Xena Browser (https://atacseq.xenahubs.net).
Supplemental Data Files
- Data S1. Cancer types studied, donor characteristics, and sequencing statistics. [xlsx]
- Data S2. Pan-cancer and breast cancer peak calls. [xlsx]
- Data S3. Overlap of peaks with Roadmap DNase-seq, peak saturation analysis, and t-SNE positions of all samples. [xlsx]
- Data S4. Distal binarization analysis and enrichment of motifs in cluster-specific peak sets. [xlsx]
- Data S5. GWAS and eQTL analyses and overlap with peak-to-gene links. [xlsx]
- Data S6. TF footprinting analyses and correlation to gene expression. [xlsx]
- Data S7. Pan-cancer and breast cancer-specific peak-to-gene links and enhancer-to-gene links. [xlsx]
- Data S8. ELMER and Regulon analyses. [xlsx]
- Data S9. Peak-to-gene links related to immune response in cancer. [xlsx]
- Data S10. Open Access. Integration of ATAC-seq and WGS to identify noncoding mutations. Only contains mutation positions, no base changes provided. [xlsx]
- Data S10. Controlled Access. Integration of ATAC-seq and WGS to identify noncoding mutations. Contains mutation positions and base changes. [xlsx]
- Protocol S1. Processing of frozen tissue fragments for ATAC-seq. [pdf]
Other Supplemental Data Files
- ATAC-seq Counts Matrices
- README to facilitate usage of count matrices. [TXT]
- Normalized ATAC-seq insertion counts within the pan-cancer peak set. Recommended format. [RDS]
- Normalized ATAC-seq insertion counts within the pan-cancer peak set. Not recommended due to size. [TXT]
- Raw ATAC-seq insertion counts within the pan-cancer peak set. [RDS]
- Raw ATAC-seq insertion counts within the pan-cancer peak set. [TXT]
- All cancer type-specific count matrices in normalized counts. [ZIP]
- All cancer type-specific count matrices in raw counts. [ZIP]
- ATAC-seq Peak Calls
- BigWig Files for All Samples
- README to facilitate usage of bigWig files. [TXT]
- Normalized bigWig files for the ACC cohort. [TAR-GZ]
- Normalized bigWig files for the BLCA cohort. [TAR-GZ]
- Normalized bigWig files for the BRCA cohort. [TAR-GZ]
- Normalized bigWig files for the CESC cohort. [TAR-GZ]
- Normalized bigWig files for the CHOL cohort. [TAR-GZ]
- Normalized bigWig files for the COAD cohort. [TAR-GZ]
- Normalized bigWig files for the ESCA cohort. [TAR-GZ]
- Normalized bigWig files for the GBM cohort. [TAR-GZ]
- Normalized bigWig files for the HNSC cohort. [TAR-GZ]
- Normalized bigWig files for the KIRC cohort. [TAR-GZ]
- Normalized bigWig files for the KIRP cohort. [TAR-GZ]
- Normalized bigWig files for the LGG cohort. [TAR-GZ]
- Normalized bigWig files for the LIHC cohort. [TAR-GZ]
- Normalized bigWig files for the LUAD cohort. [TAR-GZ]
- Normalized bigWig files for the LUSC cohort. [TAR-GZ]
- Normalized bigWig files for the MESO cohort. [TAR-GZ]
- Normalized bigWig files for the PCPG cohort. [TAR-GZ]
- Normalized bigWig files for the PRAD cohort. [TAR-GZ]
- Normalized bigWig files for the SKCM cohort. [TAR-GZ]
- Normalized bigWig files for the STAD cohort. [TAR-GZ]
- Normalized bigWig files for the TGCT cohort. [TAR-GZ]
- Normalized bigWig files for the THCA cohort. [TAR-GZ]
- Normalized bigWig files for the UCEC cohort. [TAR-GZ]
Analysis Code
- Code for ELMER probe-to-gene analysis. [HTML]
Additional Resources
- PEP-format metadata for bulk ATAC-seq alignment
- GDC Encyclopedia
- Descriptions of TCGA data are provided in the TCGA Barcode Encyclopedia Page
- Genomic Data Commons Portal
Instructions for Data Download
Open Access Data
- Download the appropriate manifest file from the publication page
- Use the manifest file to download data using the GDC Data Transfer Tool (DTT) or the GDC API
- GDC DTT (Download, User's Guide)
- GDC API (User's Guide)
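As an illustration of the API route described in the steps above, here is a minimal Python sketch that reads a manifest and builds one download URL per file. It assumes the manifest is the usual tab-separated GDC manifest with `id` and `filename` columns and that files are served from the public `https://api.gdc.cancer.gov/data/<id>` endpoint. The helper names and the example manifest row are hypothetical placeholders, not real TCGA files.

```python
import csv
import io
import shutil
import urllib.request

# Public GDC data endpoint; a file is addressed by its UUID.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def read_manifest(text):
    """Parse tab-separated manifest text into a list of row dicts.

    Assumes a header row that includes 'id' and 'filename' columns.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row for row in reader if row.get("id")]


def data_urls(rows):
    """Map each manifest filename to its download URL."""
    return {row["filename"]: f"{GDC_DATA_ENDPOINT}/{row['id']}" for row in rows}


def download(url, destination):
    """Stream one open-access file to disk (performs a network request)."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


# Hypothetical manifest content, for illustration only.
EXAMPLE_MANIFEST = (
    "id\tfilename\tmd5\tsize\tstate\n"
    "00000000-0000-0000-0000-000000000000\texample_file.txt\t"
    "d41d8cd98f00b204e9800998ecf8427e\t0\treleased\n"
)

urls = data_urls(read_manifest(EXAMPLE_MANIFEST))
for filename, url in urls.items():
    print(filename, url)
    # download(url, filename)  # uncomment to fetch the file
```

With the GDC Data Transfer Tool, the equivalent step is typically `gdc-client download -m <manifest file>`; consult the DTT User's Guide for the options supported by your installed version.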
Controlled Access Data
- Download the appropriate manifest file from the publication page
- Download a token from the GDC Data Portal
- GDC Data Portal (Launch, User's Guide)
- Use the manifest file and token to download data using the GDC DTT or the GDC API
- GDC DTT (Download, User's Guide)
- GDC API (User's Guide)
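For the controlled-access steps above, the token downloaded from the GDC Data Portal has to accompany each API request. The sketch below shows one way to do this in Python, assuming the token is sent in the `X-Auth-Token` request header and that files are served from `https://api.gdc.cancer.gov/data/<id>`. The function names, file ID, and token value are hypothetical placeholders.

```python
import shutil
import urllib.request

# GDC data endpoint; a file is addressed by its UUID from the manifest.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def authenticated_request(file_id, token):
    """Build a request for one controlled-access file.

    The token text is stripped of surrounding whitespace because token
    files commonly end with a newline.
    """
    token = token.strip()
    if not token:
        raise ValueError("empty GDC token")
    return urllib.request.Request(
        f"{GDC_DATA_ENDPOINT}/{file_id}",
        headers={"X-Auth-Token": token},
    )


def download_controlled(file_id, token_path, destination):
    """Read the token from disk and stream the file (network request)."""
    with open(token_path, encoding="utf-8") as handle:
        request = authenticated_request(file_id, handle.read())
    with urllib.request.urlopen(request) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


# Placeholder values for illustration only; nothing is downloaded here.
request = authenticated_request(
    "00000000-0000-0000-0000-000000000000", "example-token\n"
)
print(request.full_url)
```

Tokens grant access to protected data, so keep the token file out of shared directories and version control. With the GDC Data Transfer Tool, the manifest and token are typically passed together, as in `gdc-client download -m <manifest file> -t <token file>`; consult the DTT User's Guide for details.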
For assistance, please contact the GDC Help Desk: support@nci-gdc.datacommons.io.