Nature. 490: p61-70. 23 September 2012 10.1038/nature11412
We analyzed primary breast cancers by genomic DNA copy number arrays, DNA methylation, exome sequencing, mRNA arrays, microRNA sequencing and reverse phase protein arrays. Our ability to integrate information across platforms provided key insights into previously-defined gene expression subtypes and demonstrated the existence of four main breast cancer classes when combining data from five platforms, each of which shows significant molecular heterogeneity. Somatic mutations in only three genes (TP53, PIK3CA and GATA3) occurred at > 10% incidence across all breast cancers; however, there were numerous subtype-associated and novel gene mutations including the enrichment of specific mutations in GATA3, PIK3CA and MAP3K1 with the Luminal A subtype. We identified two novel protein expression-defined subgroups, possibly contributed by stromal/microenvironmental elements, and integrated analyses identified specific signaling pathways dominant in each molecular subtype including a HER2/p-HER2/HER1/p-HER1 signature within the HER2-Enriched expression subtype. Comparison of Basal-like breast tumors with high-grade Serous Ovarian tumors showed many molecular commonalities, suggesting a related etiology and similar therapeutic opportunities. The biologic finding of the four main breast cancer subtypes caused by different subsets of genetic and epigenetic abnormalities raises the hypothesis that much of the clinically observable plasticity and heterogeneity occurs within, and not across, these major biologic subtypes of breast cancer.
Data in the GDC
- GDC Manifests
- Open-Access Data - Download Manifest (71 Files)
- Controlled-Access Data - Download Manifest (11 Files)
Supplemental Data
These data represent a data freeze from November 11, 2011.
- Sample Lists - below are links to the cumulative lists of samples for the publication
- Final Full BRCA Sample Summary
- Individual Analysis Sample Summaries
- Mutations - Level 2 MAF archives containing exome-based somatic mutations
- File used in manuscript
- Somatic MAF archive [tar.gz]
- Somatic MAF archive md5
- Mutations - Publicly accessible MAF archives
The mutations in this MAF archive are a subset of those in the controlled access archive. It includes only mutations that have been verified as somatic; mutations that could not be verified as somatic have been removed.
- Mutations - Germline MAF File (controlled access)
This file contains germline mutations and is access controlled.
- TCGA_germline-variants_analysis-12-14-11-BERG.Couch.KH.xlsx
- RNA Expression
- Data Matrix Files
- BRCA.exp.547.med.txt - Full Expression Data set consisting of 522 primary tumors, 3 metastatic tumors, and 22 tumor-adjacent normal samples. Data was median centered by genes.
- BRCA.exp.466.med.txt - Data freeze with data on 5 of the 6 platforms (mRNA, miRNA, methylation, copy number, and whole exome sequencing). 463 primary tumors and 3 metastatic tumors. Data was median centered by genes.
- BRCA.exp.348.med.txt - Data freeze with data on 6 platforms (mRNA, miRNA, methylation, copy number, protein, and whole exome sequencing). 348 primary tumors. Data was median centered by genes.
- BRCA.547.PAM50.SigClust.Subtypes.txt - PAM50 and SigClust Subtype Assignments.
- Level 3 Data Archives
- Level 2 Data Archives
- Level 1 Data Archives
- SNP and Copy Number
- Additional Files
- Level 3 Data Archives
- Level 2 Data Archives
- BRCA.DNASeq.Level_2.tar
- BRCA.DNASeq.Level_2.tar.md5
- miRNA Expression
- Data Matrix Files
- miRNA expression, precursor miRNA genes, normalized to reads per million mapped miRNAs
- BRCA.780.precursor.txt - all samples
- BRCA.466.precursor.txt - data freeze 5 platforms
- BRCA.348.precursor.txt - data freeze 6 platforms
- miRNA expression, mature/star miRNA strands, normalized to reads per million mapped miRNAs
- BRCA.780.mimat.txt - all samples
- BRCA.466.mimat.txt - data freeze 5 platforms
- BRCA.348.mimat.txt - data freeze 6 platforms
- Filtered Data set and Results for NMF Clustering
- Read me file on miRNAseq Clustering
- BRCA.miRNAseq.expn_matrix_mimat_norm_passed_TCGA.697-tumor-samples.20111218.txt - most variant 25% records were input to NMF clustering
- Subtype calls for 5, 7, and 9 classes
- Level 3 Data Archives
- Methylation
- Data Sets
- BRCA.methylation.27k.450k.zip - Full Methylation Data Set (139M)
- BRCA.methylation.27k.450k.466.zip - Data freeze 5 platforms (62.5M)
- Filtered Data Set - BRCA.Methylation.574probes.802.txt
- Methylation Subtype Calls - BRCA.Methylation.clu.memb.802.txt
- Level 3 Data Archives
- Level 2 Data Archives
- Level 1 Data Archives
- Reverse Phase Protein Array (RPPA)
- Data Sets
- rppa-171Ab-403samp.gct
- rppaData-403Samp-171Ab-notTrimmed.txt
- rppaData-403Samp-171Ab-Trimmed.txt - Data were scaled by antibody, and extreme values were trimmed (the lowest 1.5% and the highest 1.5%) to increase the contrast of the expression.
- RPPA Subtype Calls
- Level 3 Data Archives
- Level 2 Data Archives
- Level 1 Data Archives
- Exome Sequence BAM File References
- BRCA.CGhub.bam.files.txt - BAM files at CGHub
- BRCA.Barcodes.Not.at.CGHub.txt - Aliquot barcodes for files not yet at CGHub
- Clinical - below are links to clinical data for the publication.
- Firehose Pipeline Activities
- 2012012400.tar (6.5G)
- 2012012400.tar.md5
- 2012011000.tar (6.5G)
- 2012011000.tar.md5
- 2011102600.tar (3.13G)
- 2011102600.tar.md5
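Several archives above are listed next to a companion .md5 file. A downloaded archive can be checked against that file along the following lines. This is a minimal sketch, not part of the original page: the helper names are hypothetical, and it assumes the .md5 file holds the hex digest as its first whitespace-separated token.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_path):
    """Read the expected digest from a .md5 file.

    Assumption: the digest is the first whitespace-separated token in the file.
    """
    text = Path(md5_path).read_text().strip()
    return text.split()[0].lower()


def archive_is_intact(archive_path, md5_path):
    """True when the archive's computed digest matches the recorded one."""
    return md5_of_file(archive_path) == expected_md5(md5_path)
```

For example, `archive_is_intact("2012012400.tar", "2012012400.tar.md5")` would return True only if the multi-gigabyte download completed without corruption.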
Views of the Data
- Institute for Systems Biology
- Memorial Sloan Kettering Cancer Center
- UCSC
- MD Anderson
Additional Resources
- GDC Encyclopedia
- Descriptions of TCGA data are provided in the TCGA Barcode Encyclopedia Page
- Genomic Data Commons Portal
Instructions for Data Download
Open Access Data
- Download the appropriate manifest file from the publication page
- Use the manifest file to download data using the GDC Data Transfer Tool (DTT) or the GDC API
- GDC DTT (Download, User's Guide)
- GDC API (User’s Guide)
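The open-access steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the official procedure: it assumes the manifest is a tab-separated file with a header row that includes `id` and `filename` columns, and that files are served from `https://api.gdc.cancer.gov/data/<id>`. The function names are hypothetical; check the GDC API User's Guide before relying on any of these details.

```python
import csv
import io
import shutil
import urllib.request

# Assumed base URL of the GDC API data endpoint.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def read_manifest(text):
    """Parse manifest text (tab-separated, with a header row) into dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def download_urls(rows):
    """Pair each manifest row's filename with its assumed download URL."""
    return [(row["filename"], f"{GDC_DATA_ENDPOINT}/{row['id']}") for row in rows]


def fetch(url, destination):
    """Stream one open-access file to disk (performs a network request)."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
```

A typical use would read the manifest file's contents, call `download_urls(read_manifest(contents))`, and pass each pair to `fetch`. The GDC Data Transfer Tool handles retries and large files and is likely the better choice for the larger archives.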
Controlled Access Data
- Download the appropriate manifest file from the publication page
- Download a token from the GDC Data Portal
- GDC Data Portal (Launch, User’s Guide)
- Use the manifest file and token to download data using the GDC DTT or the GDC API
- GDC DTT (Download, User’s Guide)
- GDC API (User’s Guide)
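For controlled-access files, the token downloaded from the GDC Data Portal has to accompany each request. The sketch below shows one way this might look with the Python standard library. It assumes the API accepts the token in an `X-Auth-Token` request header and serves files from `https://api.gdc.cancer.gov/data/<id>`; the function names and file paths are hypothetical, and the GDC API User's Guide remains the authority on the details.

```python
import shutil
import urllib.request
from pathlib import Path

# Assumed base URL of the GDC API data endpoint.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def read_token(token_path):
    """Load the token saved from the GDC Data Portal, without trailing newline."""
    return Path(token_path).read_text().strip()


def authorized_request(file_id, token):
    """Build a request for one file ID that carries the token in a header.

    Assumption: the API reads the token from the X-Auth-Token header.
    """
    url = f"{GDC_DATA_ENDPOINT}/{file_id}"
    return urllib.request.Request(url, headers={"X-Auth-Token": token})


def download(file_id, token, destination):
    """Stream one controlled-access file to disk (performs a network request)."""
    request = authorized_request(file_id, token)
    with urllib.request.urlopen(request) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
```

Tokens grant access to protected data, so keep the token file out of shared directories and version control, and download a fresh token from the portal if a request is rejected as unauthorized.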
For assistance, please contact the GDC Help Desk: support@nci-gdc.datacommons.io.