Cancer Cell. Volume 34, p1-14, August 2, 2018. DOI: 10.1016/j.ccell.2018.07.001
We present a comprehensive analysis of alternative splicing across 32 TCGA cancer types from 8,705 patients. We detect alternative splicing events (ASEs) and tumor variants by reanalyzing TCGA RNA and whole-exome sequencing data. Many tumors have thousands of ASEs not detectable in TCGA normal and GTEx samples. Overall, tumors have ≈30% more ASEs than normal samples. Association analysis of somatic variants with ASEs confirmed known trans-associations with variants in SF3B1 and U2AF1 and identified three additional trans-acting variants (IDH1, TADA1, PPP2R1A). On average, we identified ≈940 novel exon-exon junctions (“neojunctions”) in tumor samples not typically found in GTEx normal samples. From protein mass spectra available for TCGA breast and ovarian tumor samples, we confirmed on average ≈1.7 neojunction- and ≈0.6 SNV-derived peptides per tumor sample that are also predicted to be MHC-I binders (“putative neoantigens”). By considering neojunction- in addition to SNV-derived peptides, the fraction of samples for which at least one putative neoantigen can be identified increases from 30% to 75%. Tumor-specific splicing presents a large new class of splicing-associated potential neoantigens that may affect the immune response and could be exploited in immunotherapy, e.g., in personalized tumor vaccines.
Data in the GDC
- GDC Manifests
- Open-Access Data - Download Manifest (39 Files)
- Controlled-Access Data - Download Manifest (6 Files)
Supplemental Data
- Associated Data Files
- Coordinates of all confident alternative 3' splice site events in GFF3 format - merge_graphs_alt_3prime_C2.confirmed.gff3
- PSI Values of all confident alternative 3' splice site events in all TCGA samples in TAB-delimited ASCII format - merge_graphs_alt_3prime_C2.confirmed.txt.gz
- Event counts of all alternative 3' splice site events in all TCGA samples in HDF5 format - merge_graphs_alt_3prime_C2.counts.hdf5
- Coordinates of all confident alternative 5' splice site events in GFF3 format - merge_graphs_alt_5prime_C2.confirmed.gff3
- PSI Values of all confident alternative 5' splice site events in all TCGA samples in TAB-delimited ASCII format - merge_graphs_alt_5prime_C2.confirmed.txt.gz
- Event counts of all alternative 5' splice site events in all TCGA samples in HDF5 format - merge_graphs_alt_5prime_C2.counts.hdf5
- Coordinates of all confident exon skipping events in GFF3 format - merge_graphs_exon_skip_C2.confirmed.gff3
- PSI Values of all confident exon skipping events in all TCGA samples in TAB-delimited ASCII format - merge_graphs_exon_skip_C2.confirmed.txt.gz
- Event counts of all exon skipping events in all TCGA samples in HDF5 format - merge_graphs_exon_skip_C2.counts.hdf5
- Coordinates of all confident intron retention events in GFF3 format - merge_graphs_intron_retention_C2.confirmed.gff3
- PSI Values of all confident intron retention events in all TCGA samples in TAB-delimited ASCII format - merge_graphs_intron_retention_C2.confirmed.txt.gz
- Event counts of all intron retention events in all TCGA samples in HDF5 format - merge_graphs_intron_retention_C2.counts.hdf5
- Coordinates of all confident mutually exclusive exon events in GFF3 format - merge_graphs_mutex_exons_C2.confirmed.gff3
- PSI Values of all confident mutually exclusive exon events in all TCGA samples in TAB-delimited ASCII format - merge_graphs_mutex_exons_C2.confirmed.txt.gz
- Event counts of all mutually exclusive exon events in all TCGA samples in HDF5 format - merge_graphs_mutex_exons_C2.counts.hdf5
- PSI Values of all alternative 3' splice site events in all GTEx samples in TAB-delimited ASCII format - gtex_merge_graphs_alt_3prime_C2.confirmed.txt.gz
- Event counts of all alternative 3' splice site events in all GTEx samples in HDF5 format - gtex_merge_graphs_alt_3prime_C2.counts.hdf5
- PSI Values of all alternative 5' splice site events in all GTEx samples in TAB-delimited ASCII format - gtex_merge_graphs_alt_5prime_C2.confirmed.txt.gz
- Event counts of all alternative 5' splice site events in all GTEx samples in HDF5 format - gtex_merge_graphs_alt_5prime_C2.counts.hdf5
- PSI Values of all exon skipping events in all GTEx samples in TAB-delimited ASCII format - gtex_merge_graphs_exon_skip_C2.confirmed.txt.gz
- Event counts of all exon skipping events in all GTEx samples in HDF5 format - gtex_merge_graphs_exon_skip_C2.counts.hdf5
- PSI Values of all intron retention events in all GTEx samples in TAB-delimited ASCII format - gtex_merge_graphs_intron_retention_C2.confirmed.txt.gz
- Event counts of all intron retention events in all GTEx samples in HDF5 format - gtex_merge_graphs_intron_retention_C2.counts.hdf5
- PSI Values of all mutually exclusive exon events in all GTEx samples in TAB-delimited ASCII format - gtex_merge_graphs_mutex_exons_C2.confirmed.txt.gz
- Event counts of all mutually exclusive exon events in all GTEx samples in HDF5 format - gtex_merge_graphs_mutex_exons_C2.counts.hdf5
- List of donor ID and cancer type of samples used for CPTAC-based peptide verification - donor_to_cancertype.tsv
- List of the CPTAC dataset used for each donor - donor_to_cptac_dataset.tsv
- List of all TCGA sample-IDs within scope of the studies that passed QC procedures - sample_whitelist.txt
- List of all TCGA sample-IDs within scope of the studies that failed QC procedures - sample_excludelist.txt
- Complete list of all FastQC quality criteria used for QC filtering (aggregated values) - fastqc_overview_table.pub.tsv.gz
- Gene expression counts per sample on the whitelisted subset of donors in HDF5 format - expression_counts.whitelisted.hdf5
- Size factors for library size normalization per sample on the whitelisted subset of donors - expression_counts.whitelisted.libsize.tsv
- List of outlier events per sample and cancer type - splice_outlier.tar.gz
- HLA types - Shukla_Wu_Getz_Polysolver_HLA_Types_2015.tsv
- Reference genome-based polypeptides and README file - REFERENCE.polypeptides.tgz
- Sample-specific polypeptide lists and README file - sample_specific_polypeptides.tgz
- Final list of ASNs per sample - asns.tgz
- Final list of SNV-based neoepitope candidates per sample - snvs.tgz
- Full list of neojunctions per sample, including coordinates - tss_complexity_counts.whitelisted.G0.01.globsum20.filtLib.conf3_neojunctions.tsv.gz
- Filtered set of variants across all donors used for association - variantMatrix.dat.gz
- Table of confounders used (mutational load, purity) - MatchedFactors.tsv.gz
- cis-Association results summary statistics - cisassociations.tsv.gz
- trans-Association results summary statistics - transassociations.tsv.gz
- Lists of RNA-Seq confirmed EEJs per sample - rnaseq_confirmed_junctions.tgz
- GTEx background polypeptides and personalized GTEx background polypeptides - gtex_background.tgz
- Raw alignment counts across all splice junctions aggregated in the study - all_junctions_gdc.tar.gz
- Testing results for the differential tests on all alternative events - results_alt_splice_difftest_gdc.tar.gz
- Full list of neojunctions per sample, including coordinates (corrected) - tss_complexity_counts.whitelisted.G0.01.globsum20.filtLib.conf3_neojunctions_corrected.tsv.gz
- Coordinates of all detected alternative 3' splice site events in GFF3 format - merge_graphs_alt_3prime_C2.gff3.gz
- Coordinates of all detected alternative 5' splice site events in GFF3 format - merge_graphs_alt_5prime_C2.gff3.gz
- Coordinates of all detected exon skipping events in GFF3 format - merge_graphs_exon_skip_C2.gff3.gz
- Coordinates of all detected intron retention events in GFF3 format - merge_graphs_intron_retention_C2.gff3.gz
- Coordinates of all detected mutually exclusive exon events in GFF3 format - merge_graphs_mutex_exons_C2.gff3.gz
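The GFF3 coordinate files listed above can be read with a few lines of Python. The following is a minimal sketch that assumes only the standard nine-column, tab-separated GFF3 layout; it makes no claim about which feature types or attribute keys the published files actually contain, and the names `Gff3Record`, `parse_gff3`, and `open_maybe_gzip` are hypothetical helpers introduced for illustration.

```python
import gzip
from typing import Dict, Iterator, NamedTuple
from urllib.parse import unquote


class Gff3Record(NamedTuple):
    """One feature line of a GFF3 file (standard nine columns)."""
    seqid: str
    source: str
    type: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    score: str
    strand: str
    phase: str
    attributes: Dict[str, str]


def open_maybe_gzip(path: str):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def parse_gff3(handle) -> Iterator[Gff3Record]:
    """Yield records from an open GFF3 text handle, skipping comments."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # blank lines, pragmas (##...) and comments
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 columns, got {len(cols)}: {line!r}")
        attributes = {}
        for field in cols[8].split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                # GFF3 percent-encodes reserved characters in attributes
                attributes[unquote(key)] = unquote(value)
        yield Gff3Record(cols[0], cols[1], cols[2], int(cols[3]),
                         int(cols[4]), cols[5], cols[6], cols[7], attributes)
```

With a downloaded file, `parse_gff3(open_maybe_gzip(path))` would iterate over its features; the generator keeps memory use flat even for large event sets.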
Instructions for Data Download
Open Access Data
- Download the appropriate manifest file from the publication page
- Use the manifest file to download data using the GDC Data Transfer Tool (DTT) or the GDC API
- GDC DTT (Download, User's Guide)
- GDC API (User’s Guide)
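The open-access steps above can be sketched with the GDC API and the Python standard library. This is a minimal sketch, not the GDC's reference client. It assumes that the manifest is a tab-delimited file whose header row includes `id`, `filename`, and `md5` columns, and that files are served from `https://api.gdc.cancer.gov/data/<id>`. The helper names (`read_manifest`, `data_url`, `md5_of`, `download_file`) are hypothetical.

```python
import csv
import hashlib
import shutil
import urllib.request
from pathlib import Path

# Assumed data endpoint of the GDC API; check the GDC API User's Guide.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def read_manifest(handle):
    """Parse a tab-delimited manifest into a list of row dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def data_url(file_id):
    """Build the download URL for one file UUID."""
    return f"{GDC_DATA_ENDPOINT}/{file_id}"


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_file(row, out_dir="."):
    """Download one manifest entry and verify it against the manifest MD5."""
    target = Path(out_dir) / row["filename"]
    with urllib.request.urlopen(data_url(row["id"])) as resp, \
            open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    if md5_of(target) != row["md5"]:
        raise ValueError(f"checksum mismatch for {target}")
    return target
```

Calling `download_file(row)` for each row returned by `read_manifest` would fetch the listed files one at a time. For large transfers the GDC Data Transfer Tool is likely the more robust choice.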
Controlled Access Data
- Download the appropriate manifest file from the publication page
- Download a token from the GDC Data Portal
- GDC Data Portal (Launch, User’s Guide)
- Use the manifest file and token to download data using the GDC DTT or the GDC API
- GDC DTT (Download, User’s Guide)
- GDC API (User’s Guide)
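For controlled-access files, the download request additionally carries the token obtained from the GDC Data Portal. The sketch below assumes that the GDC API accepts the token in an `X-Auth-Token` request header, that files are served from `https://api.gdc.cancer.gov/data/<id>`, and that the token was saved to a local text file. The function names and any token path are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path

# Assumed data endpoint of the GDC API; check the GDC API User's Guide.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def load_token(token_path):
    """Read the token file downloaded from the GDC Data Portal."""
    return Path(token_path).read_text().strip()


def authorized_request(file_id, token):
    """Build a request for one file UUID with the token attached."""
    return urllib.request.Request(
        f"{GDC_DATA_ENDPOINT}/{file_id}",
        headers={"X-Auth-Token": token},  # assumed header name
    )


def download_controlled(file_id, filename, token, out_dir="."):
    """Stream one controlled-access file to disk."""
    target = Path(out_dir) / filename
    request = authorized_request(file_id, token)
    with urllib.request.urlopen(request) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    return target
```

Tokens expire and grant access to protected data, so the token file should stay out of version control and shared directories.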
For assistance, please contact the GDC Help Desk: support@nci-gdc.datacommons.io.