The National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics.
Program Description
CPTAC is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of robust, quantitative, proteomic technologies and workflows. The overarching goal of CPTAC is to improve cancer diagnosis, treatment and prevention. To achieve this goal in a scientifically rigorous manner, the NCI launched CPTAC to systematically identify proteins that derive from alterations in cancer genomes and related biological processes and provide this data with accompanying assays and protocols to the public.
CPTAC has provided the Genomic Data Commons (GDC) with genomic data from cancer patients with diverse disease types. The GDC harmonized CPTAC whole genome sequencing (WGS), whole exome sequencing (WXS), and RNA sequencing data against the GRCh38 reference genome, using the GDC DNA-Seq Analysis Pipelines for WGS and WXS and the mRNA Analysis Pipelines for RNA sequences. The CPTAC harmonized genomic data is available in the GDC Data Portal. CPTAC proteomic data is available in the Proteomic Data Commons (PDC), where all datasets are harmonized and uniformly processed using the CPTAC Common Data Analysis Pipeline (CDAP), with outputs provided as Protein Assembly Reports on the portal.
Data Overview
Data Access
The CPTAC genomic data can be found on the GDC Data Portal. To request access to protected CPTAC data, please apply to dbGaP for access to the CPTAC 3 Study (study accession phs001287) or the CPTAC 2 Study (study accession phs000892).
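Besides the portal, file metadata can also be listed programmatically. The following is a minimal sketch, not an official recipe: it only builds the query parameters for the public GDC API `files` endpoint and sends no request. The project identifiers `CPTAC-2` and `CPTAC-3`, the selected field names, and the helper name `build_cptac_file_query` are assumptions made for illustration and should be checked against the GDC API documentation before use.

```python
import json

# Public GDC API endpoint for file metadata (no request is sent in this sketch).
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_cptac_file_query(project_ids, access="open", size=10):
    """Return query parameters selecting files by project and access level.

    project_ids: GDC project identifiers (assumed here: "CPTAC-2", "CPTAC-3").
    access: "open" or "controlled".
    size: maximum number of records to ask for.
    """
    filters = {
        "op": "and",
        "content": [
            {
                "op": "in",
                "content": {
                    "field": "cases.project.project_id",
                    "value": list(project_ids),
                },
            },
            {
                "op": "in",
                "content": {"field": "access", "value": [access]},
            },
        ],
    }
    return {
        # The filter object is passed to the API as a JSON-encoded string.
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_type,access",
        "format": "JSON",
        "size": str(size),
    }


params = build_cptac_file_query(["CPTAC-2", "CPTAC-3"])
print(GDC_FILES_ENDPOINT)
print(params["filters"])
```

The returned dictionary could be passed as the query string of a GET request to the endpoint. Downloading controlled-access files additionally requires dbGaP authorization as described above.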
Associated Data
- Acute Myeloid Leukemia (AML) (Genomic Data, Proteomic Data)
- Breast Cancer (Genomic Data, Proteomic Data)
- Colon Cancer (Genomic Data, Proteomic Data)
- Endometrial Cancer (Discovery) (Genomic Data, Proteomic Data)
- Endometrial Cancer (Confirmatory) (Genomic Data, Proteomic Data)
- Glioblastoma (Adult, Discovery) (Genomic Data, Proteomic Data)
- Glioblastoma (Adult, Confirmatory) (Genomic Data, Proteomic Data)
- Gastric Cancer (Genomic Data, Proteomic Data)
- Head and Neck Cancer (Genomic Data, Proteomic Data)
- Kidney Cancer (Discovery ccRCC) (Genomic Data, Proteomic Data)
- Kidney Cancer (Confirmatory ccRCC) (Genomic Data, Proteomic Data)
- Kidney Cancer (non-ccRCC) (Genomic Data, Proteomic Data)
- Lung Adenocarcinoma (Discovery) (Genomic Data, Proteomic Data)
- Lung Adenocarcinoma (Confirmatory) (Genomic Data, Proteomic Data)
- Lung Squamous Cell Carcinoma (Genomic Data, Proteomic Data)
- Ovarian Cancer (Genomic Data, Proteomic Data)
- Pancreatic Cancer (Genomic Data, Proteomic Data)
Data Types
| Data Type | Data Format | Data Access Level |
|---|---|---|
| Clinical and Biospecimen | TSV, JSON | Open |
| WGS Aligned Reads | BAM | Controlled |
| WXS Aligned Reads | BAM | Controlled |
| WXS Raw Simple Somatic Mutations | VCF | Controlled |
| WXS Annotated Somatic Mutations | VCF, MAF | Controlled |
| WXS Aggregated Somatic Mutations | MAF | Controlled |
| WXS Masked Somatic Mutations | MAF | Open |
| Targeted Sequencing Aligned Reads | BAM | Controlled |
| Targeted Sequencing Raw Simple Somatic Mutations | VCF | Controlled |
| RNA-Seq Aligned Reads | BAM | Controlled |
| Gene Expression Quantification | TXT | Open |
| Splice Junction Quantification | TSV | Controlled |
| Transcript Fusion | TSV, BEDPE | Controlled |
| miRNA-Seq Aligned Reads | BAM | Controlled |
| miRNA Expression Quantification | TSV | Open |
| Isoform Expression Quantification | TSV | Open |
| Single Cell Analysis | TSV, HDF5 | Open |
| Methylation Arrays | IDAT, TXT | Open |
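When planning an analysis, it can help to know up front which data types need dbGaP approval. The sketch below transcribes a few rows of the table above into a dictionary and groups them by access level. The variable and function names are hypothetical, and only a subset of rows is included to keep the example short.

```python
# A few rows copied from the Data Types table: name -> (formats, access level).
DATA_TYPES = {
    "Clinical and Biospecimen": (["TSV", "JSON"], "Open"),
    "WGS Aligned Reads": (["BAM"], "Controlled"),
    "WXS Masked Somatic Mutations": (["MAF"], "Open"),
    "RNA-Seq Aligned Reads": (["BAM"], "Controlled"),
    "Gene Expression Quantification": (["TXT"], "Open"),
    "Transcript Fusion": (["TSV", "BEDPE"], "Controlled"),
}


def types_by_access(level):
    """Return the data type names whose access level matches `level`."""
    return [name for name, (_formats, access) in DATA_TYPES.items() if access == level]


print("Open:", types_by_access("Open"))
print("Controlled:", types_by_access("Controlled"))
```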