Clinical Proteomic Tumor Analysis Consortium (CPTAC)

The National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics.

Program Description

CPTAC is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of robust, quantitative proteomic technologies and workflows. The overarching goal of CPTAC is to improve our ability to diagnose, treat, and prevent cancer. To achieve this goal in a scientifically rigorous manner, the NCI launched CPTAC to systematically identify proteins that derive from alterations in cancer genomes and related biological processes, and to make these data, together with accompanying assays and protocols, available to the public.

CPTAC has provided the Genomic Data Commons (GDC) with genomic data from a total of 1100+ cancer patients across diverse disease types, including Endometrial, Renal, Lung Adenocarcinoma and Squamous Cell Carcinoma, Breast, Colon, Ovarian, Brain, Head and Neck, and Pancreatic cancers. The GDC harmonized the CPTAC DNA sequences, from whole genome sequencing (WGS) and whole exome sequencing (WXS), and the CPTAC RNA sequences against the GRCh38 reference genome using the GDC DNA-Seq Analysis Pipelines and mRNA Analysis Pipelines, respectively. The harmonized CPTAC genomic data are available in the GDC Data Portal. CPTAC makes proteomic data that are processed through the CPTAC Common Data Analysis Pipeline (CDAP) available in the CPTAC Data Portal. CPTAC proteomic data are also available in the Proteomic Data Commons (PDC).

Data Overview

The CPTAC genomic data can be found on the GDC Data Portal. To request access to protected CPTAC data, please apply to dbGaP for access to the CPTAC 3 Study (study accession phs001287 – endometrial, lung, kidney, brain, head and neck, and pancreatic cancers) or the CPTAC 2 Study (study accession phs000892 – ovarian, breast and colon cancers).
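As an illustration of how the open and controlled access levels surface when looking up files programmatically, the sketch below builds a query for the public GDC API. It is a minimal sketch under stated assumptions, not an official recipe: the endpoint URL, the project identifiers `CPTAC-2` and `CPTAC-3`, and the field names `cases.project.project_id` and `access` are assumptions based on the publicly documented GDC API and are not stated on this page. The helper names are hypothetical. The code only constructs the request; it does not send it.

```python
import json
from urllib.parse import urlencode

# Assumption: public GDC API files endpoint (not given on this page).
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_cptac_file_params(project_ids=("CPTAC-2", "CPTAC-3"),
                            access="open", size=10):
    """Build query parameters for listing CPTAC files by access level.

    project_ids and the field names below are assumptions about the
    GDC data model; adjust them if the API reports different values.
    """
    filters = {
        "op": "and",
        "content": [
            {"op": "in",
             "content": {"field": "cases.project.project_id",
                         "value": list(project_ids)}},
            {"op": "in",
             "content": {"field": "access", "value": [access]}},
        ],
    }
    return {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_type,access",
        "format": "JSON",
        "size": str(size),
    }


def build_cptac_file_url(**kwargs):
    """Return the full GET URL for the query (hypothetical helper)."""
    return GDC_FILES_ENDPOINT + "?" + urlencode(build_cptac_file_params(**kwargs))


if __name__ == "__main__":
    # Print the URL for open-access CPTAC files; no request is made here.
    print(build_cptac_file_url(access="open", size=5))
```

Listing file metadata in this way would not by itself grant access to controlled files; downloading those still requires the dbGaP authorization described above.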

Cancer Types

Disease Type | Primary Site
Adenomas and Adenocarcinomas | Breast, Bronchus and lung, Colon, Kidney, Rectum, Uterus, NOS
Blood Derived Normal | Breast
Cystic, Mucinous and Serous Neoplasms | Breast, Other and unspecified female genital organs, Ovary, Retroperitoneum and peritoneum
Ductal and Lobular Neoplasms | Breast, Pancreas
Gliomas | Brain
Solid Tissue Normal | Brain, Pancreas, Breast
Squamous Cell Neoplasms | Breast, Bronchus and lung, Other and ill-defined sites

Proteomic and Associated Genomic Data

Data Types and Access Levels

Data Type | Data Format | # of Cases and Files | Estimated File Size | Data Access Level
Clinical and Biospecimen | TSV, | 1137 Cases | N/A | Open
WGS Aligned Reads | BAM | 779 Cases, 1964 Files | 199.49 TB | Controlled
WXS Aligned Reads | BAM | 1119 Cases, 2793 Files | 125.61 TB | Controlled
WXS Raw Simple Somatic Mutations | VCF | 1106 Cases, 6348 Files (5 Callers) | 5.21 GB | Controlled
WXS Annotated Somatic Mutations | VCF, MAF | 1106 Cases, 12696 Files (5 Callers) | 21.47 GB | Controlled
WXS Aggregated Somatic Mutations | MAF | 1099 Cases, 1262 Files | 459.51 MB | Controlled
WXS Masked Somatic Mutations | MAF | 1099 Cases, 1262 Files | 83.8 MB | Controlled
Targeted Sequencing Aligned Reads | BAM | 69 Cases, 69 Files | 472.01 GB | Controlled
Targeted Sequencing Raw Simple Somatic Mutation | VCF | 69 Cases, 69 Files | 88.8 MB | Controlled
RNA-Seq Aligned Reads | BAM | 1128 Cases, 4824 Files | 26.94 TB | Controlled
Gene Expression Quantification | TXT | 1128 Cases, 6432 Files (4 Quantification Types) | 2.3 GB | Open
Splice Junction Quantification | TSV | 1128 Cases, 1608 Files | 4.97 GB | Controlled
miRNA-Seq Aligned Reads | BAM | 1125 Cases, 1597 Files | 435.64 GB | Controlled
miRNA Expression Quantification | TSV | 1125 Cases, 1597 Files | 80.69 MB | Open
Isoform Expression Quantification | TSV | 1125 Cases, 1597 Files | 1.26 GB | Open
Single Cell Analysis | TSV, HDF5 | 18 Cases, 36 Files | 6.56 GB | Open
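
To get a sense of how much storage each access level represents, the estimated file sizes listed above can be tallied. The following is a minimal sketch: the sizes and access levels are copied from the table, while the decimal unit convention (1 TB = 10^12 bytes) and the function names are assumptions made for illustration. The Clinical and Biospecimen row is omitted because its size is listed as N/A.

```python
# Assumption: decimal (SI) units; the page does not say which convention it uses.
UNIT_BYTES = {"MB": 10**6, "GB": 10**9, "TB": 10**12}

# (data type, estimated file size, access level), copied from the table.
ROWS = [
    ("WGS Aligned Reads", "199.49 TB", "Controlled"),
    ("WXS Aligned Reads", "125.61 TB", "Controlled"),
    ("WXS Raw Simple Somatic Mutations", "5.21 GB", "Controlled"),
    ("WXS Annotated Somatic Mutations", "21.47 GB", "Controlled"),
    ("WXS Aggregated Somatic Mutations", "459.51 MB", "Controlled"),
    ("WXS Masked Somatic Mutations", "83.8 MB", "Controlled"),
    ("Targeted Sequencing Aligned Reads", "472.01 GB", "Controlled"),
    ("Targeted Sequencing Raw Simple Somatic Mutation", "88.8 MB", "Controlled"),
    ("RNA-Seq Aligned Reads", "26.94 TB", "Controlled"),
    ("Gene Expression Quantification", "2.3 GB", "Open"),
    ("Splice Junction Quantification", "4.97 GB", "Controlled"),
    ("miRNA-Seq Aligned Reads", "435.64 GB", "Controlled"),
    ("miRNA Expression Quantification", "80.69 MB", "Open"),
    ("Isoform Expression Quantification", "1.26 GB", "Open"),
    ("Single Cell Analysis", "6.56 GB", "Open"),
]


def to_bytes(size_text):
    """Convert a string such as '5.21 GB' to a byte count (float)."""
    value, unit = size_text.split()
    return float(value) * UNIT_BYTES[unit]


def total_bytes_by_access(rows):
    """Sum the estimated sizes for each access level."""
    totals = {}
    for _data_type, size_text, access in rows:
        totals[access] = totals.get(access, 0.0) + to_bytes(size_text)
    return totals


if __name__ == "__main__":
    for access, total in sorted(total_bytes_by_access(ROWS).items()):
        print(f"{access}: {total / 10**12:.3f} TB")
```

Under these assumptions, controlled-access data come to roughly 353 TB, dominated by the WGS, WXS, and RNA-Seq aligned reads, while open-access files total only about 10 GB.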