Clinical Proteomic Tumor Analysis Consortium (CPTAC)

The National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics.

Program Description

CPTAC is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of robust, quantitative proteomic technologies and workflows. The overarching goal of CPTAC is to improve our ability to diagnose, treat, and prevent cancer. To pursue this goal in a scientifically rigorous manner, the NCI launched CPTAC to systematically identify proteins that derive from alterations in cancer genomes and related biological processes, and to make these data, together with accompanying assays and protocols, available to the public.

CPTAC has provided the Genomic Data Commons (GDC) with genomic data from more than 300 cancer patients across diverse disease types, including Uterine Corpus Endometrial Carcinoma (UCEC), Clear Cell Renal Cell Carcinoma (CCRCC), and Lung Adenocarcinoma (LUAD). The GDC harmonized the CPTAC DNA sequences (from whole genome sequencing, WGS, and whole exome sequencing, WXS) and RNA sequences against the GRCh38 reference genome, using the GDC DNA-Seq Analysis Pipelines and the GDC mRNA Analysis Pipelines, respectively. The harmonized CPTAC genomic data are available in the GDC Data Portal. Proteomic data processed through the CPTAC Common Data Analysis Pipeline (CDAP) are available in the CPTAC Data Portal.

Data Overview

The CPTAC genomic data can be found on the GDC Data Portal. To request access to controlled-access CPTAC data, apply through dbGaP for access to the CPTAC-3 study (study accession phs001287).
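
Besides the Data Portal, the same file records can be queried programmatically. The sketch below only builds a query URL for the GDC API `files` endpoint that lists CPTAC-3 files of one data type and access level. It assumes the endpoint URL and the filter fields `cases.project.project_id`, `data_type`, and `access` as they appear in the GDC API documentation; the function name `build_cptac_files_query` is hypothetical, and the field names should be checked against the current API reference before use.

```python
import json
import urllib.parse

# GDC API "files" endpoint (assumed current base URL).
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_cptac_files_query(data_type, access="open", size=100):
    """Build a GDC API URL that lists CPTAC-3 files of a given data type.

    Hypothetical helper. The filter field names (cases.project.project_id,
    data_type, access) are assumptions based on the GDC file fields;
    verify them against the GDC API documentation.
    """
    # All three conditions must hold ("and"); each "in" matches any listed value.
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id", "value": ["CPTAC-3"]}},
            {"op": "in", "content": {"field": "data_type", "value": [data_type]}},
            {"op": "in", "content": {"field": "access", "value": [access]}},
        ],
    }
    params = {
        "filters": json.dumps(filters),  # the filter is sent as a JSON string
        "fields": "file_id,file_name,file_size,data_type,access",
        "format": "JSON",
        "size": str(size),  # maximum number of records per page
    }
    return GDC_FILES_ENDPOINT + "?" + urllib.parse.urlencode(params)
```

For example, `build_cptac_files_query("Gene Expression Quantification")` targets an open-access data type; passing the URL to `urllib.request.urlopen` would be expected to return a page of matching file records as JSON. Controlled-access files can be listed the same way with `access="controlled"`, but downloading them requires dbGaP authorization.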

Data Types and Access Levels

Data Type | Data Format | Cases and Files | Estimated File Size | Data Access Level
--- | --- | --- | --- | ---
Clinical and Biospecimen | TSV, JSON | 322 cases | N/A | Open
WGS Aligned Reads | BAM | 322 cases, 839 files | 85.63 TB | Controlled
WXS Aligned Reads | BAM | 322 cases, 837 files | 35.73 TB | Controlled
WXS Raw Simple Somatic Mutations | VCF | 321 cases, 1620 files (5 callers) | 1.11 GB | Controlled
WXS Annotated Somatic Mutations | VCF | 321 cases, 1620 files (5 callers) | 3.83 GB | Controlled
RNA-Seq Aligned Reads | BAM | 322 cases, 1551 files | 9.07 TB | Controlled
Gene Expression Quantification | TXT | 322 cases, 517 files per quantification type (4 quantification types) | 752.45 MB | Open
Splice Junction Quantification | TXT | 322 cases, 517 files | 1.63 GB | Controlled
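
The estimated sizes in the table mix MB, GB, and TB, which makes it hard to see at a glance how much storage each access level needs. The sketch below transcribes the table's size and access columns and totals them per access level. It assumes decimal (SI) units (1 TB = 1000 GB, 1 GB = 1000 MB), which the table does not state; the names `DATASETS`, `to_gb`, and `total_gb` are hypothetical.

```python
# Assumes decimal (SI) units: 1 TB = 1000 GB, 1 GB = 1000 MB.
UNIT_TO_GB = {"MB": 1e-3, "GB": 1.0, "TB": 1e3}

# (data type, estimated size or None when listed as N/A, access level),
# copied from the table above.
DATASETS = [
    ("Clinical and Biospecimen", None, "Open"),
    ("WGS Aligned Reads", "85.63 TB", "Controlled"),
    ("WXS Aligned Reads", "35.73 TB", "Controlled"),
    ("WXS Raw Simple Somatic Mutations", "1.11 GB", "Controlled"),
    ("WXS Annotated Somatic Mutations", "3.83 GB", "Controlled"),
    ("RNA-Seq Aligned Reads", "9.07 TB", "Controlled"),
    ("Gene Expression Quantification", "752.45 MB", "Open"),
    ("Splice Junction Quantification", "1.63 GB", "Controlled"),
]


def to_gb(size):
    """Convert a size string such as '85.63 TB' to gigabytes."""
    value, unit = size.split()
    return float(value) * UNIT_TO_GB[unit]


def total_gb(access_level):
    """Sum the estimated sizes (in GB) of all data types with this access level."""
    return sum(
        to_gb(size)
        for _, size, level in DATASETS
        if size is not None and level == access_level
    )
```

Under these assumptions, `total_gb("Controlled")` comes to about 130,437 GB (roughly 130.4 TB), almost all of it from the three aligned-read BAM sets, while the open-access files with a listed size total under 1 GB.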