The National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics.
CPTAC pursues this through a comprehensive and coordinated application of robust, quantitative proteomic technologies and workflows. Its overarching goal is to improve our ability to diagnose, treat, and prevent cancer. To achieve this goal in a scientifically rigorous manner, the NCI launched CPTAC to systematically identify proteins that derive from alterations in cancer genomes and related biological processes, and to provide these data, with accompanying assays and protocols, to the public.
CPTAC has provided the Genomic Data Commons (GDC) with genomic data from more than 1,100 cancer patients with diverse disease types, including Endometrial, Renal, Lung Adenocarcinoma and Squamous Cell Carcinoma, Breast, Colon, Ovarian, Brain, Head and Neck, and Pancreatic cancers. The GDC harmonized the CPTAC DNA sequences, from whole genome sequencing (WGS) and whole exome sequencing (WXS), and the CPTAC RNA sequences against the GRCh38 reference genome using the GDC DNA-Seq Analysis Pipelines and mRNA Analysis Pipelines, respectively. The harmonized CPTAC genomic data are available in the GDC Data Portal. CPTAC proteomic data processed through the CPTAC Common Data Analysis Pipeline (CDAP) are available in the CPTAC Data Portal and in the Proteomic Data Commons (PDC).
To request access to protected CPTAC data, apply through dbGaP for access to the CPTAC 3 Study (study accession phs001287: endometrial, lung, kidney, brain, head and neck, and pancreatic cancers) or the CPTAC 2 Study (study accession phs000892: ovarian, breast, and colon cancers).
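Open-access CPTAC files can also be located programmatically through the GDC API rather than the portal interface. The following is a minimal sketch, not an official recipe: it only builds the query parameters for the public `files` endpoint and does not send the request. The project identifiers `CPTAC-2` and `CPTAC-3` and the helper name `build_cptac_file_query` are assumptions made for illustration and should be checked against the GDC Data Portal before use.

```python
import json
from urllib.parse import urlencode

# Public GDC API endpoint for file metadata.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_cptac_file_query(project_ids, access="open", size=10):
    """Return query parameters selecting files by project and access level.

    `project_ids` are assumed GDC project identifiers (for example
    "CPTAC-2" and "CPTAC-3"); they are not stated in the text above.
    """
    filters = {
        "op": "and",
        "content": [
            {
                "op": "in",
                "content": {
                    "field": "cases.project.project_id",
                    "value": list(project_ids),
                },
            },
            {
                "op": "in",
                "content": {"field": "access", "value": [access]},
            },
        ],
    }
    return {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_type,access",
        "format": "JSON",
        "size": str(size),
    }


params = build_cptac_file_query(["CPTAC-2", "CPTAC-3"])
query_url = f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"
print(query_url)
```

The printed URL can be opened in a browser or passed to any HTTP client. Controlled-access files still require dbGaP authorization and a GDC authentication token, which this sketch does not handle.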
Disease Type | Primary Site |
---|---|
Adenomas and Adenocarcinomas | Breast, Bronchus and lung, Colon, Kidney, Rectum, Uterus, NOS |
Blood Derived Normal | Breast |
Cystic, Mucinous and Serous Neoplasms | Breast, Other and unspecified female genital organs, Ovary, Retroperitoneum and peritoneum |
Ductal and Lobular Neoplasms | Breast, Pancreas |
Gliomas | Brain |
Solid Tissue Normal | Brain, Pancreas, Breast |
Squamous Cell Neoplasms | Breast, Bronchus and lung, Other and ill-defined sites |
Data Type | Data Format | # of Cases and Files | Estimated File Size | Data Access Level |
---|---|---|---|---|
Clinical and Biospecimen | TSV, JSON | 1137 Cases | N/A | Open |
WGS Aligned Reads | BAM | 779 Cases 1964 Files | 199.49 TB | Controlled |
WXS Aligned Reads | BAM | 1119 Cases 2793 Files | 125.61 TB | Controlled |
WXS Raw Simple Somatic Mutations | VCF | 1106 Cases 6348 Files (5 Callers) | 5.21 GB | Controlled |
WXS Annotated Somatic Mutations | VCF, MAF | 1106 Cases 12696 Files (5 Callers) | 21.47 GB | Controlled |
WXS Aggregated Somatic Mutations | MAF | 1099 Cases 1262 Files | 459.51 MB | Controlled |
WXS Masked Somatic Mutations | MAF | 1099 Cases 1262 Files | 83.8 MB | Controlled |
Targeted Sequencing Aligned Reads | BAM | 69 Cases 69 Files | 472.01 GB | Controlled |
Targeted Sequencing Raw Simple Somatic Mutation | VCF | 69 Cases 69 Files | 88.8 MB | Controlled |
RNA-Seq Aligned Reads | BAM | 1128 Cases 4824 Files | 26.94 TB | Controlled |
Gene Expression Quantification | TXT | 1128 Cases 6432 Files (4 Quantification Types) | 2.3 GB | Open |
Splice Junction Quantification | TSV | 1128 Cases 1608 Files | 4.97 GB | Controlled |
miRNA-Seq Aligned Reads | BAM | 1125 Cases 1597 Files | 435.64 GB | Controlled |
miRNA Expression Quantification | TSV | 1125 Cases 1597 Files | 80.69 MB | Open |
Isoform Expression Quantification | TSV | 1125 Cases 1597 Files | 1.26 GB | Open |
Single Cell Analysis | TSV, HDF5 | 18 Cases 36 Files | 6.56 GB | Open |
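Because the aligned-read BAM files dominate the storage footprint, it can help to total their estimated sizes before planning a download. The sketch below sums the aligned-read rows from the table above. It assumes decimal (SI) units, which the table does not specify, and the helper name `size_in_tb` is illustrative only.

```python
# Assumption: sizes use decimal (SI) multipliers; the table does not say
# whether the reported units are decimal or binary.
UNIT_TO_TB = {"MB": 1e-6, "GB": 1e-3, "TB": 1.0}


def size_in_tb(text):
    """Convert a size string such as '472.01 GB' to terabytes."""
    value, unit = text.split()
    return float(value) * UNIT_TO_TB[unit]


# Estimated sizes of the aligned-read (BAM) rows, copied from the table.
ALIGNED_READS = {
    "WGS": "199.49 TB",
    "WXS": "125.61 TB",
    "Targeted Sequencing": "472.01 GB",
    "RNA-Seq": "26.94 TB",
    "miRNA-Seq": "435.64 GB",
}

total_tb = sum(size_in_tb(size) for size in ALIGNED_READS.values())
print(f"Aligned-read BAM files: about {total_tb:.2f} TB")
```

All of these rows are controlled access, so the total applies only to users with dbGaP authorization for the corresponding studies.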