Citation TBD
Development of new therapeutics and validation of pathogenetic mechanisms in cancer require representative laboratory models. However, existing model collections do not yet represent the full spectrum of diversity observed in human cancer. Recent technologies enable more efficient model derivation (e.g., tumor organoids), yet whether such models maintain the molecular states of the originating tumor during long-term ex vivo expansion has not been systematically investigated. Here, we present the results of a large-scale international program, the Human Cancer Models Initiative (HCMI), generating a resource of 665 patient-derived models from 2,780 donors with 25 cancer types, with integrated tumor/model whole genome, exome, methylome and transcriptome analyses. The resource encompasses 522 models with comprehensive clinical data, 153 models of rare cancers, and 71 models from participants with non-European ancestry. Analysis of 421 matched tumor/model pairs revealed a high degree of genetic (97.8%) and epigenetic (95%) concordance and defines specific correlates of discordance and model fidelity. However, single-nucleus RNA sequencing shows that, for some models, detectable cell states are influenced by culture conditions. We identify models with extrachromosomal DNA-based gene amplification and post-treatment mutational signatures, providing opportunities to study mechanisms of therapeutic resistance. This model repository is being made available to the cancer research community (including multimodal molecular profiling, clinical attributes such as response to treatments before and after model generation, and integrative software tools), thus providing a valuable resource for preclinical investigation of cancer pathogenesis and treatment response.
Supplemental Data
- GDC Manifests
  - HCMI-CMDC Controlled-Access Data - Download Manifest (38 Files)
  - HCMI-CMDC Open-Access Data - Download Manifest (70 Files)
- DNA Whole Genome and Whole Exome Data
  - nygc_snv_calls.tar.gz - NYGC SNV/indel calls (Controlled)
  - broad_snv_calls.tar.gz - Broad SNV/indel calls (Controlled)
  - washu_snv_calls.tar.gz - WashU SNV/indel calls (Controlled)
  - consensus_snv_calls.tar.gz - Consensus SNV/indel calls (Controlled)
  - ascat_cnv_calls.txt - Segmented copy number from ASCAT (Controlled)
  - ReMixT_copy_number.tar.gz - ReMixT copy number data compressed tarball (Controlled)
  - cn_obs.bin50k.HCMI-remixt.h5ad.gz - ReMixT copy number AnnData table (intermediate file) (Controlled)
  - cn_obs.bin50k.TCGA.h5ad.gz - TCGA copy number AnnData table (intermediate file) (Controlled)
  - purple_cnv_calls.txt - Segmented copy number from PURPLE (Controlled)
  - cn_obs.bin50k.HCMI-consensus.h5ad.gz - Consensus copy number AnnData table (intermediate file) (Controlled)
  - nygc_sv.bedpe - NYGC structural variant calls (Controlled)
  - broad_sv.bedpe - Broad Institute structural variant calls (Controlled)
  - washu_sv.bedpe - WashU structural variant calls (Controlled)
  - mskcc_sv.bedpe - MSKCC structural variant calls (Controlled)
  - embl_sv.bedpe - EMBL structural variant calls (Controlled)
  - consensus_sv.bedpe - Consensus structural variant calls (Controlled)
  - consensus_purity_ploidy.txt - Consensus purity/ploidy values (Controlled)
  - loh_shared.csv - Shared LOH fraction between tumors and their paired models (Controlled)
  - wgd_status.csv - Whole-genome doubling status for tumors and models (Controlled)
  - Mutational Signatures.xlsx - Mutational signatures (Supplementary Table 2)
  - cnv_drivers.yaml - Curated CNV driver oncogenes (Controlled)
  - snvindel_drivers.tsv - SNV/indel driver genes from TCGA (https://pmc.ncbi.nlm.nih.gov/articles/PMC6029450) (Controlled)
  - CNV Drivers.xlsx - CNV drivers (Supplementary Table 10)
  - aa-amplicon-summary-merged.rescued_ecdna.txt - ecDNA calls for both tumors and models (Controlled)
  - hcmi-merged-feature-comparisons.concordant_ecdna.txt - ecDNA calls found to be concordant between tumors and their paired models (Controlled)
- DNA Methylation Data
  - template_latent_reporting.xlsx - Latent transcription factor distance template (Controlled)
  - processed.tar.gz - Pre-processed DNA methylation files for TMP subtype classification (Controlled)
  - HCMI_DNA_methylation_beta_value_matrix.tar.gz - Processed DNA methylation beta value matrix (Controlled)
  - HCMI_TCGA_TARGET_DNA_methylation_beta_value_matrix.qs - Merged HCMI, TCGA, and TARGET processed DNA methylation beta value matrix (Controlled)
- RNA-Seq Data
  - Transcriptional and epigenetic fidelity.xlsx - Transcriptional and epigenetic model fidelity metrics, TMP and MOMA assignments, GBM gene expression, and GSEA results (Supplementary Table 3)
  - TCGA-MOMA subtypes.xlsx - TCGA MOMA subtypes (Supplementary Table 5)
  - HCMI-MOMA subtypes.xlsx - HCMI MOMA subtypes (Supplementary Table 6)
  - Top50MR for MOMA subtypes - Top50MR for MOMA subtypes (Supplementary Table 7)
  - gdc_download_ref.tar.gz - GDC manifests and sample sheets for the Euclidean and Latent Transcription Factor Distance methods (Controlled)
  - key_samples.csv - Key sample pairs for the distance and subtype analyses used by the Euclidean and Latent Transcription Factor Distance methods (Controlled)
  - Celligner_aligned_data.gz - Celligner aligned expression data (Controlled)
  - Celligner_cPC_loadings.xlsx - Celligner cPC loadings (Controlled)
  - Celligner_pairwise_euclidean_distances_in_70PC.csv - RNA-seq Celligner distances (Controlled)
  - Celligner Model Tumor All-to-All Distances and Lineage Assignment.xlsx - Celligner model-tumor distances and lineage assignment (Supplementary Table 12)
- snRNA-Seq Data
  - snRNAseq_samples.xlsx - snRNA-seq samples (Supplementary Table 8)
  - GBM-malignant-reference-signature.tsv - Gene statistics computed from a reference GBM cellular population, used for centering and scaling the log-normalized GBM snRNA-seq data
  - GBM networks - Gene regulatory network used for protein activity-based PCA projection of GBM samples in Fig. 5
  - PAAD networks - Gene regulatory network used for protein activity-based analysis of PAAD samples (Fig. 5, Extended Data Fig. 7)
  - PDAC-all-samples-protein-activity-ref-cells_from_model-NES.tsv - VIPER-inferred protein activity matrix for malignant cells across all PAAD snRNA-seq samples. (Note: the suffix indicates that the protein activity matrix was generated using a reference set of cells from models; however, the matrix includes samples from both tumors and models.)
  - PDAC-all-samples-ref-cells_from_model-PCA.csv - PCA coordinates for protein activity visualization of PAAD samples (Fig. 5; Extended Data Fig. 7)
  - demultiplexed_samples_best_gt_thresh-filtered-01 - Directory containing .h5ad files for demultiplexed HCMI snRNA-seq samples.
    - AG001_HCM-CSHL-0143-C20_gt.h5ad
    - AG001_HCM-CSHL-0322-C20_gt.h5ad
    - AG001_HCM-CSHL-0247-C18_gt.h5ad
    - AG007_HCM-BROD-0416-C71_gt.h5ad
    - AG006_HCM-BROD-0213-C71_gt.h5ad
    - AA015_HCM-BROD-0012-C71_gt.h5ad
    - AG011_HCM-CSHL-0089-C25_gt.h5ad
    - AA017_HCM-BROD-0415-C71_gt.h5ad
    - AG007_HCM-BROD-0213-C71_gt.h5ad
    - AG004_HCM-BROD-0001-C18_gt.h5ad
    - AG006_HCM-BROD-0416-C71_gt.h5ad
    - AG005_HCM-BROD-0028-C71_gt.h5ad
    - AG012_HCM-CSHL-0078-C25_gt.h5ad
    - AG002_HCM-CSHL-0247-C18_gt.h5ad
    - AG008_HCM-BROD-0110-C25_gt.h5ad
    - AG010_HCM-CSHL-0089-C25_gt.h5ad
    - AG013_HCM-CSHL-0078-C25_gt.h5ad
    - AA015_HCM-BROD-0199-C71_gt.h5ad
    - AG012_HCM-CSHL-0089-C25_gt.h5ad
    - AG002_HCM-CSHL-0322-C20_gt.h5ad
    - AG009_HCM-CSHL-0073-C25_gt.h5ad
    - AG010_HCM-CSHL-0078-C25_gt.h5ad
    - AA014_HCM-BROD-0012-C71_gt.h5ad
    - AG005_HCM-BROD-0001-C18_gt.h5ad
    - AA016_HCM-BROD-0415-C71_gt.h5ad
    - AA016_HCM-BROD-0002-C71_gt.h5ad
    - AG008_HCM-CSHL-0073-C25_gt.h5ad
    - AA014_HCM-BROD-0199-C71_gt.h5ad
    - AG013_HCM-CSHL-0089-C25_gt.h5ad
    - AG009_HCM-BROD-0110-C25_gt.h5ad
    - AG011_HCM-CSHL-0078-C25_gt.h5ad
    - AG002_HCM-CSHL-0143-C20_gt.h5ad
    - AG004_HCM-BROD-0028-C71_gt.h5ad
    - AA017_HCM-BROD-0002-C71_gt.h5ad
  - filtered_barcodes - Directory containing .tsv files with the barcodes detected by CellRanger in each multiplexed sequencing run. They can be used as input to demuxlet for single-sample demultiplexing.
    - AG013_barcodes.tsv
    - AG010_barcodes.tsv
    - AG002_barcodes.tsv
    - AG008_barcodes.tsv
    - AA016_barcodes.tsv
    - AG012_barcodes.tsv
    - AG006_barcodes.tsv
    - AG009_barcodes.tsv
    - AG004_barcodes.tsv
    - AG011_barcodes.tsv
    - AG005_barcodes.tsv
    - AA014_barcodes.tsv
    - AG001_barcodes.tsv
    - AA017_barcodes.tsv
    - AG007_barcodes.tsv
    - AA015_barcodes.tsv
  - merged-filtered-01-vcf-trimmed-BAM-sorted - Directory containing multi-sample VCF files to be used as input to demuxlet (Controlled)
    - AA014_AA015.vcf.gz (Controlled)
    - AG004_AG005.vcf.gz (Controlled)
    - AA016_AA017.vcf.gz (Controlled)
    - AG001_AG002.vcf.gz (Controlled)
    - AG008_AG009.vcf.gz (Controlled)
    - AG010_AG011_AG012_AG013.vcf.gz (Controlled)
    - AG006_AG007.vcf.gz (Controlled)
  - DE GBM snRNAseq.xlsx - GBM snRNA-seq differential expression (Supplementary Table 9)
- Metadata Files
  - Comprehensive model metadata and patient characteristics.xlsx - Model and patient demographics metadata (Supplementary Table 1)
  - hcmi_cases_days_to_all_events_tx_naive_2024-03-19.xlsx - HCMI days to all events clinical metadata (Controlled)
  - HCMI_AWG_Model-Tumor-Normal_Linkage_v2.0_2.20.2024.txt - Sample pair information extracted and subset from HCMI_AWG_Model-Tumor-Normal_Linkage_v2.0_2.20.2024.xlsx, used for the Euclidean and Latent Transcription Factor Distance methods (Controlled)
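The demultiplexed snRNA-seq file names listed above appear to follow a `<run>_<sample>_gt.h5ad` pattern, where the run prefix (for example `AG001`) matches the prefixes used in the `filtered_barcodes` and VCF file names. The following Python sketch, which assumes that this pattern holds for every file, groups file names by sequencing run. The function name `group_by_run` and the regular expression are illustrative and are not part of any HCMI tooling.

```python
import re
from collections import defaultdict

# Assumed pattern: <run>_<sample>_gt.h5ad, e.g. AG001_HCM-CSHL-0143-C20_gt.h5ad
NAME_RE = re.compile(
    r"^(?P<run>[A-Z]{2}\d{3})_(?P<sample>HCM-[A-Z]+-\d{4}-C\d{2})_gt\.h5ad$"
)

def group_by_run(filenames):
    """Map each sequencing-run prefix to the sorted sample IDs found in it."""
    runs = defaultdict(list)
    for name in filenames:
        match = NAME_RE.match(name)
        if match is None:
            # Fail loudly if a file does not fit the assumed naming pattern.
            raise ValueError(f"unexpected file name: {name}")
        runs[match["run"]].append(match["sample"])
    return {run: sorted(samples) for run, samples in runs.items()}

# A few names copied from the listing above.
files = [
    "AG001_HCM-CSHL-0143-C20_gt.h5ad",
    "AG001_HCM-CSHL-0322-C20_gt.h5ad",
    "AG001_HCM-CSHL-0247-C18_gt.h5ad",
    "AA015_HCM-BROD-0012-C71_gt.h5ad",
]
for run, samples in group_by_run(files).items():
    print(run, samples)
```

Grouping by run prefix is one way to see which samples share a `<run>_barcodes.tsv` file before demultiplexing.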
Additional Resources
Instructions for Data Download
Open Access Data
- Download the appropriate manifest file from the publication page
- Use the manifest file to download data using the GDC Data Transfer Tool (DTT) or the GDC API
  - GDC DTT (Download, User’s Guide)
  - GDC API (User’s Guide)
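The open-access steps above can be sketched in Python. This is a minimal illustration, assuming the manifest is a tab-separated file with an `id` column and that the DTT is installed as `gdc-client`; the file names, output directory, and manifest rows below are placeholders, not values from this publication.

```python
import csv
import io
import shlex

def read_manifest_ids(manifest_text):
    """Return the file UUIDs from a tab-separated manifest (assumes an `id` column)."""
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    return [row["id"] for row in reader]

def dtt_command(manifest_path, out_dir="."):
    """Build a gdc-client invocation that downloads every file in the manifest."""
    return ["gdc-client", "download", "-m", manifest_path, "-d", out_dir]

# Placeholder manifest with made-up rows; a real one comes from the publication page.
example = (
    "id\tfilename\tmd5\tsize\tstate\n"
    "00000000-0000-0000-0000-000000000001\texample_a.txt\tabc\t10\treleased\n"
    "00000000-0000-0000-0000-000000000002\texample_b.txt\tdef\t20\treleased\n"
)
print(len(read_manifest_ids(example)), "files listed")
print(shlex.join(dtt_command("open_access_manifest.txt", "hcmi_data")))
```

The command is returned as an argument list so it can be passed to `subprocess.run` without shell quoting; the GDC DTT User’s Guide remains the authoritative reference for the available options.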
Controlled Access Data
- Download the appropriate manifest file from the publication page
- Download a token from the GDC Data Portal
  - GDC Data Portal (Launch, User’s Guide)
- Use the manifest file and token to download data using the GDC DTT or the GDC API
  - GDC DTT (Download, User’s Guide)
  - GDC API (User’s Guide)
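For controlled-access files, the token is sent with each API request. The sketch below prepares, but does not send, such a request. It assumes the public GDC API data endpoint at `https://api.gdc.cancer.gov/data/<file UUID>` and the `X-Auth-Token` header; the UUID and token shown are placeholders, and the helper name `build_download_request` is illustrative.

```python
import urllib.request

# Assumed endpoint of the public GDC API for file downloads.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"

def build_download_request(file_id, token):
    """Prepare (but do not send) an authenticated download request for one file."""
    request = urllib.request.Request(f"{GDC_DATA_ENDPOINT}/{file_id}")
    # Token files often end with a newline, so strip surrounding whitespace.
    request.add_header("X-Auth-Token", token.strip())
    return request

# Placeholder values: use a UUID from the manifest and the token
# downloaded from the GDC Data Portal.
req = build_download_request("00000000-0000-0000-0000-000000000001", "EXAMPLE-TOKEN\n")
print(req.full_url)
# To fetch the file: with urllib.request.urlopen(req) as resp: data = resp.read()
```

With the DTT, the equivalent is to pass the token file alongside the manifest, for example `gdc-client download -m <manifest> -t <token file>`. Tokens grant access to controlled data and should not be committed to scripts or shared.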
For assistance, please contact the GDC Help Desk: support@nci-gdc.datacommons.io.