American Association for Cancer Research (AACR) - PROJECT GENIE

The American Association for Cancer Research (AACR) strives to prevent and cure cancer through research, education, communication, collaboration, funding, and advocacy. Through its programs and services, the AACR fosters research in cancer and related biomedical science; accelerates the dissemination of new research findings among scientists and others dedicated to the conquest of cancer; promotes science education and training; and advances the understanding of cancer etiology, prevention, diagnosis, and treatment throughout the world.

Program Description

The AACR Project Genomics Evidence Neoplasia Information Exchange (GENIE) is an international pan-cancer registry of real-world data assembled through data sharing among 19 leading international cancer centers, with the goal of improving clinical decision-making. The registry leverages ongoing clinical sequencing efforts (CLIA/ISO-certified) at participating cancer centers by pooling their data to create a novel, open-access registry that serves as an evidence base for the entire cancer community. Genomic and baseline clinical data from more than 40,000 tumors have been made available in the GDC, following the efforts of AACR’s strategic and technical partners, Sage Bionetworks and Memorial Sloan Kettering Cancer Center. The consortium and its activities are driven by openness, transparency, and inclusion to ensure that the project output remains accessible to the global cancer research community and ultimately benefits patients.

Started in 2017, the project was developed to serve as a prototype for aggregating, harmonizing, and sharing clinical-grade, next-generation sequencing data obtained during routine medical practice.

Data Overview

The data can be found on the GDC Data Portal. To request access to protected GENIE data, please apply to dbGaP for access to the GENIE Study (study accession phs001337).
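As a rough illustration of locating the open-access case records programmatically, the sketch below builds a query against the public GDC API `cases` endpoint. It assumes that the GENIE cases are grouped under a program named "GENIE" and that the filter field `project.program.name` applies; neither is stated on this page. The function names are hypothetical, and the count returned may differ from the figures listed here if the GDC has been updated since.

```python
import json
import urllib.parse
import urllib.request

GDC_CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"


def build_program_filter(program_name="GENIE"):
    """Return a GDC filter selecting cases by program name.

    Assumption: the program is registered in the GDC as "GENIE".
    """
    return {
        "op": "in",
        "content": {"field": "project.program.name", "value": [program_name]},
    }


def build_cases_url(program_name="GENIE", size=1):
    """Build a GET URL for the cases endpoint with the filter URL-encoded."""
    params = {
        "filters": json.dumps(build_program_filter(program_name)),
        "size": str(size),  # only the total is needed, so keep the page small
        "format": "json",
    }
    return GDC_CASES_ENDPOINT + "?" + urllib.parse.urlencode(params)


def count_cases(program_name="GENIE", timeout=30):
    """Query the API (network access required) and return the case total."""
    url = build_cases_url(program_name)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["data"]["pagination"]["total"]
```

Calling `count_cases()` needs no credentials, since only open-access case metadata is queried. Downloading the controlled files still requires dbGaP authorization.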

Biospecimen, baseline clinical, and genomic data are available for 44,756 cancer patients, corresponding to AACR Project GENIE version 5.0. The data come from eight original GENIE contributing sites:

  • Dana-Farber Cancer Institute (DFCI)
  • Institut Gustave Roussy (GRCC)
  • University of Texas MD Anderson Cancer Center (MDA)
  • Memorial Sloan Kettering Cancer Center (MSK)
  • Johns Hopkins Sidney Kimmel Comprehensive Cancer Center (JHU)
  • Netherlands Cancer Institute (NKI), The Netherlands
  • Princess Margaret Cancer Centre, University Health Network (UHN)
  • Vanderbilt-Ingram Cancer Center (VICC)

The genomic data were lifted over from the GENIE hg19 genome reference to the GDC reference genome standard, hg38.

Cancer Types

Multiple (294)

Data Types and Access Levels

Data Type                            Data Format   # of Cases and Files        Estimated File Size   Data Access Level
Clinical and Biospecimen             TSV           44756 Cases                 N/A                   Open
Masked Annotated Somatic Mutations   MAF           44755 Cases, 44755 Files    154.62 MB             Controlled
Gene Level Copy Number Scores        TXT           32528 Cases, 32528 Files    749.31 MB             Controlled
Transcript Fusion                    TSV           3132 Cases, 3132 Files      970.18 KB             Controlled
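
The figures in the table above can be turned into two quick summaries: the fraction of the 44,756 cases covered by each controlled-access data type, and the approximate mean size of a single file. The short sketch below does that arithmetic. It assumes decimal units (1 MB = 10^6 bytes, 1 KB = 10^3 bytes), which the table does not specify. The names `DATA_TYPES` and `summarize` are illustrative only.

```python
TOTAL_CASES = 44756  # cases with clinical and biospecimen data

# (cases, files, total size in bytes) copied from the table above.
# Assumption: sizes use decimal units (MB = 1e6 bytes, KB = 1e3 bytes).
DATA_TYPES = {
    "Masked Annotated Somatic Mutations": (44755, 44755, 154.62e6),
    "Gene Level Copy Number Scores": (32528, 32528, 749.31e6),
    "Transcript Fusion": (3132, 3132, 970.18e3),
}


def summarize(data_types, total_cases):
    """Return per-data-type case coverage and mean file size in bytes."""
    summary = {}
    for name, (cases, files, size_bytes) in data_types.items():
        summary[name] = {
            "coverage": cases / total_cases,
            "mean_file_bytes": size_bytes / files,
        }
    return summary


if __name__ == "__main__":
    for name, stats in summarize(DATA_TYPES, TOTAL_CASES).items():
        print(
            f"{name}: {stats['coverage']:.1%} of cases, "
            f"about {stats['mean_file_bytes']:.0f} bytes per file"
        )
```

Under these assumptions, nearly every case has a mutation file, roughly three quarters have copy number scores, and only a small minority have fusion calls.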