News and Announcements
The latest GDC 2.3 Pauling Release introduces several new features designed to enhance user experience and improve data analysis.
GDC Data Dictionary
New GDC Data Dictionary properties were added in the Okazaki release. Key highlights include:
The NCI GDC Analysis Tool Challenge is a collaborative competition aimed at enhancing cancer research by integrating innovative analysis tools with the GDC. The primary objectives of the challenge are to:
GDC Data Release 41 is now accessible on the GDC Data Portal. This release introduces new data sets featuring data for new NCI-MATCH Trial arms, whole slide images, and more. Below is a summary of the key highlights.
The GDC Nightingale release brings an upgrade to the GDC Data Transfer Tool (DTT) client and an Application Programming Interface (API) update to add an endpoint for Other Clinical Data Elements with data soon to follow.
In the GDC 2.2 McClintock Release, the following features have been released:
GDC Data Dictionary 3.1 expands diagnosis information, TCGA properties, and bioinformatics workflows:
Introducing the latest advancements in GDC 2.0 with Release 2.1! Here's what's new:
GDC Data Release 40 is now accessible on the GDC Data Portal. This release introduces new data sets featuring additional TCGA WGS alignments and variant calls, WXS and RNA-Seq data for new NCI-MATCH Trial arms, and more. Below is a summary of the key highlights.
The GDC has released GDC 2.0, which expands on the GDC Data Portal initially launched in 2016 by providing a cohort-centric design and new analysis tools. GDC 2.0 features include:
GDC Data Release 39 is now available on the GDC Data Portal. Main features of this release include new TCGA WGS variants, additional higher coverage alignments, and five new projects from NCI’s MATCH program:
Introducing a new node for incorporating additional clinical attributes, plus new properties and enumerated values to support The Cancer Genome Atlas (TCGA) and other NCI programs in GDC Data Dictionary Release 3.0.0:
The GDC’s Data Release 38 includes 9000+ high coverage TCGA WGS alignments, data for new CGCI, MATCH, and MP2PRT projects, and a variety of data from other NCI programs:
In GDC Data Dictionary Release 2.6.6, the GDC added support for:
The Genomic Data Commons strives to provide standardized and relevant genomic data to the cancer research community. While data processed according to the most up-to-date pipelines is provided to users through the GDC Data Portal, a number of files processed via older or external pipelines had been kept available for users through the GDC Legacy Archive.
The GDC’s Data Release 37 includes four new projects: APOLLO-LUAD - Proteogenomic Characterization of Lung Adenocarcinoma, CGCI-HTMCP-LC - HIV+ Tumor Molecular Characterization of Lung Cancer, and two treatment arms from NCI's MATCH Clinical Trial: MATCH-Q - Treatment Arm Q (tumors with AKT mutations) and MATCH-Y - Treatment Arm Y (tumors with HER2 amplification).
In GDC Data Dictionary Release 2.6.0, the GDC added support for:
GDC Data Release 36 includes WXS and RNA-Seq data for cases from NCI’s MATCH precision medicine clinical trial (MATCH-Z1D; phs001859) and WGS, WXS, and RNA-Seq data for lung adenocarcinoma cases from NCI’s EAGLE epidemiologic study (CDDP_EAGLE-1; phs001239). New cases for Count Me In’s Metastatic Breast Cancer (CMI-MBC) project have also been added with WXS and RNA-Seq data.
New data sets are now available including RNA-Seq data from the TARGET acute myeloid leukemia project and single nuclei RNA-Seq data from the CPTAC program. These changes are summarized below:
- New RNA-Seq data from TARGET-AML for over 2,000 cases
- New snRNA-Seq data from CPTAC-3 for kidney cancer
The SomaticSniper whole exome variant calling pipeline has been deprecated. Because of this, SomaticSniper variants will now only be found in older versions of VCF or MAF files.
In Data Release 34, the GDC released new data from the Beat AML and Clinical Proteomic Tumor Analysis Consortium (CPTAC) programs:
In GDC Data Dictionary Release 2.5.0, the GDC added support for:
In Data Release 33, the GDC released data from two new projects at the GDC:
The gene model used as a reference across the GDC has been updated from GENCODE 22 to GENCODE 36. GENCODE gene sets are continuously updated to improve coverage and accuracy. GENCODE 36, which was released in October 2020, includes many updates to definitions of genes, transcripts, long non-coding RNAs, and other types of annotations. The previous version used by the GDC (GENCODE 22) was released in March 2015. Both versions are built on the GRCh38 genome assembly.
Highlights from the GDC's recent Data Release 30 include data from major studies exploring the effects of Chernobyl radiation and methylation arrays processed via a new pipeline at the GDC. The data release also includes a variety of data characterizing cervical cancer cases from NCI's CGCI HTMCP project, methylation data from NCI's Clinical Proteomic Tumor Analysis Consortium (CPTAC) program, and next-generation cancer models from NCI's Human Cancer Models Initiative (HCMI).
A recent data release offers new ways to access data from the Count Me In project. The nonprofit research initiative enables patients in the US and Canada to share their medical information and samples for cancer research. Notably, Count Me In has been able to sequence and share data from patients with very rare or understudied cancers such as angiosarcoma.
The GDC’s Data Release 26 features updates allowing Multiple Myeloma Research Foundation (MMRF) CoMMpass data to be explored directly in the GDC Data Portal. This behind-the-scenes change enables users to create mutation frequency, oncogrid, and other plots of multiple myeloma mutations readily in the web browser.
Also in this latest update:
The GDC’s latest data release includes two major National Cancer Institute projects: the HIV+ Tumor Molecular Characterization Project (HTMCP) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC).
The GDC has released data for over 44,000 cases from the American Association for Cancer Research’s Project Genomics Evidence Neoplasia Information Exchange (AACR Project GENIE, phs001337), including somatic variant calls, copy number estimates, and transcript fusions.
The data from this release comes from eight contributing centers:
In a major update to GDC's Data Portal, users can now plot and explore clinical data in the Analysis area. A new look and feel is also available for searching clinical metadata in the Exploration section. To learn more about these updates users can visit the GDC Data Portal Users Guide.
Data from the Multiple Myeloma Research Foundation (MMRF) is now available at the GDC. This is a major new data set with nearly 1000 patients with extensive molecular and clinical data, including longitudinal information collected over the course of disease for many patients. Other new projects in this latest data release include the Burkitt Lymphoma Genome Sequencing Project, TARGET Acute Lymphoblastic Leukemia (ALL) - Phases I and II, and Pancreas Cancer Organoid Profiling. New RNA-Seq data were also added for TARGET-ALL-P3, TARGET-CCSK, and TARGET-OS in the release.
CPTAC has provided the Genomic Data Commons (GDC) with genomic data from cancer patients with diverse disease types including Uterine Corpus Endometrial Carcinoma (UCEC), Clear Cell Renal Cell Carcinoma (CCRCC), and Lung Adenocarcinoma (LUAD). The GDC harmonized DNA sequences from CPTAC whole genome sequencing (WGS) and whole exome sequencing (WXS) to the GRCh38 reference genome using GDC DNA-Seq Analysis Pipelines.
Users who have been granted access to Foundation Medicine data can now visualize somatic mutations using GDC Data Analysis, Visualization, and Exploration (DAVE) tools.
The GDC released a new version of the GDC Data Portal with new visualization tools for copy number variations (CNVs). CNVs, categorized as gains and losses, can now be visualized in conjunction with small-scale mutations (substitutions and short indels). In the Oncogrid, the colored grid is overlaid with symbols, allowing an integrated view of mutations and CNVs. Users can choose colors and which alteration types to view.
Data from the molecular characterization of 574 Diffuse Large B-Cell Lymphoma (DLBCL) biopsy samples are now available in the GDC Data Portal. In a study published in the New England Journal of Medicine, Schmitz et al. performed whole-exome, transcriptome, deep amplicon resequencing, and DNA copy-number analyses. Alignments of the data to hg38 are available in the data portal. Mutation calls made with the publication’s novel tumor-only pipeline and fusion genes detected from RNA-Seq data can be accessed on the publication page.
The NCI's Genomic Data Commons (GDC) released a new slide image viewing feature in the GDC Data Portal allowing researchers to view, zoom, and pan slide images associated with a case directly in the browser. Researchers can apply case filters to perform range searches for images by percent tumor cells and other criteria. Slide images can also be downloaded in the original format (SVS) and are accessible via the GDC API.
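As a rough illustration of the range search described above, the following sketch builds a GDC API query URL for SVS slide image files whose percent tumor cells falls within a range. It only constructs the request and sends nothing. The field path in TUMOR_FIELD, the use of scalar values with the range operators, and the helper name build_slide_query are assumptions made for this example and should be checked against the GDC API documentation before use.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

# Public GDC API files endpoint.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"

# Assumed field path for slide-level percent tumor cells (verify in the GDC docs).
TUMOR_FIELD = "cases.samples.portions.slides.percent_tumor_cells"


def build_slide_query(min_tumor_pct, max_tumor_pct, size=10):
    """Return a query URL for SVS files within a percent-tumor-cells range.

    Hypothetical helper: it builds the URL but does not contact the API.
    """
    filters = {
        "op": "and",
        "content": [
            # Restrict results to slide images in their original SVS format.
            {"op": "in", "content": {"field": "data_format", "value": ["SVS"]}},
            # Lower and upper bounds of the range search.
            {"op": ">=", "content": {"field": TUMOR_FIELD, "value": min_tumor_pct}},
            {"op": "<=", "content": {"field": TUMOR_FIELD, "value": max_tumor_pct}},
        ],
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name",
        "size": str(size),
    }
    return GDC_FILES_ENDPOINT + "?" + urlencode(params)


query_url = build_slide_query(80, 100)
# Decode the filters again to confirm they survive URL encoding intact.
decoded_filters = json.loads(parse_qs(urlparse(query_url).query)["filters"][0])
print(query_url)
```

The resulting URL could be passed to any HTTP client; building it separately from the request keeps the filter logic easy to inspect before any data is fetched.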
The NCI's Genomic Data Commons (GDC) released new features allowing users to build and compare custom cohorts and perform operations on case, gene, or mutation sets. Users can now use the new GDC Analysis feature to build cohorts for selected cases and compare them through survival analysis and by characteristics such as gender, vital status, and age at diagnosis. Users can also perform set operations on case, gene, or mutation sets and visualize set similarities and differences in a Venn diagram.
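The set operations mentioned above can be pictured with a small sketch that computes the three regions of a two-set Venn diagram. This is not the GDC's implementation; the function name venn_regions, the cohort names, and the case identifiers are made up for illustration.

```python
def venn_regions(set_a, set_b):
    """Split two sets into the three regions of a two-circle Venn diagram."""
    a, b = set(set_a), set(set_b)
    return {
        "only_a": a - b,  # members found only in the first set
        "both": a & b,    # intersection of the two sets
        "only_b": b - a,  # members found only in the second set
    }


# Hypothetical case sets, e.g. cases carrying a mutation in one of two genes.
cohort_a = {"case-001", "case-002", "case-003"}
cohort_b = {"case-002", "case-003", "case-004"}

regions = venn_regions(cohort_a, cohort_b)
for name, members in regions.items():
    print(name, sorted(members))
```

The same approach applies to gene or mutation sets, since each is just a collection of unique identifiers.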
Foundation Medicine, Inc. (FMI) has provided the Genomic Data Commons (GDC) with genomic profiling data from approximately 18,000 adult patients with a diverse array of cancers who were profiled using FoundationOne®, FMI's commercially available, comprehensive genomic profiling assay. FMI routinely analyzes cancer specimens using the advanced sequencing technology of FoundationOne.
The Analyzing Data using GDC Data Analysis, Visualization, and Exploration (DAVE) Tools webinar will help introduce users to GDC tools for analyzing data from cancer genomic studies.
The NCI's Genomic Data Commons (GDC) officially launched Data Analysis, Visualization, and Exploration (DAVE) tools, transforming the GDC from a cancer genomics data repository into an interactive knowledge base. DAVE enables cancer researchers to use the GDC’s high-quality, standardized genomic data without downloading a single file. Researchers can:
The Querying and Downloading Data using the GDC Data Portal and the GDC Data Transfer Tool webinar will help introduce users to the GDC tools for downloading and retrieving data from cancer genomic studies. As an example, we will query and download open access data using the GDC Data Portal and the high performance GDC Data Transfer Tool. We will also review the process for obtaining access to controlled data and demonstrate how to generate a token for downloading controlled access data.
The Navigating the GDC - A Case Study webinar is the first in a series of NCI GDC webinars. This webinar will help introduce users to the different GDC tools and data types that are available to support cancer genomic analysis. As an example, we will identify common p53 mutations in colon cancer in the GDC cBioPortal, verify mutation calls using BAM slicing in the GDC Data Portal, and investigate the impact of mutations on RNA-Seq expression. In the process we will also highlight the GDC Data Transfer Tool, harmonized clinical data, and the GDC API.
The NCI announced a collaboration with the Multiple Myeloma Research Foundation (MMRF) to integrate MMRF's wealth of genomic and clinical data on the disease into the GDC. The MMRF is the first non-profit to donate information to the GDC and serves as a research and advocacy organization conducting clinical studies that incorporate whole-genome, whole-exome, and RNA sequencing into their study analyses.
Today the National Cancer Institute signed an agreement with Foundation Medicine, Inc. (FMI) that will grow the number of cancer cases represented at the GDC to well over 30,000 individuals. The agreement will make FMI-generated comprehensive genomic variant information for a set of cancer-associated genes from over 18,000 adult cancer cases available through the GDC to dbGaP-authorized users.
The NCI’s Genomic Data Commons (GDC) was officially launched on June 6th at the 2016 American Society of Clinical Oncology (ASCO) Annual Meeting by Vice President Joe Biden as part of the National Cancer Moonshot Initiative. The GDC is an interactive data sharing platform that enables the access, standardization, analysis, and submission of cancer genomic data in support of precision medicine.
The NCI’s Genomic Data Commons (GDC) project completes Phase 3 activities. In Phase 3, the GDC generates high level data including DNA-Seq derived germline variants and somatic mutations, RNA-Seq and miRNA-Seq derived gene and miRNA quantifications, and SNP Array based copy number segmentations.
The NCI's Genomic Data Commons (GDC) project completes Phase 2 activities. In Phase 2, the GDC provides support for data submission and harmonization to GRCh38, the latest reference genome build.
The NCI's Genomic Data Commons (GDC) project completes Phase 1 activities. In Phase 1, GDC provides support for TCGA datasets made accessible via the GDC Data Portal and the GDC Data Transfer Tool.
The Genomic Data Commons project will help researchers around the country assess genetic information from more than 10,000 cancer patients, which could be used to develop more effective treatments, said Robert Grossman, a professor of medicine at the University of Chicago who is directing the project.
The University of Chicago and the NCI are collaborating to establish the Genomic Data Commons (GDC). The GDC is a first-of-its-kind facility that will be the most comprehensive system to store data from NCI-funded research programs in a single repository, and harmonize them so they’re compatible. The GDC addresses a major issue in cancer research. A wealth of valuable tumor genome data has been collected by NCI-funded projects, but most researchers can’t make use of the material due to sheer size, disparate formats and dispersed storage locations.
The National Cancer Institute is establishing the NCI Genomic Data Commons (GDC) to store, analyze, and distribute cancer genomics data generated by NCI and other research organizations. The GDC will provide an interactive system for researchers to access data, with the goal of advancing the molecular diagnosis of cancer and suggesting potential therapeutic targets based on genomic information. The GDC is the first step toward the development of a knowledge system for cancer, as originally recommended in a 2011 Institute of Medicine (IOM) report, "Toward Precision Medicine."