News and Announcements

GDC Data Dictionary

New GDC Data Dictionary properties were added in the Okazaki release. Key highlights include:


GDC Data Release 41 is now accessible on the GDC Data Portal. This release introduces new data sets featuring data for new NCI-MATCH Trial arms, whole slide images, and more. Below is a summary of the key highlights. 


The GDC Nightingale release brings an upgrade to the GDC Data Transfer Tool (DTT) client and an Application Programming Interface (API) update to add an endpoint for Other Clinical Data Elements with data soon to follow.


Introducing the latest advancements in GDC 2.0 with Release 2.1! Here's what's new:


GDC Data Release 40 is now accessible on the GDC Data Portal. This release introduces new data sets featuring additional TCGA WGS alignments and variant calls, WXS and RNA-Seq data for new NCI-MATCH Trial arms, and more. Below is a summary of the key highlights.


The GDC has released GDC 2.0, which expands on the GDC Data Portal initially launched in 2016 by providing a cohort-centric design and new analysis tools. GDC 2.0 features include:


GDC Data Release 39 is now available on the GDC Data Portal. Main features of this release include new TCGA WGS variants, additional higher coverage alignments, and five new projects from NCI’s MATCH program:


Introducing a new node for incorporating additional clinical attributes, plus new properties and enumerated values to support The Cancer Genome Atlas (TCGA) and other NCI programs in GDC Data Dictionary Release 3.0.0:


The GDC’s Data Release 38 includes 9000+ high coverage TCGA WGS alignments, data for new CGCI, MATCH, and MP2PRT projects, and a variety of data from other NCI programs:


In GDC Data Dictionary Release 2.6.6, the GDC added support for:


The Genomic Data Commons strives to provide standardized and relevant genomic data to the cancer research community. While data processed according to the most up-to-date pipelines is provided to users through the GDC Data Portal, a number of files processed via older or external pipelines had been kept available for users through the GDC Legacy Archive.


The GDC’s Data Release 37 includes four new projects: APOLLO-LUAD - Proteogenomic Characterization of Lung Adenocarcinoma, CGCI-HTMCP-LC - HIV+ Tumor Molecular Characterization of Lung Cancer, and two treatment arms from NCI's MATCH Clinical Trial: MATCH-Q - Treatment Arm Q (tumors with AKT mutations) and MATCH-Y - Treatment Arm Y (tumors with HER2 amplification).


In GDC Data Dictionary Release 2.6.0, the GDC added support for:


GDC Data Release 36 includes WXS and RNA-Seq data for cases from NCI’s MATCH precision medicine clinical trial (MATCH-Z1D; phs001859) and WGS, WXS, and RNA-Seq data for lung adenocarcinoma cases from NCI’s EAGLE epidemiologic study (CDDP_EAGLE-1; phs001239). New cases for Count Me In’s Metastatic Breast Cancer (CMI-MBC) project have also been added with WXS and RNA-Seq data.


New data sets are now available including RNA-Seq data from the TARGET acute myeloid leukemia project and single nuclei RNA-Seq data from the CPTAC program. These changes are summarized below:

  • New RNA-Seq data from TARGET-AML for over 2,000 cases
  • New snRNA-Seq data from CPTAC-3 for kidney cancer

The SomaticSniper whole exome variant calling pipeline has been deprecated. As a result, SomaticSniper variants are now found only in older versions of VCF or MAF files.


In Data Release 34, the GDC released new data from the Beat AML and Clinical Proteomic Tumor Analysis Consortium (CPTAC) programs:


In GDC Data Dictionary Release 2.5.0, the GDC added support for:


In Data Release 33, the GDC released data from two new projects at the GDC:


The gene model used as a reference across the GDC has been updated from GENCODE 22 to GENCODE 36. GENCODE gene sets are continuously updated to improve coverage and accuracy. GENCODE 36, which was released in October 2020, includes many updates to definitions of genes, transcripts, long non-coding RNAs, and other types of annotations. The previous version used by the GDC (GENCODE 22) was released in March 2015. Both versions were built on Ensembl genome assembly GRCh38.


Highlights from the GDC's recent Data Release 30 include data from major studies exploring the effects of Chernobyl radiation and methylation arrays processed via a new pipeline at the GDC. The data release also includes a variety of data characterizing cervical cancer cases from NCI's CGCI HTMCP project, methylation data from NCI's Clinical Proteomic Tumor Analysis Consortium (CPTAC) program, and next-generation cancer models from NCI's Human Cancer Models Initiative (HCMI).


A recent data release offers new ways to access data from the Count Me In project. The nonprofit research initiative enables patients in the US and Canada to share their medical information and samples for cancer research. Notably, Count Me In has been able to sequence and share data from patients with very rare or understudied cancers such as angiosarcoma.


The GDC’s Data Release 26 features updates allowing Multiple Myeloma Research Foundation (MMRF) CoMMpass data to be explored directly in the GDC Data Portal. This behind-the-scenes change enables users to create mutation frequency, oncogrid, and other plots of multiple myeloma mutations readily in the web browser.

Also in this latest update:


The GDC’s latest data release includes two major National Cancer Institute projects: the HIV+ Tumor Molecular Characterization Project (HTMCP) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC).


The GDC has released data for over 44,000 cases from the American Association for Cancer Research’s Project Genomics Evidence Neoplasia Information Exchange (AACR Project GENIE, phs001337), including somatic variant calls, copy number estimates, and transcript fusions.

The data from this release comes from eight contributing centers:


Data from the Multiple Myeloma Research Foundation (MMRF) is now available at the GDC. This is a major new data set with nearly 1000 patients with extensive molecular and clinical data, including longitudinal information collected over the course of disease for many patients. Other new projects in this latest data release include the Burkitt Lymphoma Genome Sequencing Project, TARGET Acute Lymphoblastic Leukemia (ALL) - Phases I and II, and Pancreas Cancer Organoid Profiling. New RNA-Seq data were also added for TARGET-ALL-P3, TARGET-CCSK, and TARGET-OS in the release.


CPTAC has provided the Genomic Data Commons (GDC) with genomic data from cancer patients with diverse disease types including Uterine Corpus Endometrial Carcinoma (UCEC), Clear Cell Renal Cell Carcinoma (CCRCC), and Lung Adenocarcinoma (LUAD). The GDC harmonized DNA sequences from CPTAC whole genome sequencing (WGS) and whole exome sequencing (WXS) with the GRCh38 reference genome using GDC DNA-Seq Analysis Pipelines.


Data from the molecular characterization of 574 Diffuse Large B-Cell Lymphoma (DLBCL) biopsy samples is now available in the GDC Data Portal. In a study published in the New England Journal of Medicine, Schmitz et al. performed whole-exome, transcriptome, deep amplicon resequencing, and DNA copy-number analyses. Alignments of the data to hg38 are available in the data portal. Mutation calls from the publication’s novel tumor-only pipeline and fusion genes detected from RNA-Seq data can be accessed on the publication page.


The NCI's Genomic Data Commons (GDC) released a new slide image viewing feature in the GDC Data Portal allowing researchers to view, zoom, and pan slide images associated with a case directly through the browser. Researchers can apply case filters to perform range searches for images by percent tumor cells and other criteria. In addition, slide images can also be downloaded in the original format (SVS) and are accessible via the GDC API.
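As a rough illustration of the API access mentioned above, the following Python sketch builds a query for slide image files in SVS format. It rests on assumptions not stated in this announcement: that the public files endpoint is https://api.gdc.cancer.gov/files, and that the field names data_format, cases.project.project_id, file_id, file_name, and cases.submitter_id are accepted as filter and return fields. The helper name build_slide_image_query and the project ID used in the example are hypothetical. Check all of these against the GDC API documentation before relying on them.

```python
import json
from urllib.parse import urlencode

# Assumed public GDC API endpoint for file metadata (not stated in the announcement).
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_slide_image_query(project_id, size=10):
    """Build query parameters for SVS slide image files in one project.

    Hypothetical helper: the field names below are assumptions about the
    GDC API schema and should be verified against the API documentation.
    """
    filters = {
        "op": "and",
        "content": [
            {
                "op": "in",
                "content": {
                    "field": "cases.project.project_id",
                    "value": [project_id],
                },
            },
            {
                "op": "in",
                "content": {"field": "data_format", "value": ["SVS"]},
            },
        ],
    }
    return {
        # The filter expression is sent as a JSON-encoded string.
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,cases.submitter_id",
        "format": "JSON",
        "size": str(size),
    }


# Example: construct the request URL without sending it.
params = build_slide_image_query("TCGA-BRCA", size=5)
query_url = GDC_FILES_ENDPOINT + "?" + urlencode(params)
```

The sketch only constructs the request; no network call is made. An HTTP client can then fetch query_url, or pass params alongside the endpoint, to retrieve matching file records.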


The Querying and Downloading Data using the GDC Data Portal and the GDC Data Transfer Tool webinar will introduce users to the GDC tools for querying and downloading data from cancer genomic studies. As an example, we will query and download open access data using the GDC Data Portal and the high-performance GDC Data Transfer Tool. We will also review the process for obtaining access to controlled data and demonstrate how to generate a token for downloading controlled access data.


The Navigating the GDC - A Case Study webinar is the first in a series of NCI GDC webinars. This webinar will introduce users to the different GDC tools and data types that are available to support cancer genomic analysis. As an example, we will identify common p53 mutations in colon cancer in the GDC cBioPortal, verify mutation calls using BAM slicing in the GDC Data Portal, and investigate the impact of mutations on RNA-Seq expression. In the process, we will also highlight the GDC Data Transfer Tool, harmonized clinical data, and the GDC API.