Analyze Data

The GDC provides an array of interactive, web-based Analysis Tools for performing in-depth gene- and variant-level analyses. The workflow is cohort-centric, meaning analyses are specific to a researcher's cohort of interest.

Build cohorts and perform gene- and variant-level analysis on the cohort
Visualize most frequently mutated genes and somatic mutations for a cohort
Visualize a matrix of the top most mutated cases and genes affected by high impact mutations in a cohort
Perform a survival analysis for cases with a mutated form of a certain gene and cases without the mutation
Visualize mutations and their frequency across cases mapped to a graphical visualization of protein-coding regions
Visualize the top most variably expressed genes in a cohort
Visualize sequencing reads for a given gene, position, SNP, or variant
Use clinical variables to perform basic statistical analysis of a cohort
Compare custom gene or case sets by visualizing set similarities and differences

Explore the GDC Data Analysis Processes and Tools »

Illustration: Analyze Data

Documentation

GDC Data Portal User's Guide »

Data Analysis Tools

The GDC provides interactive, cohort-centric tools for analyzing genomic and clinical data.

Data Analysis Policy

The GDC provides policies and guidelines for the appropriate use of data, whether open- or controlled-access.

Data Analysis Policies

Data Harmonization and Generation

The GDC develops best-in-practice pipelines for processing data from the most common molecular platforms. Variant calling, gene expression analysis, and other pipelines are implemented using software and algorithms selected in consultation with experts in the genomics community.

More about Data Harmonization and Generation

What’s New with GDC and Cancer Research

Cancer Research Highlights and Publications:

From the GDC FAQ

How does the GDC choose the default transcript for each variant?

When a mutation overlaps multiple transcripts or genes, the GDC annotates all consequences in the all_effects column of the MAF file and in the CONSEQUENCE table on the Mutation Summary Page. One transcript is then selected as the default for detailed annotation and visualization where a single consequence is shown.

The default is chosen based on annotations from the Variant Effect Predictor (VEP), prioritizing the most severe consequence on the most impactful transcript biotype (see the selected annotation for 'OneEffect'). The GDC also applies a curated transcript override file from MSKCC, which defines preferred transcripts for key genes, and takes canonical and longest transcripts into consideration.
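
As a rough illustration of the kind of selection described above, the Python sketch below ranks candidate transcript annotations for one variant. The rankings, the field names, the `pick_default` helper, and the order in which the override, the canonical flag, and transcript length are applied are all assumptions made for illustration; this is not the GDC's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical rankings (lower number = higher priority).
# The real VEP consequence and biotype lists are much longer.
BIOTYPE_RANK = {"protein_coding": 0, "lncRNA": 1, "pseudogene": 2}
CONSEQUENCE_RANK = {
    "stop_gained": 0,
    "missense_variant": 1,
    "synonymous_variant": 2,
    "intron_variant": 3,
}


@dataclass
class Effect:
    gene: str
    transcript_id: str
    biotype: str
    consequence: str
    canonical: bool
    length: int


def pick_default(effects, overrides):
    """Pick one default effect; `overrides` maps gene -> preferred transcript ID."""
    # Step 1 (assumed): find the most severe consequence on the most
    # impactful biotype, breaking ties by transcript length.
    best = min(
        effects,
        key=lambda e: (
            BIOTYPE_RANK.get(e.biotype, 99),
            CONSEQUENCE_RANK.get(e.consequence, 99),
            -e.length,
        ),
    )
    same_gene = [e for e in effects if e.gene == best.gene]

    # Step 2 (assumed): a curated override wins if it names one of the candidates.
    for e in same_gene:
        if overrides.get(e.gene) == e.transcript_id:
            return e

    # Step 3 (assumed): otherwise prefer canonical transcripts, then the longest.
    pool = [e for e in same_gene if e.canonical] or same_gene
    return max(pool, key=lambda e: e.length)
```

With an override entry for a gene, the named transcript is returned even when another transcript is canonical or longer, which mirrors the role the curated override file plays for key genes.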

With the GENCODE v36 update in GDC Data Release 32 (DR32), some hotspot mutations may display a different default consequence annotation. For example, BRAF V600E is shown as BRAF V640E after DR32 because GENCODE updated the curated BRAF transcript ENST00000288602, adding 40 amino acids at the N-terminus. Changes like this affect the default consequence annotations for all BRAF mutations and may similarly impact other genes with updated transcript models. Although the default annotation changed, users can still find V600E listed in the all_effects column in MAF files or the CONSEQUENCE table on the Mutation Summary Page alongside other transcript annotations.
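
To look up an alternative annotation such as V600E in a MAF record, the all_effects value can be split into per-transcript entries. The Python sketch below assumes that entries are separated by semicolons and that each entry is a comma-separated list whose first four fields are gene symbol, consequence, short protein change, and transcript ID; verify this layout against the MAF format documentation for your data release. The sample string and the second transcript ID are hypothetical, and the helper names are illustrative only.

```python
import re

# Assumed order of the leading fields in each all_effects entry.
FIELDS = ["symbol", "consequence", "hgvsp_short", "transcript_id"]


def parse_all_effects(value):
    """Split an all_effects string into one dict per transcript annotation."""
    entries = []
    for raw in value.split(";"):
        if raw:
            # zip() ignores any trailing fields beyond the four named above.
            entries.append(dict(zip(FIELDS, raw.split(","))))
    return entries


def find_protein_change(value, change):
    """Return the entries whose short protein change equals `change`."""
    return [e for e in parse_all_effects(value) if e.get("hgvsp_short") == change]


def shift_protein_change(change, offset):
    """Renumber a simple substitution (e.g. V600E) after residues are
    added at the N-terminus upstream of the affected position."""
    m = re.fullmatch(r"(p\.)?([A-Z])(\d+)([A-Z*])", change)
    if m is None:
        raise ValueError(f"unsupported protein change: {change}")
    prefix, ref, pos, alt = m.group(1) or "", m.group(2), m.group(3), m.group(4)
    return f"{prefix}{ref}{int(pos) + offset}{alt}"


# Hypothetical all_effects value for illustration only.
sample = (
    "BRAF,missense_variant,p.V640E,ENST00000288602;"
    "BRAF,missense_variant,p.V600E,ENST_HYPOTHETICAL"
)
hits = find_protein_change(sample, "p.V600E")
shifted = shift_protein_change("V600E", 40)  # 600 + 40 = 640
```

Here `hits` holds the entry that still reports V600E on the other transcript, and `shifted` reproduces the renumbering that follows from the 40 added amino acids.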

Need Assistance?

Need help with data retrieval, download, or submission?

Visit the GDC Support Page