Analyze Data

The GDC provides an array of interactive, web-based Analysis Tools for performing in-depth gene- and variant-level analyses. The workflow is cohort-centric, meaning analyses are specific to a researcher's cohort of interest.

  • Build cohorts and perform gene- and variant-level analysis on the cohort
  • Visualize most frequently mutated genes and somatic mutations for a cohort
  • Visualize a matrix of the top most mutated cases and genes affected by high impact mutations in a cohort
  • Perform a survival analysis for cases with a mutated form of a certain gene and cases without the mutation
  • Visualize mutations and their frequency across cases mapped to a graphical visualization of protein-coding regions
  • Visualize the top most variably expressed genes in a cohort
  • Visualize sequencing reads for a given gene, position, SNP, or variant
  • Use clinical variables to perform basic statistical analysis of a cohort
  • Compare custom gene or case sets by visualizing set similarities and differences

Documentation

Data Analysis Tools

The GDC provides interactive, cohort-centric tools for analyzing genomic and clinical data.

Data Analysis Policy

The GDC provides policies and guidelines for the appropriate use of data, whether open- or controlled-access.

Data Harmonization and Generation

The GDC develops best-practice pipelines for processing data from the most common molecular platforms. Variant calling, gene expression analysis, and other pipelines are implemented using software and algorithms selected in consultation with experts in the genomics community.

What’s New with GDC and Cancer Research

Cancer Research Highlights and Publications:

From the GDC FAQ

How often does the GDC update the workflow/reference genome? If the GDC updates the workflow/reference genome, does the GDC re-process all data sets?

For the reference genome, the GDC has been using an augmented version of GRCh38.p2 (with additional decoy sequences and virus sequences) since inception. The GDC does not use alternative contigs, and only derives high-level data from the major chromosomes, so the same reference genome is used with both gene models: GENCODE v22 (Data Releases 1 through 31) and GENCODE v36 (Data Release 32 onward). As future versions of the reference genome are released, e.g., GRCh39, the GDC will evaluate the benefits of updating data to utilize the new version. If the reference genome is updated, the GDC would expect to re-process all data sets. For information on the reference genome used by the GDC, please refer to the GDC Reference Files.
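As a rough illustration of the gene model boundary described above, the following Python sketch maps a Data Release number to the GENCODE version stated in this FAQ. The function name, the constant, and the error handling are assumptions made for illustration only; they are not part of any GDC tool or API, and the mapping covers only the releases this answer mentions.

```python
# Illustrative sketch only: names here are hypothetical, not a GDC API.
# Per the FAQ text: GENCODE v22 for Data Releases 1 to 31, v36 from Data Release 32.
GENCODE_V36_FIRST_RELEASE = 32


def gencode_version_for_release(data_release: int) -> str:
    """Return the GENCODE gene model version the FAQ associates with a Data Release."""
    if data_release < 1:
        raise ValueError("Data Releases are numbered from 1")
    if data_release < GENCODE_V36_FIRST_RELEASE:
        return "v22"
    return "v36"
```

A check like this may be useful when combining files downloaded at different times, since gene-level results generated under different gene models are not necessarily directly comparable.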

For workflow updates, the GDC prefers to keep the workflow stable, and will not update it unless an update is necessary, such as an update of the reference genome or gene model, or a major algorithm update in the tools that could result in significant changes in the generated data. When workflow updates are needed, the GDC categorizes them as either major or minor, depending on whether the update significantly affects the output data. The GDC will re-process all existing data sets for major workflow updates. Examples include transitioning the RNA-Seq genomic BAM alignment workflow to a new version that generates three BAMs and STAR counts, and updating the MAF workflow to add additional functions to the MAF files. Minor updates mostly resolve bugs, security issues, and/or compatibility issues. For example, the GDC DNA-Seq alignment workflow has been updated several times to address quality issues from various submitted data; however, because the main alignment algorithm remains almost the same, the GDC does not need to re-process all the data sets for these minor updates.

Need Assistance?

Need help with data retrieval, download, or submission?

Visit the GDC Support Page