Analyze Data

The GDC provides user-friendly, interactive Data Analysis, Visualization, and Exploration (DAVE) Tools that support gene- and variant-level analysis and allow researchers to:

  • Visualize the most frequently mutated genes and view the most frequent somatic mutations for a project
  • Plot all cases for a project in an OncoGrid and visualize the top 50 mutated genes affected by high-impact mutations
  • Perform a survival analysis for cases with a mutated form of a certain gene and cases without the mutation
  • Visualize mutations and their frequency across cases mapped to a graphical visualization of protein-coding regions using an interactive Protein Viewer
  • View the cancer distribution as evidenced by the number of cases affected by the mutation across all projects
  • Build cohorts and perform gene- and variant-level analysis on each cohort
  • Compare custom gene or case sets by visualizing set similarities and differences
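
The survival comparison listed above (cases with a mutated gene versus cases without the mutation) can be illustrated with a Kaplan-Meier estimate. The sketch below is a minimal, standalone illustration and not the GDC implementation; the function name `kaplan_meier` and the follow-up data are hypothetical, and the source does not state which estimator the DAVE tools use.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with at least one event.

    durations: follow-up time per case (any consistent unit, e.g. days)
    events:    1 if the event (death) was observed, 0 if the case was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # every case leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk cases that survive past time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up data for two groups of cases (illustration only).
mutated = kaplan_meier([5, 8, 8, 12], [1, 1, 0, 1])
not_mutated = kaplan_meier([10, 15, 20, 25], [1, 0, 1, 0])
```

Each group yields its own step curve; comparing the two curves (for example with a log-rank test, not shown here) is the usual way to judge whether the mutation is associated with a survival difference.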

Data Analysis Tools

The GDC provides interactive tools supporting data analysis, exploration, and visualization.

Data Analysis Policy

The GDC provides policies for publishing the results of analyzed data.

Data Harmonization and Generation

GDC variant calling pipelines generate high-level data for analysis. The pipelines are implemented using data processing software and algorithms selected in consultation with the expert genomics community.

What’s New with GDC and Cancer Research

Why did the GDC remove HTSeq for gene expression quantification?

HTSeq had been the default RNA-Seq expression quantification tool since the first GDC data release. The GDC later updated the RNA-Seq alignment and quantification workflow to include STAR Count, which generates stranded counts by default in addition to the existing unstranded counts. In Data Release 32, as part of the gene model update, the GDC 1) augmented the existing STAR Count output to include FPKM and FPKM-UQ normalizations and 2) reprocessed all TCGA data using the latest RNA-Seq workflow with STAR Count. Because both tools use very similar counting strategies, and STAR Count has advantages in both running time and its additional stranded counts, the GDC removed the HTSeq workflow in Data Release 32.
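
To make the two normalizations mentioned above concrete, here is a minimal sketch of the generic FPKM and upper-quartile FPKM calculations. It is an illustration under stated assumptions, not the GDC workflow: the function names and sample counts are hypothetical, and the exact denominators the GDC uses (for example, which genes contribute to the read total or the upper quartile) are not given in this text.

```python
import statistics


def fpkm(read_count, gene_length_bp, total_reads):
    """Fragments per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_reads)


def fpkm_uq(read_count, gene_length_bp, upper_quartile_count):
    """FPKM variant that scales by the 75th-percentile gene count
    instead of the total read count (assumed generic form)."""
    return read_count * 1e9 / (gene_length_bp * upper_quartile_count)


# Hypothetical per-gene read counts for one sample (illustration only).
counts = [0, 10, 20, 30, 40]
# Third cut point of the quartiles is the 75th percentile.
uq = statistics.quantiles(counts, n=4, method="inclusive")[2]

gene_fpkm = fpkm(read_count=100, gene_length_bp=1000, total_reads=1_000_000)
gene_fpkm_uq = fpkm_uq(read_count=20, gene_length_bp=1000, upper_quartile_count=uq)
```

Scaling by the upper quartile rather than the total makes the value less sensitive to a handful of very highly expressed genes that dominate the read total.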
