Analyze Data

The GDC provides user-friendly, interactive Data Analysis, Visualization, and Exploration (DAVE) Tools that support gene-level and variant-level analysis. These tools allow researchers to:

  • Visualize the most frequently mutated genes and view the most frequent somatic mutations for a project
  • Plot all cases for a project in an OncoGrid and visualize the top 50 mutated genes affected by high impact mutations
  • Perform a survival analysis for cases with a mutated form of a certain gene and cases without the mutation
  • Visualize mutations and their frequency across cases mapped to a graphical visualization of protein-coding regions using an interactive Protein Viewer
  • View the cancer distribution as evidenced by the number of cases affected by the mutation across all projects
  • Build cohorts and perform gene and variant level analysis on the cohort
  • Compare custom gene or case sets by visualizing set similarities and differences

Data Analysis Tools

The GDC provides interactive tools supporting data analysis, exploration, and visualization.

Data Analysis Policy

The GDC provides policies for publishing the results of analyzed data.

Data Harmonization and Generation

GDC variant calling pipelines generate high-level data for analysis. The pipelines are implemented using data processing software and algorithms selected in consultation with the expert genomics community.

What’s New with GDC and Cancer Research

Cancer Research Highlights and Publications:


Why is the number of analyzed cases in the MAF header not equal to the number of cases displayed in the GDC Data Portal?

Within the GDC data analysis workflow, public (somatic) MAFs and protected MAFs are generated by the same pipeline and link back to the same cases. For example, for the TCGA-GBM project, the somatic MAF has the following header:

# in TCGA.GBM.muse.7e85de23-3855-4279-a3ac-a81827e4ccb6.DR6.0.somatic.maf.gz
#version gdc-1.0.0
#filedate 20170307
#n.analyzed.samples 393

In general, n.analyzed.samples is used as the denominator when calculating mutation frequencies. A case is still counted if none of its variants passed our filters. However, a case determined to be of poor quality (for example, because of high contamination or duplicates) is not counted in the public MAF. In this particular project (TCGA-GBM), there were 396 cases with SNV data. Our analysis pipeline found that 5 GBM tumor aliquots among them had high contamination. Of the 5 affected patients, 2 had another good tumor aliquot, but 3 had only the one aliquot. As a result, those 3 cases were removed from the public MAF, which leaves the 393 analyzed samples reported in the header (396 − 3 = 393).
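
The role of n.analyzed.samples as a denominator can be illustrated with a minimal Python sketch. It assumes the header consists of lines of the form `#key value`, as in the excerpt above. The function names `read_maf_header` and `mutation_frequency`, the two column names in the sample text, and the count of 100 mutated cases are hypothetical and chosen only for illustration. They are not part of any GDC tool.

```python
import io


def read_maf_header(lines):
    """Collect '#key value' header lines that precede the MAF column row."""
    header = {}
    for line in lines:
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        parts = line[1:].strip().split(None, 1)
        if len(parts) == 2:
            header[parts[0]] = parts[1]
    return header


def mutation_frequency(n_mutated_cases, header):
    """Fraction of analyzed samples carrying a mutation of interest."""
    return n_mutated_cases / int(header["n.analyzed.samples"])


# Header values copied from the TCGA-GBM excerpt; the column row is illustrative.
example = io.StringIO(
    "#version gdc-1.0.0\n"
    "#filedate 20170307\n"
    "#n.analyzed.samples 393\n"
    "Hugo_Symbol\tChromosome\n"
)
header = read_maf_header(example)

# 396 cases with SNV data minus the 3 removed cases matches the header value.
cases_with_snv_data = 396
cases_removed = 3
assert cases_with_snv_data - cases_removed == int(header["n.analyzed.samples"])

# Hypothetical: a gene mutated in 100 cases of this project.
print(mutation_frequency(100, header))
```

For a downloaded file, the same function should accept the handle returned by `gzip.open(path, "rt")`, since it only iterates over text lines.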

Need Assistance?

Need help with data retrieval, download, or submission?