Analyze Data

The GDC provides user-friendly, interactive Data Analysis, Visualization, and Exploration (DAVE) Tools for gene-level and variant-level analysis. These tools allow researchers to:

  • Visualize most frequently mutated genes and view most frequent somatic mutations for a project
  • Plot all cases for a project in an OncoGrid and visualize the top 50 mutated genes affected by high impact mutations
  • Perform a survival analysis for cases with a mutated form of a certain gene and cases without the mutation
  • Visualize mutations and their frequency across cases mapped to a graphical visualization of protein-coding regions using an interactive Protein Viewer
  • View the cancer distribution as evidenced by the number of cases affected by the mutation across all projects
  • Build cohorts and perform gene and variant level analysis on the cohort
  • Compare custom gene or case sets by visualizing set similarities and differences

Data Analysis Tools

The GDC provides interactive tools supporting data analysis, exploration, and visualization.

Data Analysis Policy

The GDC provides policies for publishing the results of analyzed data.

Data Harmonization and Generation

GDC variant calling pipelines generate high-level data for analysis. The pipelines are implemented using data processing software and algorithms selected in consultation with the expert genomics community.

What’s New with GDC and Cancer Research

From GDC FAQ

In the Most Frequent Mutations table, which VEP algorithm does the GDC use to determine a VEP impact score of “H” or “M”?

The IMPACT rating is determined by the Sequence Ontology (SO) consequence type of the variant. VEP reports it as a separate rating for compatibility with other variant annotation tools (e.g. snpEff). Each category is associated with a set of SO terms:

  • HIGH: The variant is assumed to have a high (disruptive) impact on the protein, probably causing protein truncation, loss of function, or nonsense-mediated decay: transcript_ablation, splice_acceptor_variant, splice_donor_variant, stop_gained, frameshift_variant, stop_lost, start_lost, transcript_amplification
  • MODERATE: A non-disruptive variant that might change protein effectiveness: inframe_insertion, inframe_deletion, missense_variant, protein_altering_variant, regulatory_region_ablation
  • LOW: Assumed to be mostly harmless or unlikely to change protein behavior: splice_region_variant, incomplete_terminal_codon_variant, stop_retained_variant, synonymous_variant
  • MODIFIER: Usually non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact: coding_sequence_variant, mature_miRNA_variant, 5_prime_UTR_variant, 3_prime_UTR_variant, non_coding_transcript_exon_variant, intron_variant, NMD_transcript_variant, non_coding_transcript_variant, upstream_gene_variant, downstream_gene_variant, TFBS_ablation, TFBS_amplification, TF_binding_site_variant, regulatory_region_amplification, feature_elongation, regulatory_region_variant, feature_truncation, intergenic_variant
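
The category lists above amount to a lookup table from SO term to IMPACT rating. The following is a minimal Python sketch of that lookup, not GDC or VEP code. The names IMPACT_TERMS, TERM_TO_IMPACT, SEVERITY, and impact_of are hypothetical. It also assumes that an annotation may carry several SO terms joined by "&" and that the most severe rating should be reported in that case; adjust both assumptions to match the annotation files actually in use.

```python
# SO consequence terms grouped by IMPACT category, copied from the list above.
IMPACT_TERMS = {
    "HIGH": [
        "transcript_ablation", "splice_acceptor_variant", "splice_donor_variant",
        "stop_gained", "frameshift_variant", "stop_lost", "start_lost",
        "transcript_amplification",
    ],
    "MODERATE": [
        "inframe_insertion", "inframe_deletion", "missense_variant",
        "protein_altering_variant", "regulatory_region_ablation",
    ],
    "LOW": [
        "splice_region_variant", "incomplete_terminal_codon_variant",
        "stop_retained_variant", "synonymous_variant",
    ],
    "MODIFIER": [
        "coding_sequence_variant", "mature_miRNA_variant", "5_prime_UTR_variant",
        "3_prime_UTR_variant", "non_coding_transcript_exon_variant",
        "intron_variant", "NMD_transcript_variant", "non_coding_transcript_variant",
        "upstream_gene_variant", "downstream_gene_variant", "TFBS_ablation",
        "TFBS_amplification", "TF_binding_site_variant",
        "regulatory_region_amplification", "feature_elongation",
        "regulatory_region_variant", "feature_truncation", "intergenic_variant",
    ],
}

# Invert the grouping so each SO term maps directly to its category.
TERM_TO_IMPACT = {
    term: impact for impact, terms in IMPACT_TERMS.items() for term in terms
}

# Categories ordered from least to most severe (assumed ordering for ties).
SEVERITY = ["MODIFIER", "LOW", "MODERATE", "HIGH"]


def impact_of(consequence: str) -> str:
    """Return the most severe IMPACT among the '&'-joined SO terms."""
    impacts = []
    for term in consequence.split("&"):
        if term not in TERM_TO_IMPACT:
            raise ValueError(f"SO term not in the table above: {term!r}")
        impacts.append(TERM_TO_IMPACT[term])
    return max(impacts, key=SEVERITY.index)
```

For example, impact_of("missense_variant") returns "MODERATE", and impact_of("splice_region_variant&intron_variant") returns "LOW" because LOW outranks MODIFIER in the assumed ordering.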

Details about variant consequence predictions are available from Ensembl.
