
Analyze Data

The GDC provides an array of interactive, web-based Analysis Tools for performing in-depth gene- and variant-level analyses. The workflow is cohort-centric, meaning analyses are specific to a researcher's cohort of interest.

  • Build cohorts and perform gene- and variant-level analyses on them
  • Visualize the most frequently mutated genes and somatic mutations in a cohort
  • Visualize a matrix of the most frequently mutated cases and genes affected by high-impact mutations in a cohort
  • Perform a survival analysis comparing cases with a mutated form of a given gene against cases without the mutation
  • Visualize mutations and their frequency across cases, mapped onto a graphical representation of protein-coding regions
  • Visualize the most variably expressed genes in a cohort
  • Visualize sequencing reads for a given gene, position, SNP, or variant
  • Use clinical variables to perform basic statistical analysis of a cohort
  • Compare custom gene or case sets by visualizing set similarities and differences
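The page does not say how the survival comparison is computed. A common approach, and the one assumed here, is to split the cohort by mutation status for one gene and estimate a Kaplan-Meier survival curve for each group. The `Case` record, `split_by_gene`, and `kaplan_meier` below are hypothetical names used for illustration only; they are not part of any GDC API, and the GDC's own implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical minimal case record (not a GDC data model)."""
    case_id: str
    days: float                  # days to death, or to last follow-up if censored
    died: bool                   # True if death was observed, False if censored
    mutated_genes: frozenset     # gene symbols mutated in this case


def split_by_gene(cases, gene):
    """Split a cohort into cases with and without a mutation in `gene`."""
    with_mutation = [c for c in cases if gene in c.mutated_genes]
    without_mutation = [c for c in cases if gene not in c.mutated_genes]
    return with_mutation, without_mutation


def kaplan_meier(cases):
    """Kaplan-Meier estimate as [(time, survival_probability)] at each death time."""
    death_times = sorted({c.days for c in cases if c.died})
    curve = []
    survival = 1.0
    for t in death_times:
        # Cases still under observation at time t (censored at t still count).
        at_risk = sum(1 for c in cases if c.days >= t)
        deaths = sum(1 for c in cases if c.died and c.days == t)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve
```

For example, with three TP53-mutated cases (deaths at days 100 and 300, one case censored at day 200), the mutated-group curve drops to about 0.667 at day 100 and to 0 at day 300; comparing it with the curve for the remaining cases gives the two-group view the survival tool describes.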

Documentation

Data Analysis Tools

The GDC provides interactive, cohort-centric tools for analyzing genomic and clinical data.

Data Analysis Policy

The GDC provides policies and guidelines for the appropriate use of data, whether open-access or controlled-access.

Data Harmonization and Generation

The GDC develops best-practice pipelines for processing data from the most common molecular platforms. Variant calling, gene expression analysis, and other pipelines are implemented using software and algorithms selected in consultation with experts in the genomics community.

What’s New with GDC and Cancer Research

Cancer Research Highlights and Publications:

From the GDC FAQ

In the Most Frequent Mutations table, which VEP algorithm does the GDC use to determine the VEP impact values “H” or “M”?

VEP assigns the IMPACT rating from the Sequence Ontology (SO) consequence type of each variant, using categories that are compatible with other variant annotation tools such as snpEff. In the table, “H” and “M” correspond to the HIGH and MODERATE categories below. Each category is associated with a set of SO terms:

  • HIGH: The variant is assumed to have high (disruptive) impact on the protein, probably causing protein truncation, loss of function, or triggering nonsense-mediated decay: transcript_ablation, splice_acceptor_variant, splice_donor_variant, stop_gained, frameshift_variant, stop_lost, start_lost, transcript_amplification
  • MODERATE: A non-disruptive variant that might change protein effectiveness: inframe_insertion, inframe_deletion, missense_variant, protein_altering_variant, regulatory_region_ablation
  • LOW: Assumed to be mostly harmless or unlikely to change protein behavior: splice_region_variant, incomplete_terminal_codon_variant, stop_retained_variant, synonymous_variant
  • MODIFIER: Usually non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact: coding_sequence_variant, mature_miRNA_variant, 5_prime_UTR_variant, 3_prime_UTR_variant, non_coding_transcript_exon_variant, intron_variant, NMD_transcript_variant, non_coding_transcript_variant, upstream_gene_variant, downstream_gene_variant, TFBS_ablation, TFBS_amplification, TF_binding_site_variant, regulatory_region_amplification, feature_elongation, regulatory_region_variant, feature_truncation, intergenic_variant
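The grouping above amounts to a lookup from SO term to IMPACT category. The sketch below encodes exactly the four lists on this page and, for a variant annotated with several SO terms, returns the most severe category present. It assumes multiple terms are joined with "&" (the separator is a parameter because output formats differ), and `impact_of` is a hypothetical helper for illustration; VEP itself reports IMPACT directly in its output.

```python
# SO terms grouped by VEP IMPACT category, copied from the lists above.
IMPACT_TERMS = {
    "HIGH": {
        "transcript_ablation", "splice_acceptor_variant", "splice_donor_variant",
        "stop_gained", "frameshift_variant", "stop_lost", "start_lost",
        "transcript_amplification",
    },
    "MODERATE": {
        "inframe_insertion", "inframe_deletion", "missense_variant",
        "protein_altering_variant", "regulatory_region_ablation",
    },
    "LOW": {
        "splice_region_variant", "incomplete_terminal_codon_variant",
        "stop_retained_variant", "synonymous_variant",
    },
    "MODIFIER": {
        "coding_sequence_variant", "mature_miRNA_variant", "5_prime_UTR_variant",
        "3_prime_UTR_variant", "non_coding_transcript_exon_variant", "intron_variant",
        "NMD_transcript_variant", "non_coding_transcript_variant",
        "upstream_gene_variant", "downstream_gene_variant", "TFBS_ablation",
        "TFBS_amplification", "TF_binding_site_variant",
        "regulatory_region_amplification", "feature_elongation",
        "regulatory_region_variant", "feature_truncation", "intergenic_variant",
    },
}

# Most severe first.
SEVERITY_ORDER = ["HIGH", "MODERATE", "LOW", "MODIFIER"]

# Reverse index: SO term -> IMPACT category.
TERM_TO_IMPACT = {
    term: impact for impact, terms in IMPACT_TERMS.items() for term in terms
}


def impact_of(consequence: str, sep: str = "&") -> str:
    """Return the most severe IMPACT among the SO terms in `consequence`.

    Assumes multiple terms are joined by `sep` (hypothetical default "&").
    Raises ValueError for any term not listed above.
    """
    impacts = set()
    for term in consequence.split(sep):
        if term not in TERM_TO_IMPACT:
            raise ValueError(f"unrecognized SO term: {term!r}")
        impacts.add(TERM_TO_IMPACT[term])
    # The category with the lowest index in SEVERITY_ORDER is the most severe.
    return min(impacts, key=SEVERITY_ORDER.index)
```

For instance, a variant annotated as both intron_variant and splice_donor_variant is classified HIGH, because the splice-donor term outranks the intronic one.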

Details about variant consequence predictions are available from Ensembl.

Need Assistance?

Need help with data retrieval, download, or submission?

Visit the GDC Support Page