
Analyze Data

The GDC provides an array of interactive, web-based Analysis Tools for performing in-depth gene- and variant-level analyses. The workflow is cohort-centric, meaning analyses are specific to a researcher's cohort of interest.

  • Build cohorts and perform gene- and variant-level analyses on them
  • Visualize the most frequently mutated genes and somatic mutations in a cohort
  • Visualize a matrix of the most frequently mutated cases and genes affected by high-impact mutations in a cohort
  • Perform a survival analysis comparing cases that carry a mutated form of a given gene with cases that do not
  • Visualize mutations and their frequency across cases, mapped onto a graphical representation of protein-coding regions
  • Visualize the most variably expressed genes in a cohort
  • Visualize sequencing reads for a given gene, position, SNP, or variant
  • Use clinical variables to perform basic statistical analysis of a cohort
  • Compare custom gene or case sets by visualizing set similarities and differences

Documentation

Data Analysis Tools

The GDC provides interactive, cohort-centric tools for analyzing genomic and clinical data.

Data Analysis Policy

The GDC provides policies and guidelines for the appropriate use of data, whether open- or controlled-access.

Data Harmonization and Generation

The GDC develops best-practice pipelines for processing data from the most common molecular platforms. Variant calling, gene expression analysis, and other pipelines are implemented using software and algorithms selected in consultation with experts in the genomics community.

From the GDC FAQ

How are the five categories of copy number changes determined?

The GDC begins with integer-level estimates of absolute copy number generated by either the ASCAT or ABSOLUTE pipeline. To establish a baseline, an integer-valued sample ploidy is computed as follows: 

  • For gene-level CNV, the mode of copy number values across all autosomal protein-coding genes is used
  • For segment-level CNV, a length-weighted mode of copy number values is computed across all autosomal segments
  • In case of a tie, the mode is rounded up (the higher of the tied copy number values is used)
  • Please note that the integer-valued sample ploidy used here differs from the floating-point ploidy estimates produced directly by the ASCAT or ABSOLUTE pipelines. The latter should be considered the more precise representation and is recommended for use in most other bioinformatics analyses.
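
The ploidy rules above can be sketched in Python as follows. This is a minimal illustration, not the GDC's implementation. The function names are hypothetical, the inputs are assumed to be already filtered to autosomal protein-coding genes (or autosomal segments), segments are assumed to be supplied as (length in bases, copy number) pairs, and "rounded up" is interpreted as choosing the higher of the tied copy number values.

```python
from collections import Counter
from typing import Iterable, Tuple


def _mode_rounded_up(weights: Counter) -> int:
    """Return the copy number with the largest total weight.

    Ties are broken by taking the higher copy number (our reading of
    "the mode is rounded up").
    """
    if not weights:
        raise ValueError("no copy number values supplied")
    best = max(weights.values())
    return max(cn for cn, w in weights.items() if w == best)


def gene_level_ploidy(copy_numbers: Iterable[int]) -> int:
    """Mode of integer copy numbers across genes.

    Assumes the input is already restricted to autosomal
    protein-coding genes; each gene counts once.
    """
    return _mode_rounded_up(Counter(copy_numbers))


def segment_level_ploidy(segments: Iterable[Tuple[int, int]]) -> int:
    """Length-weighted mode across segments.

    Assumes each segment is a (length_in_bases, copy_number) pair and
    that the input is already restricted to autosomal segments.
    """
    weights: Counter = Counter()
    for length, copy_number in segments:
        # Each copy number accumulates the total length of its segments.
        weights[copy_number] += length
    return _mode_rounded_up(weights)


if __name__ == "__main__":
    # Copy numbers 2 and 3 each occur twice, so the tie goes to 3.
    print(gene_level_ploidy([2, 2, 3, 3, 1]))
    # Copy number 3 covers 1.1 Mb versus 1.0 Mb for copy number 2.
    print(segment_level_ploidy([(1_000_000, 2), (500_000, 3), (600_000, 3)]))
```

For example, gene_level_ploidy([2, 2, 3, 3, 1]) returns 3, because copy numbers 2 and 3 tie with two genes each and the higher value is kept.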
Based on this sample ploidy value, the GDC assigns copy number categories as: 

  • Homozygous deletion: copy number = 0
  • Loss: 0 < copy number < sample ploidy
  • Neutral: copy number = sample ploidy
  • Gain: sample ploidy < copy number < 2 × sample ploidy
  • Amplification: copy number ≥ 2 × sample ploidy
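
The five categories can likewise be expressed as a small function. This sketch assumes integer copy numbers and a positive integer sample ploidy computed as described above; the function name and the returned label strings are illustrative and are not part of any GDC API.

```python
def classify_copy_number(copy_number: int, sample_ploidy: int) -> str:
    """Assign one of the five copy number categories relative to ploidy."""
    if copy_number < 0 or sample_ploidy <= 0:
        raise ValueError("copy number must be >= 0 and ploidy must be > 0")
    if copy_number == 0:
        return "Homozygous deletion"
    if copy_number < sample_ploidy:
        return "Loss"  # 0 < copy number < ploidy
    if copy_number == sample_ploidy:
        return "Neutral"
    if copy_number < 2 * sample_ploidy:
        return "Gain"  # ploidy < copy number < 2 x ploidy
    return "Amplification"  # copy number >= 2 x ploidy


if __name__ == "__main__":
    for cn in range(5):
        print(cn, classify_copy_number(cn, sample_ploidy=2))
```

For a diploid sample (ploidy 2), a copy number of 3 is a Gain and 4 is an Amplification; for a triploid sample, a copy number of 5 is still a Gain because it is below 2 × 3 = 6.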

Need Assistance?

Need help with data retrieval, download, or submission?

Visit the GDC Support Page