
Analyze Data

The GDC provides an array of interactive, web-based Analysis Tools for in-depth gene- and variant-level analyses. The workflow is cohort-centric: each analysis runs on a researcher's cohort of interest.

  • Build cohorts and perform gene- and variant-level analyses on them
  • Visualize the most frequently mutated genes and somatic mutations in a cohort
  • Visualize a matrix of the most frequently mutated cases and genes affected by high-impact mutations in a cohort
  • Perform a survival analysis comparing cases that carry a mutated form of a given gene with cases that do not
  • Visualize mutations and their frequency across cases, mapped onto a graphical representation of protein-coding regions
  • Visualize the most variably expressed genes in a cohort
  • Visualize sequencing reads for a given gene, position, SNP, or variant
  • Use clinical variables to perform basic statistical analysis of a cohort
  • Compare custom gene or case sets by visualizing set similarities and differences

Documentation

Data Analysis Tools

The GDC provides interactive, cohort-centric tools for analyzing genomic and clinical data.

Data Analysis Policy

The GDC provides policies and guidelines for the appropriate use of both open-access and controlled-access data.

Data Harmonization and Generation

The GDC develops best-practice pipelines for processing data from the most common molecular platforms. Variant calling, gene expression analysis, and other pipelines are implemented using software and algorithms selected in consultation with experts in the genomics community.

What’s New with GDC and Cancer Research

Cancer Research Highlights and Publications

From the GDC FAQ

Why do some genes show no expression in STAR results across all samples, even though I can see mapped reads in the raw RNA-Seq data?

STAR gene expression quantification does not count reads that map to more than one gene. As a result, some genes can show zero expression in the final counts even though the raw data contain reads mapped to them.
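The counting rule can be illustrated with a small Python sketch. It is a simplified model, assuming a single strand, exon-level overlap, and a hypothetical three-gene annotation (`HOST`, `NESTED`, `SOLO`); it is not STAR's actual implementation and ignores multi-mapping reads, splicing, and strandedness.

```python
from collections import Counter

# Hypothetical toy annotation (not real GDC or GENCODE data): gene name ->
# exon intervals as half-open (start, end) coordinates on one strand.
GENES = {
    "HOST": [(100, 500)],
    "NESTED": [(200, 300)],  # exon lies entirely inside HOST's exon
    "SOLO": [(800, 900)],
}


def genes_hit(read_start, read_end, genes):
    """Return the names of genes with at least one exon overlapping [read_start, read_end)."""
    hits = set()
    for name, exons in genes.items():
        if any(read_start < end and start < read_end for start, end in exons):
            hits.add(name)
    return hits


def count_reads(reads, genes):
    """Count reads per gene; reads touching more than one gene are set aside as ambiguous."""
    counts = Counter({name: 0 for name in genes})
    ambiguous = 0
    for read_start, read_end in reads:
        hits = genes_hit(read_start, read_end, genes)
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif len(hits) > 1:
            ambiguous += 1  # overlaps several genes: not assigned to any of them
        # reads that hit no gene are simply not counted
    return counts, ambiguous


if __name__ == "__main__":
    reads = [(210, 260), (250, 290), (120, 180), (820, 870)]
    counts, ambiguous = count_reads(reads, GENES)
    print(dict(counts), ambiguous)
```

Running it reports one read each for `HOST` and `SOLO` and two ambiguous reads, so `NESTED` ends up with zero counts even though two reads map onto it.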

One common cause of zero counts is gene overlap. Affected genes often have their exons entirely contained within other genes; in such cases, STAR cannot assign reads to them because the reads are ambiguous. To check whether a gene falls into this category, refer to the following lists:

  • Stranded Counting Overlap Gene List: overlap.gene.stranded.tsv
  • Strandless Counting Overlap Gene List: overlap.gene.strandless.tsv

Need Assistance?

Need help with data retrieval, download, or submission?

Visit the GDC Support Page