Analyze Data

Why does the OncoGrid show fewer cases than the number of cases listed as having mutations?

Submitted by Anonymous on

The cases in the OncoGrid are filtered by consequence type. Only cases with at least one mutation of the following consequence types are displayed in the OncoGrid: {missense_variant, frameshift_variant, start_lost, stop_lost, initiator_codon_variant, stop_gained}.
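
As a rough illustration of this filter, the Python sketch below keeps only cases that have at least one mutation with a listed consequence type. The record layout (`case_id`, `consequence_type`) and the sample records are hypothetical and do not reflect the GDC's internal data model.

```python
# Consequence types displayed in the OncoGrid, as listed in the answer above.
ONCOGRID_CONSEQUENCES = {
    "missense_variant", "frameshift_variant", "start_lost",
    "stop_lost", "initiator_codon_variant", "stop_gained",
}

def oncogrid_cases(mutations):
    """Return the IDs of cases with at least one displayable mutation."""
    return {
        m["case_id"]
        for m in mutations
        if m["consequence_type"] in ONCOGRID_CONSEQUENCES
    }

# Hypothetical records: case-2 only has a synonymous variant, so it is dropped.
mutations = [
    {"case_id": "case-1", "consequence_type": "missense_variant"},
    {"case_id": "case-2", "consequence_type": "synonymous_variant"},
    {"case_id": "case-3", "consequence_type": "stop_gained"},
]
all_cases = {m["case_id"] for m in mutations}
shown = oncogrid_cases(mutations)
print(len(all_cases), "cases with mutations,", len(shown), "shown in the grid")
```

Here every case has a mutation, but only the cases with a listed consequence type survive the filter, which is why the grid can show fewer cases.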

Why does the GDC display common genes such as TTN that are associated with every cancer in the most frequently mutated genes table?

The GDC does not currently normalize mutation frequency by gene length; whether to do so is under discussion. As a result, very long genes such as TTN, which accumulate more mutations simply because of their size, appear in the most frequently mutated genes table. Users can filter by the COSMIC Cancer Gene Census to display only genes for which mutations have been causally implicated in cancer.
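
To show why gene length matters, the Python sketch below compares a ranking by raw case counts with a ranking by cases per kilobase of gene length. This is not something the GDC does; it is one possible normalization, and the gene names, counts, and lengths are made-up values chosen only for illustration.

```python
def cases_per_kb(mutated_cases, gene_length_bp):
    """Mutated-case count divided by gene length in kilobases."""
    return mutated_cases / (gene_length_bp / 1000)

# Hypothetical genes: one very long gene and one short gene.
genes = {
    "LONG_GENE": {"mutated_cases": 300, "length_bp": 300_000},
    "SHORT_GENE": {"mutated_cases": 100, "length_bp": 20_000},
}

# Ranking by raw counts puts the long gene first.
raw_rank = sorted(genes, key=lambda g: genes[g]["mutated_cases"], reverse=True)

# Ranking by length-normalized counts puts the short gene first.
norm_rank = sorted(
    genes,
    key=lambda g: cases_per_kb(genes[g]["mutated_cases"], genes[g]["length_bp"]),
    reverse=True,
)
print(raw_rank, norm_rank)
```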

Why is the number of analyzed cases in the MAF header not equal to the number of cases displayed in the GDC Data Portal?

Within the GDC data analysis workflow, public (somatic) MAFs and protected MAFs are generated by the same pipeline and link back to the same cases. For example, for the TCGA-GBM project, the somatic MAF has the following header:

# in TCGA.GBM.muse.7e85de23-3855-4279-a3ac-a81827e4ccb6.DR6.0.somatic.maf.gz
#version gdc-1.0.0
#filedate 20170307
#n.analyzed.samples 393
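
To compare this count with the Data Portal yourself, the header can be read programmatically. The Python sketch below parses `#key value` header lines like those shown above; the function name `parse_maf_header` is a hypothetical helper, and the sketch assumes the header lines come first in the file.

```python
def parse_maf_header(lines):
    """Collect leading '#key value' lines into a dict, stopping at the first data line."""
    header = {}
    for line in lines:
        if not line.startswith("#"):
            break
        key, _, value = line[1:].strip().partition(" ")
        header[key] = value.strip()
    return header

# Header lines copied from the TCGA-GBM example above.
header_lines = [
    "#version gdc-1.0.0",
    "#filedate 20170307",
    "#n.analyzed.samples 393",
]
header = parse_maf_header(header_lines)
n_samples = int(header["n.analyzed.samples"])
print(n_samples)
```

For a downloaded `.maf.gz` file, the same function could be fed lines from `gzip.open(path, "rt")`.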

On the GDC Project summary page or Exploration/Gene tab, why is the # Mutations sometimes lower than the # Affected Cases?

The “# Mutations” column in the Project or Exploration/Gene tab displays the number of distinct (unique) mutations within the affected cases and not necessarily the total number of all mutations within the project or query filter.
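
A small Python sketch makes the counting difference concrete. The (case, mutation) pairs are hypothetical: three cases carry the same mutation, so there are three affected cases but only one distinct mutation.

```python
# Hypothetical observations: each tuple is (case ID, mutation ID).
observations = [
    ("case-1", "mut-A"),
    ("case-2", "mut-A"),
    ("case-3", "mut-A"),
]

affected_cases = {case for case, _ in observations}
distinct_mutations = {mutation for _, mutation in observations}

print(len(distinct_mutations), "distinct mutation(s) in", len(affected_cases), "affected cases")
```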

Can I use the GDC Application Programming Interface (API) to retrieve data sets associated with visualizations?

Yes. The GDC provides additional analysis endpoints to retrieve data sets associated with visualizations. Analysis endpoints include: survival, top_cases_counts_by_genes, top_mutated_genes_by_project, top_mutated_cases_by_gene, top_mutated_cases_by_ssm, and mutated_cases_count_by_project.

Please refer to the GDC API User's Guide Analysis Section for additional information.
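
As a starting point, the Python sketch below builds request URLs for the analysis endpoints named above. The base URL `https://api.gdc.cancer.gov` and the `/analysis/` path prefix are assumptions to verify against the User's Guide, and `analysis_url` is a hypothetical helper; the required query parameters for each endpoint are not shown here.

```python
# Assumed base URL and path prefix; confirm both in the GDC API User's Guide.
BASE_URL = "https://api.gdc.cancer.gov"

# Endpoint names as listed in the answer above.
ANALYSIS_ENDPOINTS = {
    "survival",
    "top_cases_counts_by_genes",
    "top_mutated_genes_by_project",
    "top_mutated_cases_by_gene",
    "top_mutated_cases_by_ssm",
    "mutated_cases_count_by_project",
}

def analysis_url(endpoint):
    """Return the URL for a known analysis endpoint, rejecting unknown names."""
    if endpoint not in ANALYSIS_ENDPOINTS:
        raise ValueError(f"unknown analysis endpoint: {endpoint}")
    return f"{BASE_URL}/analysis/{endpoint}"

print(analysis_url("survival"))
```

The resulting URL can then be requested with any HTTP client, together with the filters described in the guide.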

In the Most Frequent Mutations table, which VEP algorithm does the GDC use to determine a VEP impact score of “H” or “M”?

The IMPACT rating is determined by the Sequence Ontology (SO) consequence type of the variant. It is a separate rating that VEP provides for compatibility with other variant annotation tools (e.g. snpEff). Each category is associated with a set of SO terms:
