
Analyze Data

Why do CNVs for some genes in the GDC Data Portal differ from those in other genomic portals?

Submitted by gaheens on

This discrepancy stems from differences between the data processing pipelines used by the GDC and those used by other genomic portals. At the GDC, gene-level CNVs are derived from several standardized pipelines. For TCGA projects, CNV values are prioritized in the following order: SNP6 ABSOLUTE (LiftOver) > SNP6 ASCAT3 > WGS AscatNGS > SNP6 ASCAT2. All of these workflows produce absolute integer copy number values.
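The priority order above can be illustrated with a small sketch. It assumes that "prioritized" means the value from the highest-ranked workflow available for a sample is the one reported; the function name `pick_copy_number` and the workflow label strings are hypothetical and are not GDC identifiers or GDC code.

```python
from typing import Mapping, Optional, Tuple

# Priority order as stated above for TCGA projects, highest priority first.
# The label strings are illustrative, not official GDC workflow identifiers.
CNV_PRIORITY = (
    "SNP6 ABSOLUTE (LiftOver)",
    "SNP6 ASCAT3",
    "WGS AscatNGS",
    "SNP6 ASCAT2",
)


def pick_copy_number(calls: Mapping[str, int]) -> Optional[Tuple[str, int]]:
    """Return (workflow, copy_number) from the highest-priority workflow present.

    `calls` maps a workflow label to the integer copy number it reported
    for one gene in one sample. Returns None if no known workflow is present.
    """
    for workflow in CNV_PRIORITY:
        if workflow in calls:
            return workflow, calls[workflow]
    return None


# Hypothetical gene with calls from two workflows: the WGS AscatNGS call
# outranks the SNP6 ASCAT2 call.
example_calls = {"SNP6 ASCAT2": 3, "WGS AscatNGS": 2}
print(pick_copy_number(example_calls))
```

A portal that reads from a different workflow, or that reports a different kind of value, could show a different number for the same gene and sample.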

How does the GDC choose the default transcript for each variant?

Submitted by gaheens on

When a mutation overlaps multiple transcripts or genes, the GDC annotates all consequences in the all_effects column of the MAF file and in the CONSEQUENCE table on the Mutation Summary Page. One transcript is then selected as the default for detailed annotation and for any visualization that shows only a single consequence.
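To inspect every consequence rather than only the default one, the all_effects value can be split into one record per transcript. The sketch below assumes that entries are separated by semicolons, that fields within an entry are separated by commas, and that the fields follow the order in `ASSUMED_FIELDS`; these are assumptions to check against the MAF format documentation for your data release. The function name `parse_all_effects` and the example values are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed field order inside each all_effects entry. Verify against the
# MAF format documentation before relying on it.
ASSUMED_FIELDS = [
    "Symbol", "Consequence", "HGVSp_Short", "Transcript_ID", "RefSeq",
    "HGVSc", "Impact", "Canonical", "Sift", "PolyPhen", "Strand",
]


def parse_all_effects(
    value: str,
    fields: Sequence[str] = ASSUMED_FIELDS,
    entry_sep: str = ";",
    field_sep: str = ",",
) -> List[Dict[str, str]]:
    """Split an all_effects string into one dict per annotated transcript."""
    effects = []
    for entry in value.split(entry_sep):
        if not entry.strip():
            continue  # skip empty entries, e.g. from a trailing separator
        parts = entry.split(field_sep)
        # zip stops at the shorter sequence, so short entries yield fewer keys
        effects.append(dict(zip(fields, parts)))
    return effects


# Made-up value with two entries and only the first four fields filled in.
example = "GENE_A,missense_variant,p.X1Y,TX_1;GENE_B,intron_variant,,TX_2"
for effect in parse_all_effects(example):
    print(effect["Symbol"], effect["Consequence"], effect["Transcript_ID"])
```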

Why do some genes show no expression in STAR results across all samples, even though I can see mapped reads in the raw RNA-Seq data?

Submitted by gaheens on

STAR gene expression quantification excludes reads that map to more than one gene. As a result, a gene can show zero expression in the final counts even when mapped reads for it are present in the raw data.
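The effect can be reproduced with a toy counter. This is a simplified model of the rule described above, not STAR's actual implementation; the function name `count_unique_gene_reads`, the read names, and the gene names are all hypothetical.

```python
from collections import Counter
from typing import Mapping, Set


def count_unique_gene_reads(read_to_genes: Mapping[str, Set[str]]) -> Counter:
    """Count reads per gene, keeping only reads assigned to exactly one gene."""
    counts: Counter = Counter()
    for genes in read_to_genes.values():
        if len(genes) == 1:
            (gene,) = genes  # unpack the single gene in the set
            counts[gene] += 1
        # reads assigned to zero or to several genes are excluded
    return counts


# GENE_B and GENE_C only ever receive reads that also map to the other gene,
# so both end up with zero counts despite having mapped reads.
reads = {
    "read1": {"GENE_A"},
    "read2": {"GENE_B", "GENE_C"},
    "read3": {"GENE_B", "GENE_C"},
    "read4": {"GENE_A"},
}
counts = count_unique_gene_reads(reads)
print(counts["GENE_A"], counts["GENE_B"], counts["GENE_C"])  # 2 0 0
```

Genes whose sequence closely matches another gene are the most likely to be affected in this way, because few of their reads can be assigned to a single gene.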
