
Why do CNVs of different genes in the GDC Data Portal differ from other genomic portals?

Submitted by gaheens on

This discrepancy stems from differences between the data processing pipelines used by the GDC and those used by other genomic portals. At the GDC, gene-level CNVs are derived from several standardized pipelines. For TCGA projects, when more than one source is available, CNV values are prioritized in the following order: SNP6 ABSOLUTE (LiftOver) > SNP6 ASCAT3 > WGS AscatNGS > SNP6 ASCAT2. All of these workflows produce absolute integer copy number values.
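The priority order can be read as a simple "first available wins" rule. The sketch below is illustrative only and is not GDC code: the function name `select_cnv_source`, the string labels, and the input dictionary (one gene in one sample, mapping each workflow to the integer copy number it reported) are all hypothetical.

```python
from typing import Optional

# Hypothetical workflow labels, listed in the TCGA priority order
# given above (highest priority first).
TCGA_CNV_PRIORITY = [
    "SNP6 ABSOLUTE (LiftOver)",
    "SNP6 ASCAT3",
    "WGS AscatNGS",
    "SNP6 ASCAT2",
]


def select_cnv_source(available: dict[str, int]) -> Optional[tuple[str, int]]:
    """Return (workflow, copy_number) from the highest-priority workflow present.

    `available` maps a workflow label to the absolute integer copy number
    that workflow reported for one gene in one sample (hypothetical input).
    """
    for workflow in TCGA_CNV_PRIORITY:
        if workflow in available:
            return workflow, available[workflow]
    return None  # no workflow in the priority list produced a value


calls = {"SNP6 ASCAT2": 3, "WGS AscatNGS": 2}
print(select_cnv_source(calls))  # ('WGS AscatNGS', 2)
```

Under this assumed rule, the ASCAT2 call is ignored whenever a higher-priority workflow reports a value, so a portal that took the ASCAT2 value directly could show a different number for the same gene.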

In contrast, data in other genomic portals may come from more diverse sources. The same TCGA project may appear under different studies, with gene-level CNVs derived either from the segment mean values in the original publication or from data ingested directly from the GDC. The exact data origin and processing steps in other genomic portals can vary by study.
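One reason values derived from segment means can look different is scale. Segment mean values are commonly reported as log2 copy ratios relative to a diploid baseline rather than as absolute integer copy numbers. The minimal sketch below assumes `segment_mean = log2(copy_number / 2)` and applies no tumor purity or ploidy correction. The function names are hypothetical.

```python
import math


def segment_mean_to_copy_number(segment_mean: float, baseline: int = 2) -> float:
    """Approximate copy number from a log2 copy-ratio segment mean.

    Assumes segment_mean = log2(copy_number / baseline) with a diploid
    baseline and no purity or ploidy adjustment.
    """
    return baseline * 2 ** segment_mean


def copy_number_to_segment_mean(copy_number: float, baseline: int = 2) -> float:
    """Inverse of segment_mean_to_copy_number under the same assumptions."""
    return math.log2(copy_number / baseline)


print(segment_mean_to_copy_number(0.0))   # 2.0 (no change from diploid)
print(segment_mean_to_copy_number(-1.0))  # 1.0 (one-copy loss)
print(copy_number_to_segment_mean(4))     # 1.0 (two-copy gain)
```

ABSOLUTE and ASCAT estimate tumor purity and ploidy before reporting integer copy numbers. A naive conversion like this one does not, so rounding its output can disagree with the GDC's absolute calls for the same sample.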
