
Why is the data maintained in cBioPortal, Broad Firehose, or the Seven Bridges Cancer Genomics Cloud different from the GDC data?


The GDC harmonizes data across projects. This includes aligning the genomic data to a common reference genome (GRCh38, also known as hg38) and generating higher-level data using GDC bioinformatics pipelines. Other repositories may process the data differently, so their results can differ from those in the GDC.

For example, TCGA data in cBioPortal use the original mutation calls generated by the individual TCGA sequencing centers. cBioPortal obtains these data from the Broad Firehose, or from the publication pages when the data match a specific manuscript. The calls usually combine the output of two mutation callers, typically a variant caller such as MuTect plus an indel caller, but the exact combination differs by center, and the sequencing centers have modified their mutation-calling pipelines over time. In contrast, TCGA data in the GDC are realigned to the GRCh38 reference genome, and mutations are called using four variant callers: MuTect, VarScan2, MuSE, and Pindel.
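
To see how much two mutation call sets for the same sample actually differ, a comparison along the following lines may help. This is only a minimal sketch: the gene names, protein changes, caller labels, and the two-caller voting threshold are all hypothetical and do not describe how either the sequencing centers or the GDC combine caller output. Variants are keyed by gene and protein change rather than by genomic position, on the assumption that coordinates from different reference builds cannot be compared directly.

```python
from collections import Counter


def consensus_calls(calls_by_caller, min_callers=2):
    """Return variants reported by at least `min_callers` callers.

    The threshold is illustrative only, not a rule used by any real pipeline.
    """
    counts = Counter()
    for variants in calls_by_caller.values():
        counts.update(set(variants))  # count each caller at most once per variant
    return {variant for variant, n in counts.items() if n >= min_callers}


def compare(center_set, gdc_set):
    """Split two call sets into shared and source-specific variants."""
    return {
        "shared": center_set & gdc_set,
        "only_center": center_set - gdc_set,
        "only_gdc": gdc_set - center_set,
    }


# Hypothetical calls from a sequencing center: one SNV caller plus one indel caller.
center_calls = {
    "snv_caller": {("GENE_A", "p.A10T"), ("GENE_B", "p.R20H")},
    "indel_caller": {("GENE_C", "p.K30fs")},
}

# Hypothetical calls from four callers run on realigned data.
gdc_calls = {
    "caller_1": {("GENE_A", "p.A10T"), ("GENE_B", "p.R20H")},
    "caller_2": {("GENE_A", "p.A10T"), ("GENE_D", "p.G40D")},
    "caller_3": {("GENE_A", "p.A10T"), ("GENE_B", "p.R20H")},
    "caller_4": {("GENE_C", "p.K30fs")},
}

# Center result: union of both callers. Multi-caller result: simple vote.
center_set = set().union(*center_calls.values())
gdc_set = consensus_calls(gdc_calls, min_callers=2)

diff = compare(center_set, gdc_set)
for label, variants in diff.items():
    print(label, sorted(variants))
```

In this toy data the indel in GENE_C appears only in the center set, because just one of the four callers reports it. That is one way the same sample can yield different mutation lists in different repositories.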
