DNA-Seq Somatic Variation

The GDC generates somatic mutation calls from DNA-Seq data of tumor tissues. Because somatic variant detection in tumor tissue is complicated, the scientific community has not reached consensus on the best variant calling algorithm. The GDC therefore implements multiple callers, each of which produces its own set of variant calls for users.

Initially, the GDC focused on somatic single nucleotide variants (SNVs) and insertions and deletions (INDELs). The GDC will expand its somatic variant detection efforts to structural variants (SVs) and copy number variants (CNVs). The variant calling pipelines currently implemented by the GDC are:

  • MuSE [1] somatic point mutation calling pipeline

MuSE Somatic Variant Calling Pipeline

  • VarScan [2] variant calling pipeline

VarScan Somatic Variant Calling Pipeline

  • SomaticSniper [3] variant calling pipeline

SomaticSniper Variant Calling Pipeline

  • MuTect [4] SNV and INDEL calling pipeline

MuTect Variant Calling Pipeline

  • Pindel SNV and INDEL calling pipeline

Pindel Pipeline
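
Because each pipeline emits its own call set, a user may want to see which variants are reported by more than one caller. The following is a minimal sketch of that comparison, not a GDC procedure. The variant keys (chromosome, position, reference allele, alternate allele) and the example records are hypothetical; real call sets would be parsed from the files each pipeline produces.

```python
from collections import defaultdict


def tally_caller_support(call_sets):
    """Map each variant key to the sorted list of callers that reported it."""
    support = defaultdict(set)
    for caller, variants in call_sets.items():
        for variant in variants:
            support[variant].add(caller)
    return {variant: sorted(callers) for variant, callers in support.items()}


# Hypothetical call sets keyed by caller name. Each variant is
# (chromosome, 1-based position, reference allele, alternate allele).
call_sets = {
    "MuSE": {("chr1", 12345, "A", "T"), ("chr2", 500, "G", "C")},
    "VarScan": {("chr1", 12345, "A", "T")},
    "MuTect": {("chr1", 12345, "A", "T"), ("chr2", 500, "G", "C")},
}

support = tally_caller_support(call_sets)

# Variants reported by at least two callers (the threshold is an arbitrary choice).
consensus = {variant for variant, callers in support.items() if len(callers) >= 2}
```

The threshold of two callers is only an illustration; how much agreement to require depends on the analysis.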

To improve calling quality, the GDC jointly realigns the paired tumor-normal BAMs and recalibrates base quality scores using tools from the Genome Analysis Toolkit (GATK) before these files are used for variant calling. Please refer to the GATK Best Practices for details.

The GDC intends to update these pipelines on an ongoing basis with input from the scientific community.

Not all sequencing data come from Illumina platforms: most of the Whole Genome Sequencing data in TARGET were generated with Complete Genomics (CGI) instruments. Initial variant calls from these CGI data were produced by Complete Genomics' proprietary methodology on the previous human reference genome build, GRCh37. The GDC applies Batch Coordinate Conversion, or "liftOver", to convert the genome coordinates and annotations of these original variant calls to the GRCh38 reference build.
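
Conceptually, coordinate conversion maps each position through blocks of sequence that align between the two builds; a position that falls outside every aligned block cannot be converted. A minimal sketch of that idea follows, assuming a simplified block table. The block coordinates are invented for illustration and do not correspond to real GRCh37-to-GRCh38 alignments, and the sketch omits strand changes and cross-chromosome mappings, which a real conversion must handle.

```python
from bisect import bisect_right

# Hypothetical aligned blocks for one chromosome, sorted by source start:
# (source_start, source_end, target_start), 0-based half-open, same strand.
BLOCKS = [
    (0, 1000, 0),
    (1000, 5000, 1200),
    (6000, 9000, 6150),
]
STARTS = [block[0] for block in BLOCKS]


def lift_position(pos):
    """Return the target-build coordinate for pos, or None if it is unmapped."""
    # Find the last block whose source start is <= pos.
    i = bisect_right(STARTS, pos) - 1
    if i < 0:
        return None
    src_start, src_end, tgt_start = BLOCKS[i]
    if pos >= src_end:
        return None  # pos falls in a gap between aligned blocks
    # Within a block, the offset from the block start is preserved.
    return tgt_start + (pos - src_start)
```

In this sketch a variant whose position returns None would be reported as unconverted rather than assigned a guessed coordinate.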

[1]. Fan Y, Xi L, Hughes DST, Zhang J, Zhang J, Futreal PA, Wheeler DA, Wang W. Accounting for inter-tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling for sequencing data. In submission.

[2]. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012 Mar;22(3):568-76. doi: 10.1101/gr.129684.111. Epub 2012 Feb 2.

[3]. Larson DE, Harris CC, Chen K, Koboldt DC, Abbott TE, Dooling DJ, Ley TJ, Mardis ER, Wilson RK, Ding L. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics. 2012 Feb 1;28(3):311-7. doi: 10.1093/bioinformatics/btr665. Epub 2011 Dec 6.

[4]. Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, Gabriel S, Meyerson M, Lander ES, Getz G. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol. 2013 Mar;31(3):213-9. doi: 10.1038/nbt.2514. Epub 2013 Feb 10.