Frontiers in Genetics, April 27, 2022; 10.3389/fgene.2022.834764
Formalin fixation and paraffin embedding of tissue samples is a well-established method for preserving tissue and is routinely used in clinical settings. Although formalin-fixed, paraffin-embedded (FFPE) tissues are deemed crucial for research and clinical applications, the fixation process damages nucleic acids, which confounds their use in genome sequence analysis. Methods to improve genomic data quality from FFPE tissues have emerged, but there remains significant room for improvement.
Here, we use whole-genome sequencing (WGS) data from matched fresh frozen (FF) and FFPE tissue samples to optimize a sensitive and precise single nucleotide variant (SNV) calling approach for FFPE samples. We present methods that reduce the prevalence of false-positive SNVs by applying combinatorial techniques to five publicly available variant callers. We also introduce FFPolish, a novel variant classification method that efficiently identifies FFPE-specific false-positive variants.
Our combinatorial and statistical techniques improve precision and F1 scores compared with the results of the publicly available variant callers run individually.
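The abstract does not spell out how calls from the five callers are combined or scored. As a rough illustration only, the sketch below assumes a simple k-of-n vote (keep an SNV reported by at least `min_support` callers) and scores a call set against SNVs from the matched FF sample, treated here as the truth set. It is not FFPolish or the paper's exact combinatorial method; the `(chrom, pos, ref, alt)` key, the function names, and the toy calls are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def consensus_calls(call_sets: Iterable[Set[Variant]], min_support: int) -> Set[Variant]:
    """Keep variants reported by at least `min_support` of the callers (k-of-n vote)."""
    votes: Counter = Counter()
    for calls in call_sets:
        # Each caller's calls are a set, so a caller votes at most once per variant.
        votes.update(calls)
    return {variant for variant, n in votes.items() if n >= min_support}


def precision_recall_f1(predicted: Set[Variant], truth: Set[Variant]) -> Tuple[float, float, float]:
    """Score FFPE calls against a truth set (assumed here to be the matched FF calls)."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data only; chr3:10 C>T stands in for an FFPE artifact reported by two callers.
    ff_truth = {("chr1", 100, "C", "T"), ("chr1", 200, "G", "A"), ("chr2", 50, "A", "G")}
    callers = [
        {("chr1", 100, "C", "T"), ("chr1", 200, "G", "A"), ("chr3", 10, "C", "T")},
        {("chr1", 100, "C", "T"), ("chr2", 50, "A", "G"), ("chr3", 10, "C", "T")},
        {("chr1", 100, "C", "T"), ("chr1", 200, "G", "A"), ("chr4", 7, "G", "A")},
    ]
    for k in (1, 2, 3):
        print(k, precision_recall_f1(consensus_calls(callers, k), ff_truth))
```

On this toy data the union of all callers (`min_support=1`) gives precision 0.6 and recall 1.0, while requiring two callers gives precision and recall of about 0.67 each; raising `min_support` generally trades recall for precision.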
Keywords: Burkitt Lymphoma, Cervical Cancer, mRNA-Seq, miRNA-Seq, WGS, FFPE
To access controlled CGCI BLGSP and CGCI HTMCP-CC data, users must submit an application via dbGaP. To begin the application process, please see the information provided on the dbGaP Authorized Access Login Page under "dbGaP Data Download."
Supplemental Data
- GDC Manifests
  - CGCI-BLGSP Controlled-Access Data Download Manifest (3 Files)
  - CGCI-BLGSP Open-Access Data Download Manifest (8 Files)
  - CGCI HTMCP-CC Controlled-Access Data Download Manifest (5 Files)
  - CGCI HTMCP-CC Open-Access Data Download Manifest (7 Files)
- Clinical Data
- mRNA
  - CGCI-BLGSP mRNA-Seq Level 3 Data
  - CGCI-BLGSP mRNA-Seq Level 3 Data (Controlled)
  - CGCI-BLGSP mRNA-Seq Metadata
  - CGCI HTMCP-CC mRNA-Seq Level 3 Data
  - CGCI HTMCP-CC mRNA-Seq Level 3 Data (Controlled)
- Targeted Capture Sequencing
  - CGCI-BLGSP Targeted Capture Sequencing Level 3 Data (Controlled)
  - CGCI-BLGSP Targeted Capture Sequencing Design
  - CGCI-BLGSP Targeted Capture Sequencing Metadata
  - CGCI HTMCP-CC Targeted Capture Sequencing Level 3 Data (Controlled)
  - CGCI HTMCP-CC Targeted Capture Sequencing Metadata
- Whole Genome Sequencing
  - CGCI-BLGSP WGS Level 2 Data
  - CGCI-BLGSP WGS Level 3 Data (Controlled)
  - CGCI-BLGSP WGS Metadata
  - CGCI HTMCP-CC WGS Level 3 Data (Controlled)
- CGCI HTMCP-CC ChIP-Seq
  - CGCI HTMCP-CC ChIP-Seq Level 2 Data (Controlled)
  - CGCI HTMCP-CC ChIP-Seq Level 3 Data (Controlled)
  - CGCI HTMCP-CC ChIP-Seq Metadata
- CGCI HTMCP-CC Methylation Array
- CGCI HTMCP-CC Transcript References
- Sample Matrix
Additional Resources
- Office of Cancer Genomics
Instructions for Data Download
Open Access Data
- Download the appropriate manifest file from the publication page
- Use the manifest file to download data using the GDC Data Transfer Tool (DTT) or the GDC API
  - GDC DTT (Download, User’s Guide)
  - GDC API (User’s Guide)
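The GDC API route in the steps above can be sketched in Python. This is a minimal sketch, assuming the manifest uses the standard tab-separated GDC columns (`id`, `filename`, `md5`, `size`, `state`) and that each file is fetched from the GDC API `data` endpoint by its UUID; the helper names are hypothetical, and for bulk transfers the GDC DTT is usually the better choice.

```python
import csv
import hashlib
import io
import os
import shutil
import urllib.request
from typing import BinaryIO, Dict, List

# GDC API data endpoint; a file is fetched by appending its UUID (the manifest "id" column).
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data/"


def parse_manifest(text: str) -> List[Dict[str, str]]:
    """Parse a tab-separated GDC manifest (assumed header: id, filename, md5, size, state)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def data_url(file_id: str) -> str:
    """Build the GDC API download URL for one file UUID."""
    return GDC_DATA_ENDPOINT + file_id


def stream_md5(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large BAM/FASTQ files need not fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def download_open_access(rows: List[Dict[str, str]], dest_dir: str = ".") -> None:
    """Download each manifest entry and check it against the manifest's md5 column."""
    os.makedirs(dest_dir, exist_ok=True)
    for row in rows:
        path = os.path.join(dest_dir, row["filename"])
        with urllib.request.urlopen(data_url(row["id"])) as resp, open(path, "wb") as out:
            shutil.copyfileobj(resp, out)
        with open(path, "rb") as fh:
            if stream_md5(fh) != row["md5"]:
                raise ValueError(f"MD5 mismatch for {row['filename']}")
```

Typical use would be to read the downloaded manifest file into a string, pass it to `parse_manifest`, and hand the rows to `download_open_access` with a destination directory. The equivalent DTT command is `gdc-client download -m <manifest file>`.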
Controlled Access Data
- Download the appropriate manifest file from the publication page
- Download a token from the GDC Data Portal
  - GDC Data Portal (Launch, User’s Guide)
- Use the manifest file and token to download data using the GDC DTT or the GDC API
  - GDC DTT (Download, User’s Guide)
  - GDC API (User’s Guide)
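For controlled-access files, the API call differs mainly in authentication: the GDC API expects the token downloaded from the GDC Data Portal in an `X-Auth-Token` request header. A minimal Python sketch, assuming a plain-text token file and a file UUID taken from the manifest's `id` column; the function names are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path

# GDC API data endpoint; a file is fetched by appending its UUID.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data/"


def load_token(token_path: str) -> str:
    """Read a token file downloaded from the GDC Data Portal, dropping surrounding whitespace."""
    return Path(token_path).read_text().strip()


def authenticated_request(file_id: str, token: str) -> urllib.request.Request:
    """Build a GET request for one controlled-access file, carrying the token in X-Auth-Token."""
    return urllib.request.Request(GDC_DATA_ENDPOINT + file_id, headers={"X-Auth-Token": token})


def download_controlled(file_id: str, token: str, dest_path: str) -> None:
    """Stream one controlled-access file to disk using the authenticated request."""
    with urllib.request.urlopen(authenticated_request(file_id, token)) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

With the DTT, the token file is passed with `-t`, as in `gdc-client download -m <manifest file> -t <token file>`. Keep the token file private, since it grants access to data under your dbGaP authorization.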
For assistance, please contact the GDC Help Desk: support@nci-gdc.datacommons.io.