The GDC categorizes whole genome sequencing (WGS) coverage using the wgs_coverage property, which groups BAM files into four coverage ranges: 0x-10x, 10x-25x, 25x-150x, and 150x+. "High coverage" WGS in the GDC generally refers to alignments with a mean depth of 25x or greater, meaning that each base in the genome is covered by at least 25 sequencing reads on average. This threshold reflects the minimum depth at which variant calling pipelines (SNV, indel, and structural variant) achieve reliable sensitivity and specificity for somatic mutation detection in cancer genomics. Researchers who need higher confidence, for example when detecting low-allele-fraction somatic variants in tumor samples, may prefer higher coverage.
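As a rough illustration of how the four ranges relate to the 25x threshold, the sketch below maps a mean depth to a range label. The function names are hypothetical, and the handling of boundary values is an assumption: the text above places 25x in the high coverage group, but whether a depth of exactly 10x or 150x falls in the lower or upper range is not stated, so the lower-bound-inclusive choice here is only a guess.

```python
def wgs_coverage_bucket(mean_depth: float) -> str:
    """Return the wgs_coverage range label for a mean depth.

    Assumption: each range includes its lower bound and excludes its
    upper bound. Only the 25x boundary is implied by the text.
    """
    if mean_depth < 0:
        raise ValueError("mean depth cannot be negative")
    if mean_depth < 10:
        return "0x-10x"
    if mean_depth < 25:
        return "10x-25x"
    if mean_depth < 150:
        return "25x-150x"
    return "150x+"


def is_high_coverage(mean_depth: float) -> bool:
    """High coverage means a mean depth of 25x or greater."""
    return mean_depth >= 25


if __name__ == "__main__":
    for depth in (8.0, 24.9, 25.0, 60.0, 150.0):
        print(depth, wgs_coverage_bucket(depth), is_high_coverage(depth))
```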
To find high coverage WGS data, go to the Repository page of the GDC Data Portal. In the WGS Coverage filter in the left facet panel, select the checkboxes for 25x-150x and 150x+. The Repository then displays the high coverage WGS files associated with your cohort.
If your project requires a more precise measure of alignment coverage, we recommend using the mean coverage field. To add it, choose the "Add a Custom Filter" button in the Repository and search for "mean_coverage". This field lets you specify exact coverage ranges. Note that mean coverage is not specific to whole genome sequencing files.
High coverage WGS data can also be retrieved through the files endpoint of the GDC Application Programming Interface (API) by filtering on wgs_coverage, which is a property of GDC aligned reads for WGS data. Mean coverage is likewise available through the files endpoint by filtering on mean_coverage.
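The following is a minimal sketch of how such a request might be assembled, not an official client. It builds the JSON filters and the request URL without sending anything. The helper names are hypothetical, and the field names wgs_coverage and mean_coverage are taken from the text above, assuming they are used unprefixed in the files endpoint. The experimental_strategy condition is an addition here, included because mean coverage is not specific to WGS files.

```python
import json
from urllib.parse import urlencode

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def high_coverage_filter(buckets=("25x-150x", "150x+")):
    """Filter on the wgs_coverage ranges that count as high coverage."""
    return {
        "op": "in",
        "content": {"field": "wgs_coverage", "value": list(buckets)},
    }


def mean_coverage_filter(minimum):
    """Filter WGS files whose mean_coverage is at least `minimum`."""
    return {
        "op": "and",
        "content": [
            # Added restriction: mean_coverage also exists on non-WGS files.
            {"op": "=", "content": {"field": "experimental_strategy", "value": "WGS"}},
            {"op": ">=", "content": {"field": "mean_coverage", "value": minimum}},
        ],
    }


def build_files_url(filters, fields=("file_id", "file_name", "wgs_coverage", "mean_coverage"), size=10):
    """Encode the filters and requested fields as a GET URL (no request is sent)."""
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "format": "JSON",
        "size": str(size),
    }
    return GDC_FILES_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_files_url(high_coverage_filter()))
    print(build_files_url(mean_coverage_filter(30)))
```

The printed URLs can be passed to any HTTP client. Controlled-access files would still require authorization to download, which this sketch does not cover.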
Filtering WGS data by coverage range or mean coverage is especially useful for The Cancer Genome Atlas (TCGA) program, which maintains more than 20,000 high coverage WGS files.