Capture kit information is provided by the GDC API at the read group level, where available. In some cases, additional information may be available in SRA XML files.
The relevant read_group properties returned by the GDC API are:
target_capture_kit_name
target_capture_kit_catalog_number
target_capture_kit_vendor
target_capture_kit_target_region
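As a minimal sketch of how these properties might be requested and read, the Python below builds a query URL for the GDC files endpoint and flattens the read group entries of a parsed JSON response. The field path prefix `analysis.metadata.read_groups`, the `file_id` filter field, and the helper names `build_query_url` and `extract_capture_kits` are assumptions made for illustration; check them against the current GDC API field listing before relying on them.

```python
import json
from urllib.parse import urlencode

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"

# The four read_group properties listed above.
KIT_PROPERTIES = (
    "target_capture_kit_name",
    "target_capture_kit_catalog_number",
    "target_capture_kit_vendor",
    "target_capture_kit_target_region",
)

# Assumption: read groups are exposed under this path on the files endpoint.
READ_GROUP_PREFIX = "analysis.metadata.read_groups"


def build_query_url(file_id):
    """Build a GET URL asking for capture kit fields of one file (hypothetical helper)."""
    filters = {"op": "in", "content": {"field": "file_id", "value": [file_id]}}
    fields = ["file_id", f"{READ_GROUP_PREFIX}.read_group_id"]
    fields += [f"{READ_GROUP_PREFIX}.{prop}" for prop in KIT_PROPERTIES]
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "format": "json",
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


def extract_capture_kits(response):
    """Flatten a parsed JSON response into one dict per read group.

    Properties missing from a read group are reported as None, since
    capture kit information is only provided where available.
    """
    rows = []
    for hit in response.get("data", {}).get("hits", []):
        metadata = hit.get("analysis", {}).get("metadata", {})
        for rg in metadata.get("read_groups", []):
            row = {
                "file_id": hit.get("file_id"),
                "read_group_id": rg.get("read_group_id"),
            }
            for prop in KIT_PROPERTIES:
                row[prop] = rg.get(prop)
            rows.append(row)
    return rows
```

The URL can be fetched with any HTTP client (for example `urllib.request.urlopen`), and the decoded JSON passed to `extract_capture_kits`. Keeping the request construction separate from the parsing makes the parsing step easy to check against a saved response without network access.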
The target_capture_kit_target_region field provides a URL for the capture kit target file, which is distributed by the kit manufacturer or by the research program. Bait/probe files can sometimes be found at the same URL; otherwise, a URL to the bait/probe file may be available in the SRA XML file.
Note: Some BAM files contain reads from multiple read groups, and a single BAM file can include read groups produced with different capture kits. Tools such as bamUtil are available for splitting a BAM file by read group.
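One way to spot such files before analysis is to group a file's read groups by capture kit. The sketch below assumes each read group is available as a dict carrying `read_group_id`, `target_capture_kit_name`, and `target_capture_kit_catalog_number`; the function names are hypothetical, and the sample records are invented purely for illustration.

```python
from collections import defaultdict


def group_read_groups_by_kit(read_groups):
    """Map (kit name, catalog number) to the read group IDs that used that kit."""
    groups = defaultdict(list)
    for rg in read_groups:
        key = (
            rg.get("target_capture_kit_name"),
            rg.get("target_capture_kit_catalog_number"),
        )
        groups[key].append(rg.get("read_group_id"))
    return dict(groups)


def has_mixed_capture_kits(read_groups):
    """Return True when the read groups of one file span more than one kit."""
    return len(group_read_groups_by_kit(read_groups)) > 1


# Invented example records, not real GDC data.
example = [
    {"read_group_id": "rg1", "target_capture_kit_name": "Kit A",
     "target_capture_kit_catalog_number": "001"},
    {"read_group_id": "rg2", "target_capture_kit_name": "Kit B",
     "target_capture_kit_catalog_number": "002"},
]
mixed = has_mixed_capture_kits(example)
```

A file flagged this way is a candidate for splitting by read group, so that each subset can be paired with its own target file. Note that read groups with no kit information all fall under the `(None, None)` key, which counts as a distinct group here.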
Note: Target and bait/probe files may be based on an older reference genome build, so their coordinates may need to be lifted over to the current build for certain applications.