Access Data

Where can I find the target and bait/probe files (BED files) that describe the capture kit used in an exome sequencing experiment?

Capture kit information is provided by the GDC API at the read group level, where available. In some cases, additional information may be available in SRA XML files.

The relevant read_group properties returned by the GDC API are:

How do I avoid timeouts and transfer interruptions when downloading large datasets from the GDC Data Portal?

The GDC Data Portal is a web-based application that is limited by browser and network constraints. If a system timeout occurs when downloading files, please use the GDC Data Transfer Tool or contact the GDC Help Desk.

I only see patients with ages of 90 years or less in the GDC. Why is this?

HIPAA guidelines require that patients with ages greater than 89 years be aggregated into a single age category. This limits the ability to positively identify these individuals. In practice, this affects the values reported in several fields. We have chosen to display the age at diagnosis accurately, but fields that give dates or time periods after this benchmark may be compressed. These may include "Days to last follow up", "Days to last known disease status", "Days to recurrence", "Days to death", and "Year of death".

How many bytes are there in a megabyte or gigabyte?

There has been long-standing debate about prefixes for multiples of bytes. We have chosen to use the standard supported by the International System of Units (SI), where 1 gigabyte (GB) = 10^9 bytes and 1 megabyte (MB) = 10^6 bytes. This convention is also supported by the IEEE, EU, NIST, and the International System of Quantities. Where appropriate, we use the IEEE 1541 recommendations for binary representation, where 1024^3 bytes = 1 gibibyte (GiB) and 1024^2 bytes = 1 mebibyte (MiB).
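
The difference between the two conventions can be illustrated with a short Python sketch. This is only an illustration of the arithmetic described above, not code used by the GDC; the function name `format_size` and the two-decimal output format are assumptions made for this example.

```python
# Decimal (SI) multiples, as described above: 1 MB = 10^6 bytes, 1 GB = 10^9 bytes.
SI_UNITS = [(10**9, "GB"), (10**6, "MB")]

# Binary (IEEE 1541) multiples: 1 MiB = 1024^2 bytes, 1 GiB = 1024^3 bytes.
BINARY_UNITS = [(1024**3, "GiB"), (1024**2, "MiB")]


def format_size(n_bytes: int, binary: bool = False) -> str:
    """Render a byte count with SI units, or binary units if binary=True.

    Hypothetical helper for illustration; the name and output format are
    not taken from any GDC tool.
    """
    units = BINARY_UNITS if binary else SI_UNITS
    # Units are listed largest first, so the first match is the best fit.
    for factor, label in units:
        if n_bytes >= factor:
            return f"{n_bytes / factor:.2f} {label}"
    # Anything below one megabyte (or mebibyte) is reported in plain bytes.
    return f"{n_bytes} B"


size = 1_500_000_000
si_text = format_size(size)                   # "1.50 GB"
binary_text = format_size(size, binary=True)  # "1.40 GiB"
```

The same file therefore appears about 7% smaller when its size is expressed in GiB rather than GB, which is worth keeping in mind when comparing sizes reported by different tools.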