GDC Webinar: Navigating the GDC - A Case Study
The Navigating the GDC - A Case Study webinar is the first in a series of NCI GDC Webinars.
The file detail page and the metadata files accessible from that page (if available) can be used to determine the difference between files that share the same filename. For example, the files may be associated with different aliquots, or different patients.
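As an illustration of the comparison described above, the same metadata can in principle be requested programmatically. The following is a minimal sketch, not an official recipe: it only builds the request parameters and does not contact any server. The endpoint URL, the filter structure, and the field names (file_id, cases.submitter_id, and the aliquot path) are assumptions based on the public GDC API conventions and should be checked against the current API documentation. The helper name build_same_name_query is hypothetical.

```python
import json

# Assumed public endpoint for file metadata; verify against the GDC API docs.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_same_name_query(file_name, size=100):
    """Build request parameters that list every file sharing `file_name`.

    The returned fields (assumed names) expose the file UUID plus the case and
    aliquot identifiers, which is what distinguishes same-named files.
    """
    filters = {
        "op": "in",
        "content": {"field": "file_name", "value": [file_name]},
    }
    fields = [
        "file_id",
        "file_name",
        "cases.submitter_id",
        "cases.samples.portions.analytes.aliquots.submitter_id",
    ]
    return {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "format": "JSON",
        "size": str(size),
    }


# "example.bam" is a placeholder filename, not a real GDC file.
params = build_same_name_query("example.bam")
```

These parameters could then be sent as the query string of a GET request to the endpoint with any HTTP client. Each hit in the response would carry its own file UUID, so two files with identical names can be told apart by their associated case or aliquot.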
Raw sequencing files submitted to the GDC are processed using GDC Genomic Data Alignment pipelines. The processed data are made available in the GDC Data Portal as BAM files containing aligned reads and unmapped reads (if available). No reads are hard-clipped, but reads that were flagged as "failed" during an Illumina sequencing run are discarded.
Harmonized BAM files from RNA-seq and DNA-seq experiments will contain both mapped and unmapped reads, if available. Unmapped reads are not distributed separately.
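Because mapped and unmapped reads share one file, a user may want to separate them after download. The sketch below shows one way to do so, assuming the BAM records have already been decoded to SAM text (for example with a tool such as samtools view). It relies on the FLAG column defined by the SAM specification, where bit 0x4 marks an unmapped segment and bit 0x200 marks a read that did not pass quality filters. The function name summarize_flags and the example records are illustrative only.

```python
UNMAPPED = 0x4    # SAM FLAG bit: segment unmapped
QC_FAIL = 0x200   # SAM FLAG bit: read did not pass quality filters


def summarize_flags(sam_lines):
    """Count mapped, unmapped, and QC-failed reads in SAM-formatted text lines."""
    counts = {"mapped": 0, "unmapped": 0, "qc_fail": 0}
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & QC_FAIL:
            counts["qc_fail"] += 1
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        else:
            counts["mapped"] += 1
    return counts


# Made-up records for illustration; not taken from any GDC file.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
print(summarize_flags(example))  # {'mapped': 2, 'unmapped': 1, 'qc_fail': 0}
```

If reads flagged as failed were discarded upstream, as described above, the qc_fail count for a harmonized file would be expected to be zero.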
Capture kit information is provided by the GDC API at the read group level, where available. In some cases, additional information may be available in SRA XML files.
The relevant read_group properties returned by the GDC API are:
The GDC Data Portal is a web-based application that is limited by browser and network constraints. If a system timeout occurs when downloading files, please use the GDC Data Transfer Tool or contact the GDC Help Desk.
The GDC provides access to both open and controlled datasets. To access controlled datasets, users must obtain appropriate authorization through dbGaP. See Obtaining Access to Controlled Data for instructions on applying for access through dbGaP.
The following web browsers are supported for use with the GDC Data Portal, Submission Portal, Website, and Documentation site.
HIPAA guidelines require that patients older than 89 years be aggregated into a single age category. This limits the ability to positively identify these individuals. In practice, this affects the values reported in several fields. We have chosen to accurately display the age at diagnosis, but fields that give dates or time periods after this benchmark may be compressed. These fields may include "Days to last follow up", "Days to last known disease status", "Days to recurrence", "Days to death", and "Year of death".
The GDC processes data through several harmonization pipelines. If the process of harmonization reveals issues in the underlying data or if an error occurred during harmonization, the harmonized data files (e.g. BAMs or VCFs) will not appear in GDC data access tools.