
Submit Data

How many bytes are there in a megabyte or gigabyte?

There has been long-standing debate about prefixes for multiples of bytes. We use the standard supported by the International System of Units (SI), where 1 gigabyte (GB) = 10⁹ bytes and 1 megabyte (MB) = 10⁶ bytes. This convention is also supported by the IEEE, the EU, NIST, and the International System of Quantities. Where appropriate, we use the IEEE 1541 recommendations for binary prefixes, where 1024³ bytes = 1 gibibyte (GiB) and 1024² bytes = 1 mebibyte (MiB).
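
To make the difference between the two conventions concrete, here is a small Python sketch that expresses a byte count in SI units (powers of 10) and in IEEE 1541 binary units (powers of 1024). The dictionary and function names are illustrative only.

```python
# SI (decimal) prefixes: each step is a power of 10.
SI_UNITS = {"MB": 10**6, "GB": 10**9}
# IEEE 1541 (binary) prefixes: each step is a power of 1024.
BINARY_UNITS = {"MiB": 1024**2, "GiB": 1024**3}


def convert(n_bytes: int, unit: str) -> float:
    """Express a byte count in one of the SI or binary units above."""
    factors = {**SI_UNITS, **BINARY_UNITS}
    if unit not in factors:
        raise ValueError(f"unknown unit: {unit!r}")
    return n_bytes / factors[unit]


if __name__ == "__main__":
    size = 5_000_000_000  # an example file size in bytes
    print(f"{convert(size, 'GB'):.2f} GB")    # 5.00 GB
    print(f"{convert(size, 'GiB'):.2f} GiB")  # 4.66 GiB
```

The same 5,000,000,000-byte file therefore reads as 5.00 GB but only about 4.66 GiB, which is why a size can look smaller in tools that report binary units.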

What is the process for uploading, submitting, and releasing data in the GDC?

Uploaded and validated data is held in a workspace until the user formally submits it to the GDC. This allows users to interact with the data before submitting it. Once the data is submitted, the GDC processes applicable datasets (e.g., harmonizing molecular data and generating higher-level data). After processing is complete, the data is made publicly available according to the GDC Data Sharing Policies.
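
The lifecycle above can be pictured as a strictly ordered sequence of stages. The Python sketch below models it that way; the stage names and the `advance` helper are hypothetical labels for the steps described, not GDC API values or the GDC's actual implementation.

```python
from enum import Enum


class DataState(Enum):
    """Hypothetical labels for the stages described above (not GDC API values)."""
    WORKSPACE = 1  # uploaded and validated; the user can still interact with it
    SUBMITTED = 2  # formally submitted by the user
    PROCESSED = 3  # GDC processing done (e.g., harmonization, higher-level data)
    RELEASED = 4   # publicly available under the GDC Data Sharing Policies


# Each stage may only move forward to the next one in sequence.
NEXT = {
    DataState.WORKSPACE: DataState.SUBMITTED,
    DataState.SUBMITTED: DataState.PROCESSED,
    DataState.PROCESSED: DataState.RELEASED,
}


def advance(state: DataState) -> DataState:
    """Return the next stage, or raise if the data is already released."""
    if state not in NEXT:
        raise ValueError(f"{state.name} is a terminal stage")
    return NEXT[state]


if __name__ == "__main__":
    state = DataState.WORKSPACE
    while state is not DataState.RELEASED:
        state = advance(state)
        print(state.name)  # SUBMITTED, then PROCESSED, then RELEASED
```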

How is validation performed on genomic data (BAM files) submitted to the GDC?

Submitted BAM files are validated by the GDC for file integrity and format using md5sum checks, automated QC checks, and the Picard ValidateSamFile tool. Sequencing quality is assessed with FastQC, and additional quality metrics are gathered with tools such as Picard and Samtools. Severe issues, such as high cross-sample contamination, may prevent the data from being released, but minor issues typically do not result in rejection. Instead, the GDC exposes many of the quality metrics so that users can review them and apply further filtering.
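
As an illustration of the md5sum check mentioned above, the following Python sketch computes a file's MD5 digest in fixed-size chunks, so large BAM files need not fit in memory, and compares it with an expected value. It is not the GDC's validation code, and the helper names are hypothetical; it simply produces the same hex digest that the `md5sum` command prints, which could be used, for example, to confirm locally that a file matches the checksum you expect before uploading.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Stream the file so a large BAM file never has to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path: str, expected_md5: str) -> bool:
    """Check a file against an expected checksum (case-insensitive hex)."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Demo on a small throwaway file standing in for a BAM file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello\n")
    try:
        print(md5_of_file(tmp.name))  # b1946ac92492d2347c6235b4d2611184
        print(md5_matches(tmp.name, "b1946ac92492d2347c6235b4d2611184"))  # True
    finally:
        os.remove(tmp.name)
```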

Does the GDC Data Transfer Tool use random or sequential read/write? Does the choice of protocol make a difference?

The GDC Data Transfer Tool uses sequential read/write for each file segment being transferred. By default, the tool performs multipart transfers, so several segments are transferred in parallel, each with its own sequential read or write. To turn off multipart transfers, users can set the number of processes to 1.
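
The following Python sketch illustrates the idea of a multipart transfer on a local file: the file is split into contiguous byte ranges, each range is read sequentially from start to end, and the ranges are handled in parallel. It is an analogy under stated assumptions, not the Data Transfer Tool's code. It uses threads rather than processes and reads from disk rather than over the network, and `segment_ranges`, `read_segment`, `multipart_read`, and the `n_processes` parameter are hypothetical names. With `n_processes=1` there is a single segment, which corresponds to turning multipart transfers off.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def segment_ranges(file_size: int, n_parts: int) -> list[tuple[int, int]]:
    """Split [0, file_size) into up to n_parts contiguous (start, end) ranges."""
    if file_size == 0:
        return [(0, 0)]
    n_parts = max(1, min(n_parts, file_size))  # never create empty segments
    base, extra = divmod(file_size, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def read_segment(path: str, start: int, end: int, block: int = 64 * 1024) -> bytes:
    """Read one segment front to back, i.e. sequentially within the segment."""
    chunks = []
    with open(path, "rb") as fh:
        fh.seek(start)
        remaining = end - start
        while remaining > 0:
            chunk = fh.read(min(block, remaining))
            if not chunk:  # file shorter than expected
                break
            chunks.append(chunk)
            remaining -= len(chunk)
    return b"".join(chunks)


def multipart_read(path: str, n_processes: int) -> bytes:
    """Read all segments concurrently, then reassemble them in order.

    Assumption: threads stand in for the separate processes the real tool uses.
    """
    ranges = segment_ranges(os.path.getsize(path), n_processes)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        # pool.map yields results in input order, so segments reassemble correctly.
        return b"".join(pool.map(lambda r: read_segment(path, *r), ranges))


if __name__ == "__main__":
    data = os.urandom(1_000_003)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        for n in (1, 4):  # n=1 is a single sequential read, like multipart off
            assert multipart_read(tmp.name, n) == data
        print(segment_ranges(10, 3))  # [(0, 4), (4, 7), (7, 10)]
    finally:
        os.remove(tmp.name)
```

Splitting into nearly equal contiguous ranges keeps every individual read sequential, so the parallelism comes only from running several segments at once.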