The Genomic Data Commons (GDC) is a research program of the National Cancer Institute (NCI). The mission of the GDC is to provide the cancer research community with a unified repository and cancer knowledge base that enables data sharing across cancer genomic studies in support of precision medicine.
The National Cancer Institute, part of the National Institutes of Health (NIH), is the federal government's principal agency for cancer research and training. NCI’s mission is to lead, conduct, and support cancer research across the nation to advance scientific knowledge and help all people to live longer, healthier lives. NCI’s scope of work spans a broad spectrum of cancer research across a variety of disciplines and supports research training opportunities at career stages across the academic continuum.
About the GDC: Promoting Precision Medicine in Oncology
June 1, 2016
The National Cancer Institute’s (NCI’s) Genomic Data Commons (GDC) is a data sharing platform that promotes precision medicine in oncology. It is not just a database or a tool; it is an expandable knowledge network supporting the import and standardization of genomic and clinical data from cancer research programs.
The GDC contains NCI-generated data from some of the largest and most comprehensive cancer genomic datasets, including The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Therapies (TARGET). For the first time, these datasets have been processed using a common set of bioinformatics pipelines, so that the data can be directly compared.
As a growing knowledge system for cancer, the GDC also enables researchers to submit data. The GDC processes submitted data using bioinformatics pipelines that align the data to a common reference genome and generate higher-level data such as variant calls and expression quantifications. As more researchers add clinical and genomic data to the GDC, it will become an even more powerful tool for making discoveries about the molecular basis of cancer that may lead to better care for patients.
- Obtain an overview of the GDC and explore GDC 2.0
- Discover the evolution of the GDC in the Genomic Data Commons Podcast in the NCI Personal Genomics Podcast Series
The GDC Provides the Research Community with the Following Benefits
- Access to high-quality standardized biospecimen, clinical, and molecular data
- Web-based tools supporting fine-grained queries, advanced visualization, smart search technologies, and personalized download facilities
- Bioinformatics pipelines supporting DNA and RNA sequence alignment against a common reference genome
- Programmatic interfaces supporting data retrieval, download, and submission by third-party applications
- Resources supporting high-performance retrieval, download, and submission of GDC data
- Data submission tools for validating and submitting data to the GDC
- Data generation pipelines supporting the generation of high-level data, including DNA sequence variants, mutation analyses, SNP chip genotypes, and expression analyses
- Interfaces to eRA Commons and dbGaP for secure access to controlled data sets
How often does the GDC update the workflow/reference genome? If the GDC updates the workflow/reference genome, does the GDC re-process all data sets?
For the reference genome, the GDC has used an augmented version of GRCh38.p2 (with additional decoy and virus sequences) since its inception. Because the GDC does not use alternative contigs and derives high-level data only from the major chromosomes, the same reference genome is used with both the GENCODE v22 gene model (Data Releases 1 through 31) and the GENCODE v36 gene model (Data Release 32 onward). As future versions of the reference genome, such as GRCh39, are released, the GDC will evaluate the benefits of updating its data to the new version. If the reference genome is updated, the GDC would expect to re-process all data sets. For information on the reference genome used by the GDC, please refer to the GDC Reference Files.
For workflow updates, the GDC prefers to keep workflows stable and updates them only when necessary. Examples include a change to the reference genome or gene model, or a major algorithm update in a tool that could result in significant changes to the generated data. When a workflow update is needed, the GDC classifies it as major or minor depending on whether it significantly affects the output data. The GDC re-processes all existing data sets after a major workflow update. Examples include transitioning the RNA-Seq genomic BAM alignment workflow to a new version that generates three BAMs and STAR counts, and updating the MAF workflow to add functions to the MAF files. Minor updates mostly resolve bugs, security issues, and/or compatibility issues. For example, the GDC DNA-Seq alignment workflow has been updated several times to address quality issues in various submitted data. However, because the main alignment algorithm has remained nearly the same, the GDC does not need to re-process all data sets for these minor updates.