AI in Cancer Genomics
Artificial Intelligence (AI), including its subfields Machine Learning and Deep Learning, is transforming cancer genomics. These technologies help researchers identify mutation patterns, classify disease and predict its progression, develop new treatments, and deepen our understanding of cancer biology. By processing vast genomic datasets efficiently, AI is accelerating discoveries and changing how we approach cancer research and treatment.
For information on the use of AI across the NCI, please refer to the AI and Cancer and AI in Cancer Research sites.
AI in the GDC
Large foundational datasets are critical for training and developing the next generation of AI applications. The GDC gives the cancer research community access to high-quality, harmonized genomic, clinical, and biospecimen data, as well as whole slide images, for use in developing AI models and algorithms. The table below lists example AI applications that show how the research community uses GDC data.
| Application | Objectives |
|---|---|
| Biomarker Discovery | |
| Cancer Diagnosis & Risk Prediction | |
| Content Generation | |
| Cancer Type Classification | |
| Drug Development & Target Discovery | |
| Feature Detection | |
| Image Segmentation & Quality Control | |
| Model Development | |
| Personalized Medicine | |
| Predictive Analysis | |
| Survival Analysis | |
| Genomic Data Analysis | |
GDC Resources Supporting AI
The GDC offers a variety of resources to support the use of GDC data in AI applications:
- GDC Data Portal: Explore, analyze, and download data from the GDC for specific cancer cohorts
- GDC Application Programming Interface (API): Programmatically query, download, and analyze GDC data
- GDC Data Dictionary: Access detailed information on genomic, clinical, and biospecimen properties within GDC data
- GDC Data Transfer Tool (DTT): Efficiently download large data sets with this high-performance command-line tool
- Harmonized Data: Download genomic data standardized through GDC workflows, including DNA and RNA sequence data aligned to a common reference genome build and derived data such as mutation calls and structural variants. Access clinical and biospecimen data, along with genomic metadata, harmonized using the common data elements defined in the GDC Data Dictionary.
- High-Quality Data: Access data that has undergone rigorous quality control and validation checks, including genomic quality metrics and clinically validated datasets
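As a rough illustration of how the GDC API and the Data Transfer Tool can fit together in an AI data pipeline, the sketch below builds a query against the GDC `/files` search endpoint using the API's nested JSON filter syntax (`op`/`content` clauses combined with `and`), then either retrieves matching file records as JSON or requests a download manifest. The helper names (`in_filter`, `build_files_query`, `fetch_file_hits`, `fetch_manifest`), the example project ID, and the chosen field names and values are illustrative assumptions, not part of any official GDC client. Confirm field names against the GDC Data Dictionary and the endpoint's `_mapping` listing before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Public GDC search endpoint for files (assumed current; check the GDC API docs).
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def in_filter(field, values):
    """Build one GDC 'in' clause: match records whose `field` is any of `values`."""
    return {"op": "in", "content": {"field": field, "value": list(values)}}


def build_files_query(project_ids, data_type, fields, size=100, return_type=None):
    """Return query-string parameters for an open-access /files search.

    Hypothetical helper: the field names used here are assumptions to verify
    against the GDC Data Dictionary.
    """
    filters = {
        "op": "and",
        "content": [
            in_filter("cases.project.project_id", project_ids),
            in_filter("data_type", [data_type]),
            in_filter("access", ["open"]),
        ],
    }
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "size": str(size),
    }
    if return_type is None:
        params["format"] = "JSON"  # structured hits for programmatic use
    else:
        params["return_type"] = return_type  # e.g. "manifest" for the DTT
    return params


def build_url(params):
    """URL-encode the parameters onto the /files endpoint."""
    return GDC_FILES_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_file_hits(params, timeout=60):
    """Query the live GDC API (network access required) and return the file hits."""
    with urllib.request.urlopen(build_url(params), timeout=timeout) as response:
        payload = json.load(response)
    return payload["data"]["hits"]


def fetch_manifest(params, path, timeout=60):
    """Download a manifest (network access required) and save it to `path`."""
    with urllib.request.urlopen(build_url(params), timeout=timeout) as response:
        text = response.read().decode("utf-8")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path
```

For example, `fetch_file_hits(build_files_query(["TCGA-BRCA"], "Gene Expression Quantification", ["file_id", "file_name"]))` would list open-access expression files for one project. Passing `return_type="manifest"` to `build_files_query` and saving the result with `fetch_manifest` produces a file that could then be handed to the Data Transfer Tool, for example `gdc-client download -m manifest.txt`, for bulk download. Keeping query construction separate from the network calls makes the filters easy to inspect and reuse before any data is transferred.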
The GDC is committed to supporting AI in cancer applications by providing well-documented, accessible, usable, and high-quality data. We encourage community feedback through GDC Support to help us better understand which cancer genomic resources would best facilitate AI development.