GDC Community Tools

Explore the External Tools
GDC MAF Aggregation Tool

Authors: GDC Development Team

The GDC MAF tool aggregates aliquot-level MAFs, each of which originates from one tumor-normal pair. MAFs can be aggregated at the project level or by providing a set of files or cases. Note that the GDC MAF tool currently supports only Ensemble aliquot-level MAFs generated from whole exome sequencing. Ensemble aliquot-level MAFs include variants from all five variant callers (including MuTect2, MuSE, VarScan2, and Pindel) and record which caller each variant originated from. The GDC MAF tool will only aggregate MAFs from within one GDC project.
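
The aggregation idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the GDC MAF tool's actual implementation: it assumes MAFs are tab-delimited text with optional `#` comment lines and a shared header row, and the function names (`read_maf`, `aggregate_mafs`), the reduced column set, and the sample rows are all made up for the example.

```python
import csv

def read_maf(text):
    """Parse MAF text into (header, rows), skipping blank and '#' comment lines."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    rows = list(reader)
    return reader.fieldnames, rows

def aggregate_mafs(maf_texts):
    """Concatenate rows from several aliquot-level MAFs that share one header."""
    header, merged = None, []
    for text in maf_texts:
        fields, rows = read_maf(text)
        if header is None:
            header = fields
        elif fields != header:
            # Refuse to merge files whose columns do not line up.
            raise ValueError("MAF headers differ; refusing to merge")
        merged.extend(rows)
    return header, merged

# Hypothetical example data: two aliquot-level MAFs with a reduced column set.
COLUMNS = "Hugo_Symbol\tChromosome\tStart_Position\tTumor_Sample_Barcode\tcallers\n"
maf_a = (
    "#example comment line\n" + COLUMNS
    + "GENE1\tchr1\t100\tALIQUOT-A\tmuse;mutect2\n"
    + "GENE2\tchr2\t200\tALIQUOT-A\tpindel\n"
)
maf_b = (
    "#example comment line\n" + COLUMNS
    + "GENE1\tchr1\t100\tALIQUOT-B\tvarscan2\n"
)

header, merged = aggregate_mafs([maf_a, maf_b])
for row in merged:
    print(row["Tumor_Sample_Barcode"], row["Hugo_Symbol"], row["callers"])
```

Because each row keeps its own sample barcode and caller information, the merged table still records which aliquot and which caller each variant came from.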

gdc-readgroups

Authors: Jeremiah Savage - GDC Bioinformatics Team

The gdc-readgroups tool is a good starting point for any group that plans to submit molecular data in BAM format to the Genomic Data Commons. The tool automatically retrieves required and useful read group metadata from a BAM file header and writes it in a format that can be submitted to the GDC.
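
To make the idea of read group metadata concrete, here is a minimal Python sketch that pulls `@RG` records out of SAM/BAM header text. It is not the gdc-readgroups implementation and does not produce the GDC submission format. It assumes the header has already been extracted as text (for example with `samtools view -H`), and the function name `parse_read_groups` and the sample header are hypothetical.

```python
def parse_read_groups(sam_header):
    """Return one dict of tag -> value for each @RG line in SAM header text."""
    read_groups = []
    for line in sam_header.splitlines():
        if not line.startswith("@RG"):
            continue
        fields = line.split("\t")[1:]  # drop the leading "@RG" record type
        # Each field is TAG:VALUE; split only once because values may contain ':'.
        read_groups.append(dict(f.split(":", 1) for f in fields if ":" in f))
    return read_groups

# Hypothetical header text with two read groups.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:rg1\tSM:sampleA\tPL:ILLUMINA\tLB:lib1\n"
    "@RG\tID:rg2\tSM:sampleA\tPL:ILLUMINA\tLB:lib1\tDT:2020-01-01T10:00:00\n"
)

read_groups = parse_read_groups(example_header)
for rg in read_groups:
    print(rg["ID"], rg.get("SM"), rg.get("PL"))
```

The tags shown (`ID`, `SM`, `PL`, `LB`, `DT`) are standard SAM read group tags; which of them the GDC requires is not stated here and should be checked against the GDC submission documentation.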

Improved DNA Methylation Array Probe Annotation

Authors: Wanding Zhou, Ben Berman, Peter W. Laird, Hui Shen

This collection of annotations includes hg38-based genomic coordinates for the Illumina Infinium HumanMethylation27, HumanMethylation450, and MethylationEPIC arrays, as well as masks for low-quality probes (MASK_general column). It also contains detailed information on overlapping genes (including distance to the TSS), promoters, and CpG islands. Functional annotations are also available, including overlap with ENCODE and ROADMAP ChromHMM chromatin states and Transcription Factor Binding Sites (TFBS).

SeSAMe (SEnsible Step-wise Analysis of Methylation data)

Authors: Wanding Zhou, Timothy Triche Jr., Peter W. Laird, Hui Shen

Tool for analyzing Infinium DNA methylation array data.

The SeSAMe tool provides:

  • Reduction of artifactual detection in Infinium DNA methylation microarrays
  • Low-level processing of Illumina Infinium DNA methylation arrays
  • Quality control of DNA methylation arrays
  • Biological inference (sex, age, karyotypes, copy number, etc.)

BISCUIT (BISulfite-seq CUI Toolkit)

Authors: Wanding Zhou, Jacob Morrison, Timothy Triche Jr., Peter W. Laird, Hui Shen

Tool suite for analyzing high throughput bisulfite sequencing data.

BISCUIT performs:

  • Alignment and quality control of bisulfite sequencing reads
  • Extraction of DNA methylation levels
  • Extraction of genetic information from bisulfite-sequencing data
  • Analysis of allele-specific methylation and methylation haplotypes

GDC RNASeq Tool

Authors: Colin Reid, GDC User Services

The GDC RNASeq Tool downloads individual RNA-Seq files from the GDC Data Portal and merges them into matrices identified by TCGA barcode.

The GDC RNASeq Tool:

  • Downloads RNA-Seq / miRNA-Seq data files using a GDC manifest file
  • Unzips the files into separate folders identified by experimental strategy and bioinformatics workflow
  • Merges the files into separate matrix files
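
The merge step above can be illustrated with a short Python sketch. It is not the GDC RNASeq Tool's code: it assumes each downloaded file has already been parsed into a mapping of gene ID to value, and the function name `merge_counts`, the sample identifiers, and the gene IDs are placeholders.

```python
def merge_counts(per_sample):
    """Merge {sample_id: {gene_id: value}} into (gene_ids, sample_ids, matrix).

    The matrix has one row per gene and one column per sample. A gene that is
    absent from a sample's file is recorded as None rather than guessed.
    """
    sample_ids = sorted(per_sample)
    gene_ids = sorted({gene for counts in per_sample.values() for gene in counts})
    matrix = [[per_sample[s].get(g) for s in sample_ids] for g in gene_ids]
    return gene_ids, sample_ids, matrix

# Hypothetical parsed inputs: one dict of gene -> count per sample.
per_sample = {
    "SAMPLE-B": {"geneA": 5, "geneB": 7},
    "SAMPLE-A": {"geneA": 3, "geneC": 1},
}

gene_ids, sample_ids, matrix = merge_counts(per_sample)

# Print the merged matrix as tab-delimited text with a header row.
print("\t".join(["gene_id"] + sample_ids))
for gene, row in zip(gene_ids, matrix):
    print("\t".join([gene] + ["NA" if v is None else str(v) for v in row]))
```

In the tool's own terms, the column labels would be TCGA barcodes and there would be one such matrix per experimental strategy and workflow.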

GenomicDataCommons R-Package

The National Cancer Institute (NCI) Genomic Data Commons provides the cancer research community with an open and unified repository for sharing and accessing data across numerous cancer studies and projects via a high-performance data transfer and query infrastructure. The Bioconductor project is an open source and open development software project built on the R statistical programming environment. A major goal of the Bioconductor project is to facilitate the use, analysis, and comprehension of genomic data. The GenomicDataCommons Bioconductor package provides basic infrastructure for querying, accessing, and mining genomic datasets available from the GDC. We expect that the Bioconductor developer and bioinformatics communities will build on the GenomicDataCommons package to add higher-level functionality and expose cancer genomics data to the many state-of-the-art bioinformatics methods available in Bioconductor.

TCGAbiolinks

Authors: Tiago Chedraoui Silva, Antonio Colaprico, Catharina Olsen, Michele Ceccarelli, Gianluca Bontempi, Houtan Noushmehr

TCGAbiolinks was developed as an R/Bioconductor package to address challenges with data mining and analysis of cancer genomics data stored at the GDC. We offer bioinformatics solutions through a guided workflow that allows users to query, download, and perform integrative analyses of GDC data. We combined methods from computer science and statistics into the pipeline and incorporated methodologies developed in previous TCGA marker studies. We also provide a graphical user interface (GUI) version of TCGAbiolinks that can run on a user's local machine. TCGAbiolinksGUI contains all the features of the R version while giving users an easier way to navigate the analysis steps. We provide online documentation, tutorials, and video guides to assist users with the analysis.

GDC TSV Downloader

Authors: Bill Wysocki, GDC User Services

The GDC TSV downloader takes a manifest from the GDC Data Portal and downloads clinical and biospecimen metadata for the listed files in a tab-delimited format.
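
As a rough illustration of the manifest-to-metadata idea, the Python sketch below reads file UUIDs from manifest text and builds a request body for the GDC API `files` endpoint. It is not the GDC TSV Downloader's code and sends no request. It assumes the manifest is tab-delimited with an `id` column, and the function names, the UUIDs, and the chosen metadata fields are illustrative; check field names and payload shape against the GDC API documentation before use.

```python
import csv
import json

def manifest_file_ids(manifest_text):
    """Return the file UUIDs from the 'id' column of a tab-delimited manifest."""
    reader = csv.DictReader(manifest_text.splitlines(), delimiter="\t")
    return [row["id"] for row in reader]

def build_files_query(file_ids, fields):
    """Build a JSON-serializable body asking for metadata on the given files as TSV."""
    return {
        "filters": {
            "op": "in",
            "content": {"field": "file_id", "value": file_ids},
        },
        "fields": ",".join(fields),
        "format": "TSV",
        "size": len(file_ids),
    }

# Hypothetical manifest with made-up UUIDs, file names, and checksums.
example_manifest = (
    "id\tfilename\tmd5\tsize\tstate\n"
    "00000000-0000-0000-0000-000000000001\ta.tsv\tabc123\t10\treleased\n"
    "00000000-0000-0000-0000-000000000002\tb.tsv\tdef456\t20\treleased\n"
)

file_ids = manifest_file_ids(example_manifest)
# The field names below are assumptions chosen for illustration.
payload = build_files_query(file_ids, ["file_id", "cases.submitter_id"])
print(json.dumps(payload, indent=2))
```

Under these assumptions, a client would POST this body to the `files` endpoint and save the tab-delimited response.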

GDCtools

Authors: Broad Institute Genome Data Analysis Center

GDCtools is a set of open-source, config-file-driven Python and UNIX CLI utilities for interacting with the NCI Genomic Data Commons and automating the data cleansing, aggregation, and reporting steps that are common to most data-driven science projects. It grew from efforts at the Broad Institute to adapt the GDAC Firehose pipeline, developed in TCGA, to use the GDC as its primary source of data, but it aims to go well beyond that. By wrapping the GDC API in a set of rigorously defined and domain-aware tools, GDCtools lets users interact with the GDC in terms familiar to them as biomedical researchers and informaticians, rather than as web or database programmers. This can make it simpler to search and retrieve harmonized data and metadata from the GDC, and can reduce the learning curve and staffing needs, while providing indispensable features such as:

  • Turnkey creation of date-stamped snapshots of data
  • Aggregating multiple samples into a single bolus for ready consumption by scientific algorithms
  • Ensuring that samples are identifiable by project (e.g. restoring TCGA ids to SNP6 segments)
  • Sample report and sample freeze list (load file) creation, for either on-premise or cloud storage (e.g. Google)
  • Aggregate cohort construction (e.g. combining TCGA STAD + ESCA cohorts into STES, with just 1 line in a config file)
  • Retrieving an entire project or just 1 case, with equal ease
  • Easily combining data across multiple projects (e.g. TCGA and CPTAC)

This is all available within a well-tested object-oriented framework that is easy for users to comprehend and extend. GDCtools is online at https://github.com/broadinstitute/gdctools, and includes documentation, examples and a pictorial overview.