Data Access

GDC MAF Aggregation Tool

The GDC MAF tool aggregates aliquot-level MAFs, each of which originates from one tumor-normal pair. MAFs can be aggregated at the project level or from a user-provided set of files or cases. Note that the GDC MAF tool currently supports only Ensemble aliquot-level MAFs generated from whole exome sequencing. Ensemble aliquot-level MAFs include variants from all of the variant callers (MuTect2, MuSE, VarScan2, Pindel) and record which caller each variant originated from. The GDC MAF tool will only aggregate MAFs from within a single GDC project.
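To illustrate the aggregation rule described above, here is a minimal Python sketch. It is not the GDC MAF tool's own code. It assumes each MAF is tab-separated text with optional `#` comment lines, and that the caller information sits in a column named `callers`. The `(project_id, maf_text)` input shape and the function names are hypothetical.

```python
import csv
import io


def read_maf(text):
    """Parse one MAF (tab-separated, '#' comment lines skipped) into row dicts."""
    data_lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    return list(csv.DictReader(io.StringIO("\n".join(data_lines)), delimiter="\t"))


def aggregate_mafs(mafs_by_project):
    """Concatenate aliquot-level MAF rows, refusing to mix GDC projects.

    mafs_by_project: list of (project_id, maf_text) pairs (hypothetical shape).
    """
    projects = {project for project, _ in mafs_by_project}
    if len(projects) != 1:
        raise ValueError(f"expected exactly one project, got {sorted(projects)}")
    rows = []
    for _, text in mafs_by_project:
        rows.extend(read_maf(text))
    return rows
```

Calling `aggregate_mafs` with MAFs from two different projects raises a `ValueError`, mirroring the single-project restriction.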

GDCtools

GDCtools is a set of open-source, config-file-driven Python and UNIX CLI utilities for interacting with the NCI Genomic Data Commons and automating the data cleansing, aggregation, and reporting steps common to most data-driven science projects. It grew from efforts at the Broad Institute to adapt the GDAC Firehose pipeline, developed in TCGA, to use the GDC as its primary source of data, but aims to go well beyond that.

TCGAbiolinks

TCGAbiolinks was developed as an R/Bioconductor package to address challenges with data mining and analysis of cancer genomics data stored at the GDC. We offer bioinformatics solutions through a guided workflow that allows users to query, download, and perform integrative analyses of GDC data. We combined methods from computer science and statistics into the pipeline and incorporated methodologies developed in previous TCGA marker studies. We also provide a graphical user interface (GUI) version of TCGAbiolinks that can run on a user's local machine.

GenomicDataCommons R-Package

The National Cancer Institute (NCI) Genomic Data Commons provides the cancer research community with an open and unified repository for sharing and accessing data across numerous cancer studies and projects via a high-performance data transfer and query infrastructure. The Bioconductor project is an open source and open development software project built on the R statistical programming environment. A major goal of the Bioconductor project is to facilitate the use, analysis, and comprehension of genomic data.
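As a rough illustration of the query infrastructure mentioned above, the following Python sketch builds query parameters for the GDC REST API's `files` endpoint. It does not use the GenomicDataCommons R package and is not taken from it. The function name `build_file_query`, the project ID `TCGA-BRCA`, and the chosen fields are assumptions made for the example. No request is sent.

```python
import json

# Public GDC API endpoint for file metadata.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_file_query(project_id, data_format, size=10):
    """Build query parameters that select files by project and data format."""
    filters = {
        "op": "and",
        "content": [
            {"op": "in",
             "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "in",
             "content": {"field": "data_format", "value": [data_format]}},
        ],
    }
    return {
        "filters": json.dumps(filters),   # the API expects the filter as a JSON string
        "fields": "file_id,file_name",    # comma-separated list of fields to return
        "format": "JSON",
        "size": str(size),                # maximum number of records to return
    }


params = build_file_query("TCGA-BRCA", "MAF", size=5)
```

The resulting `params` dictionary could then be passed to an HTTP client of your choice as the query string for `GDC_FILES_ENDPOINT`.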

GDC RNASeq Tool

The GDC RNASeq Tool downloads individual RNASeq files from the GDC Data Portal and merges them into matrices identified by TCGA barcode.

The GDC RNASeq Tool:

  • Downloads RNA-Seq / miRNA-Seq data files using a GDC manifest file
  • Unzips the files into separate folders identified by experimental strategy and bioinformatics workflow
  • Merges the files into separate matrix files
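The grouping and merging steps above can be sketched in Python as follows. This is an illustrative sketch, not the GDC RNASeq Tool's actual code. The input shapes, the dictionary keys `strategy`, `workflow`, and `name`, and the choice to fill missing genes with 0 are all assumptions. The download step is omitted.

```python
from collections import defaultdict


def group_files(files):
    """Group file names by (experimental strategy, workflow).

    files: list of dicts with hypothetical keys 'strategy', 'workflow', 'name'.
    """
    groups = defaultdict(list)
    for f in files:
        groups[(f["strategy"], f["workflow"])].append(f["name"])
    return dict(groups)


def merge_counts(per_sample_counts):
    """Merge per-sample {gene: value} dicts into a gene-by-sample matrix.

    per_sample_counts: {barcode: {gene_id: value}} (hypothetical shape).
    Returns (header, rows); a gene missing from a sample is filled with 0,
    which is a design choice of this sketch.
    """
    barcodes = sorted(per_sample_counts)
    genes = sorted({g for counts in per_sample_counts.values() for g in counts})
    header = ["gene_id"] + barcodes
    rows = [[g] + [per_sample_counts[b].get(g, 0) for b in barcodes] for g in genes]
    return header, rows
```

Each group returned by `group_files` would be merged separately, giving one matrix per strategy and workflow combination with one column per barcode.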