
Access Data

Why are there fewer open access TCGA mutations in DR 32 (GENCODE Update Release)?


The lower number of open-access mutations results mainly from two quality-improving changes: 1) TCGA variants are now called with a 2-caller ensemble instead of a single caller; 2) variants outside the target capture region are removed, where previously the filter used a combined “target capture + GAF exonic region”. Additionally, TCGA was the first project for which the GDC produced open-access variants, and its pipeline included variant rescue steps that applied only to TCGA. To keep the variant-calling pipeline consistent across projects, the GDC no longer rescues MC3 and TCGA validation variants.
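
The effect of the two filters can be illustrated with a small sketch. This is not the GDC pipeline; it assumes that a "2-caller ensemble" keeps a variant only when at least two callers report it, and that the target capture region is given as a list of inclusive intervals. The function name, caller names, and coordinates are hypothetical.

```python
from collections import Counter


def ensemble_filter(calls_by_caller, target_regions, min_callers=2):
    """Keep variants seen by at least min_callers callers and inside a target region.

    calls_by_caller: dict mapping caller name -> list of (chrom, pos, ref, alt)
    target_regions:  list of (chrom, start, end), both ends inclusive
    """
    support = Counter()
    for variants in calls_by_caller.values():
        # A caller contributes at most one vote per variant.
        for variant in set(variants):
            support[variant] += 1

    def in_target(chrom, pos):
        return any(c == chrom and start <= pos <= end
                   for c, start, end in target_regions)

    return sorted(v for v, n in support.items()
                  if n >= min_callers and in_target(v[0], v[1]))


# Hypothetical calls from two callers.
calls = {
    "caller_a": [("chr1", 100, "A", "T"), ("chr1", 500, "G", "C"), ("chr2", 50, "C", "G")],
    "caller_b": [("chr1", 100, "A", "T"), ("chr2", 50, "C", "G")],
}
targets = [("chr1", 90, 200)]

kept = ensemble_filter(calls, targets)
print(kept)
```

In this toy input, one variant is dropped because only a single caller reports it, and another because it lies outside the target interval, which mirrors why both changes reduce the open-access variant count.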

What data types were updated in DR 32 (GENCODE Update Release)?

    RNA-Seq

  • Replaced all RNA-Seq data including: Alignments, Gene Expression (STAR) + New Normalization, Transcript Fusion
  • Removed HTSeq Files
  • Re-harmonized TCGA data to use the newer pipeline

    WXS/Targeted Sequencing

  • Generated and versioned new annotated somatic mutations and Ensemble MAFs
  • Re-harmonized TCGA data to use the newer pipeline (alignments + mutation calls)

    WGS

  • Generated and versioned structural variant and gene level copy number data

    Methylation

Where can I find clinical data elements specific to my cancer research of interest?


The GDC supports the submission of clinical and biospecimen supplements. Supplemental files can be downloaded from the GDC by selecting the Data Type "Clinical Supplement" or "Biospecimen Supplement" in the facet search of the GDC Data Portal Repository. For TCGA data, the supplement data are provided as XML documents and tab-delimited files (biotabs). These files, in varying degrees, provide information on marker status (e.g.
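
The same facet search can also be expressed as a query against the GDC API. The sketch below only builds the request URL and sends nothing. The endpoint and the field names `data_type` and `cases.project.project_id` reflect the public GDC API as best understood here and should be checked against the GDC API documentation; the helper name and the project `TCGA-BRCA` are illustrative assumptions.

```python
import json
import urllib.parse

# Assumed public GDC files endpoint; verify against the GDC API documentation.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_supplement_query(project_id, data_type="Clinical Supplement", size=10):
    """Return a URL that lists supplement files of one data type for one project."""
    filters = {
        "op": "and",
        "content": [
            {"op": "in",
             "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "in",
             "content": {"field": "data_type", "value": [data_type]}},
        ],
    }
    params = {
        "filters": json.dumps(filters),  # filters travel as a JSON string
        "fields": "file_id,file_name,data_format",
        "format": "JSON",
        "size": str(size),
    }
    return GDC_FILES_ENDPOINT + "?" + urllib.parse.urlencode(params)


# Hypothetical example project; swap in the project of interest.
url = build_supplement_query("TCGA-BRCA")
print(url)
```

Passing `data_type="Biospecimen Supplement"` builds the corresponding query for biospecimen supplements.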

Why is the data maintained in cBioPortal, Broad Firehose, or the Seven Bridges Cancer Genomics Cloud different from the GDC data?


The GDC harmonizes data across projects. This includes aligning the genomic data to a common reference genome (GRCh38, also known as hg38) and generating higher-level data using GDC bioinformatics pipelines. Other repositories may process the data differently, so their results can differ from the GDC's.
