Access Data

Can GENCODE v22 data still be downloaded from the GDC Data Portal?

Submitted by Anonymous on Wed, 03/23/2022 - 16:21

Although GENCODE v22 data cannot be browsed in the GDC Data Portal, it can still be downloaded using the GDC Data Transfer Tool or API. You will need to either have a previous manifest or use known UUIDs to download v22 files.
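As a rough illustration of the manifest/UUID route described above, the sketch below reads file UUIDs from a previously saved manifest and builds the corresponding API download URLs. It assumes the usual tab-delimited manifest layout with an `id` column and the public `https://api.gdc.cancer.gov/data/<UUID>` endpoint; the helper names and the all-zero UUID in the usage note are hypothetical placeholders, not real v22 files.

```python
import csv
import io
import shutil
import urllib.request

# Assumption: the public GDC API data endpoint.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def manifest_uuids(manifest_text):
    """Return the file UUIDs found in the 'id' column of a GDC manifest."""
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    return [row["id"] for row in reader if row.get("id")]


def download_url(file_uuid):
    """Build the API download URL for a single file UUID."""
    return f"{GDC_DATA_ENDPOINT}/{file_uuid}"


def download_file(file_uuid, dest_path, token=None):
    """Stream one file to disk (hypothetical helper, performs a network call).

    Controlled-access files need an authentication token, which is sent
    in the X-Auth-Token header.
    """
    request = urllib.request.Request(download_url(file_uuid))
    if token:
        request.add_header("X-Auth-Token", token)
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

With a saved manifest, `manifest_uuids(open("manifest.txt").read())` would give the list of UUIDs to pass to `download_file`. The Data Transfer Tool covers the same case from the command line with `gdc-client download -m manifest.txt`.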

Why are there fewer open access TCGA mutations in DR 32 (GENCODE Update Release)?

Submitted by Anonymous on Wed, 03/23/2022 - 15:13

There are fewer open-access mutations primarily because of two strategies that improve quality: 1) TCGA now uses a 2-caller ensemble instead of a single caller; 2) variants outside of the target capture region are removed, rather than only those outside a combined “target capture + GAF exonic region”. Additionally, TCGA was the original project for which GDC open-access variants were produced, and it used variant rescue steps that applied only to TCGA. To keep the TCGA variant-calling pipeline consistent across projects, GDC is no longer rescuing MC3 and TCGA validation variants.

What data types were updated in DR 32 (GENCODE Update Release)?

Submitted by Anonymous on Wed, 03/23/2022 - 15:09

Replaced all RNA-Seq data including: Alignments, Gene Expression (STAR) + New Normalization, Transcript Fusion
    Removed HTSeq Files
    Re-harmonized TCGA data to use the newer pipeline

Generated and versioned new annotated somatic mutations and Ensemble MAFs
    Re-harmonized TCGA data to use the newer pipeline (alignments + mutation calls)

Generated and versioned structural variant and gene level copy number data

GDC Gene Model Updated to GENCODE 36

The gene model used as a reference across the GDC has been updated from GENCODE 22 to GENCODE 36. GENCODE gene sets are continuously updated to improve coverage and accuracy.

Data from Landmark Chernobyl studies, New Methylation Pipeline, and More

Highlights from the GDC's recent Data Release 30 include data from major studies exploring the effects of Chernobyl radiation and methylation arrays processed via a new pipeline at the GDC.

Browse Data from Rare Cancers in the Data Portal

A recent data release offers new ways to access data from the Count Me In project.

Where can I find clinical data elements specific to my cancer research of interest?

Submitted by Anonymous on Wed, 02/10/2021 - 09:38

The GDC supports the submission of clinical and biospecimen supplements. Supplemental files can be downloaded from the GDC by selecting the Data Type "Clinical Supplement" or "Biospecimen Supplement" in the facet search of the GDC Data Portal Repository. For TCGA data, the supplement data are provided as XML documents and tab-delimited files (biotabs). These files, in varying degrees, provide information on marker status (e.g.
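The same Data Type search can also be expressed as a query against the GDC API. The sketch below only builds the filter and the request URL and does not contact the server. It assumes the public `https://api.gdc.cancer.gov/files` endpoint and the `data_type` file field; the function names, the selected `fields`, and the default page size are illustrative choices.

```python
import json
from urllib.parse import urlencode

# Assumption: the public GDC API files endpoint.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def supplement_filter(data_types):
    """Build a GDC API filter matching files with any of the given data types."""
    return {
        "op": "in",
        "content": {"field": "data_type", "value": list(data_types)},
    }


def supplement_query_url(
    data_types=("Clinical Supplement", "Biospecimen Supplement"), size=100
):
    """Return a files-endpoint URL listing supplement files (no request is sent)."""
    params = {
        "filters": json.dumps(supplement_filter(data_types)),
        "fields": "file_id,file_name,data_type,data_format",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"
```

Opening the URL returned by `supplement_query_url()` in a browser or HTTP client should list matching files, and the returned `file_id` values can then be downloaded like any other GDC file.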

What is the difference between tissue "collection" and tissue "procurement" in TCGA data?

Submitted by Anonymous on Wed, 02/10/2021 - 09:36

TCGA “collection” represents the collection of the sample for TCGA, whereas “procurement” represents the removal of tissue from the patient.

Why is the data maintained in cBioPortal, Broad Firehose, or the Seven Bridges Cancer Genomics Cloud different from the GDC data?

Submitted by Anonymous on Wed, 02/10/2021 - 09:25

The GDC harmonizes data across projects. This includes aligning the genomic data to a common reference genome (GRCh38, also known as hg38) and generating higher-level data using GDC bioinformatics pipelines. Other repositories may process the data differently.

How do I access data from TCGA marker or other landmark cancer genomics papers?

Submitted by Anonymous on Wed, 02/10/2021 - 09:24

The TCGA marker and other landmark cancer genomics papers, as well as associated supplemental files, are available on the GDC Publication Pages. The Publication Pages provide access to publication information and supplementary files.
