Legacy Archive TCGA Tag Descriptions

TCGA Tag Descriptions

Tags are used in the GDC Legacy Archive to mark subsets of TCGA files that cannot otherwise be differentiated in the GDC Data Model. Tagged file subsets can correspond to experiments, analyses, institutions, or technologies. The tag filter can be accessed in the GDC Data Portal by choosing the “Add a File Filter” option in the “Files” tab of the left panel and typing “tags” into the field. Each tag is described below.
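The same tag filter can in principle be expressed programmatically. The sketch below builds a filter in the JSON filter syntax used by the GDC API and requires every listed tag. The endpoint URL and the field names `tags`, `cases.project.project_id`, and `file_name` are assumptions to verify against the GDC API documentation. The code only builds the request URL and sends nothing.

```python
import json
from urllib.parse import urlencode

# Assumed Legacy Archive files endpoint; verify before use.
LEGACY_FILES_ENDPOINT = "https://api.gdc.cancer.gov/legacy/files"


def build_tag_filter(tags, project_id=None):
    """Return a GDC-style filter matching files that carry every tag in `tags`.

    The field names "tags" and "cases.project.project_id" are assumptions.
    """
    # One "in" clause per tag, joined with "and", so a file must carry all tags.
    clauses = [
        {"op": "in", "content": {"field": "tags", "value": [tag]}} for tag in tags
    ]
    if project_id is not None:
        clauses.append(
            {"op": "in",
             "content": {"field": "cases.project.project_id", "value": [project_id]}}
        )
    return {"op": "and", "content": clauses}


def build_query_url(tags, project_id=None, size=10):
    """Encode the filter as a query string; no request is sent."""
    params = {
        "filters": json.dumps(build_tag_filter(tags, project_id)),
        "fields": "file_name,tags",  # assumed field names
        "size": size,
    }
    return LEGACY_FILES_ENDPOINT + "?" + urlencode(params)


# Example: RNA-SeqV2 gene-level files for TCGA-OV (tags from the tables below).
print(build_query_url(["gene", "v2"], project_id="TCGA-OV"))
```

Joining one clause per tag with "and" mirrors combining tags such as gene and v2; a single "in" clause listing several tags would instead match files carrying any one of them.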

Clinical Data

Note: The field names for all Biotab files can be found at the bottom of this page.


Tag TCGA Data Level Tag description
aliquot 2 Biotab files with tab-delimited information associated with aliquots.
analyte 2 Biotab files with tab-delimited information associated with analytes.
auxiliary 1 + 2 Files containing auxiliary information about cases. This includes BCR XMLs/TSVs with HPV information and HTML files with project-level descriptions of microsatellite instability protocols.
control 1 + 2 Biotab files with TCGA barcode details and aliquot information.
cqcf 2 Case quality control form information. Only available for TCGA-LUAD.
diagnostic_slides 2 Biotab files that associate BCR patient and sample information with FFPE UUIDs.
drug 2 Biotab files with tab-delimited information associated with drug treatments administered to cases.
follow_up 2 Two types of Biotab files with tab-delimited information associated with follow-up procedures and follow-up new tumor events.
image 1 Image files in SVS format. Synonymous with “File Format: SVS”.
nte 2 Biotab files with tab-delimited information associated with new tumor events.
omf 1 + 2 Files with information associated with other malignancy forms. Available as case-level XML files or project-level Biotab files.
patient 2 Biotab files with tab-delimited patient information.
portion 2 Two types of Biotab files with tab-delimited information associated with portions or portion shipping.
protocol 2 Biotab files with tab-delimited information associated with the experimental protocol.
radiation 2 Biotab files with tab-delimited information associated with radiation treatments administered to cases.
sample 2 Biotab files with tab-delimited information associated with samples.
slide 2 Biotab files with tab-delimited information associated with slide images.
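Biotab files are plain tab-delimited text, so a standard CSV reader configured for tabs is enough to load them. The sketch below assumes the first line holds the column names. Some Biotab files may carry additional descriptive header rows after it; the hypothetical `extra_header_rows` parameter skips them, so inspect a file before choosing that value.

```python
import csv
import io


def read_biotab(handle, extra_header_rows=0):
    """Parse a tab-delimited Biotab file into a list of dicts keyed by column name.

    The first row is taken as the column names. `extra_header_rows` is a
    hypothetical parameter for skipping descriptive rows that follow it.
    """
    # QUOTE_NONE treats quote characters as ordinary text in tab-delimited data.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    for _ in range(extra_header_rows):
        next(reader, None)  # discard descriptive header rows
    records = []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        records.append(dict(zip(header, fields)))
    return records


# Made-up example content; real column names are listed under "Clinical Tag Fields".
sample = (
    "bcr_patient_barcode\tgender\n"
    "descriptor\tdescriptor\n"
    "TCGA-XX-0001\tFEMALE\n"
)
print(read_biotab(io.StringIO(sample), extra_header_rows=1))
```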

Copy Number Variation


Tag TCGA Data Level Tag description
allcnv 3 Segmentation files containing both germline and somatic CNVs (opposite tag: nocnv). From Broad Institute CNV analyses.
alleleSpecificCN 2 Allele-specific copy number array signal intensity data for TCGA-LAML. From Washington University in St. Louis.
B_Allele_Freq 2 The proportion of total signal intensity that is produced by the B allele in a SNP array. Used for TCGA-OV, GBM, LUSC projects (TCGA Analysis of Allele Specific Copy Number).
BioSizing 1 Tab-delimited bioanalysis data for calculating fragment size and concentration.
byallele 2 Copy number array intensity values calculated in an allele-specific manner. From Broad Institute CNV analyses. See alleleSpecificCN for TCGA-LAML. TCGA level 2 data.
cgh 1 Comparative genomic hybridization QC data. Available as JPGs (actual photographs of microarray intensities) and PDFs (microarray reports). Subset of tag: qc.
cnv 2 Copy number variation files measured using CGH arrays and Illumina WGS.
Delta_B_Allele_Freq 2 The absolute value of the difference between the SNP array B allele frequency in a tumor and its paired normal sample. Used for TCGA-OV, GBM, LUSC projects (TCGA Analysis of Allele Specific Copy Number).
Genotypes 2 Information on SNP array calls in A/B format. Available for TCGA-OV, GBM, LUSC projects (TCGA Analysis of Allele Specific Copy Number).
hg18 3 Copy number segmentation files that use probes mapped to the hg18 genome build.
hg19 3 Copy number segmentation files that use probes mapped to the hg19 genome build.
ismpolish 2 Intensities that have been normalized first using quantile normalization and then by median-polishing. From Broad Institute CNV analyses. Synonymous with Data Type: Normalized copy numbers. TCGA level 2 data.
LOH 3 Results from the CBS (circular binary segmentation) analysis of the delta B allele frequency data for each tumor/normal pair. Used for TCGA-OV, GBM, LUSC projects (TCGA Analysis of Allele Specific Copy Number).
lowess_normalized_smoothed 1 Microarray log-ratio graphs after locally weighted scatterplot smoothing (LOWESS). Available as PNG images for projects TCGA-OV, GBM, LUSC.
nocnv 3 Segmentation files containing only somatic CNVs (opposite tag: allcnv). From Broad Institute CNV analyses.
Normal_LogR 2 Log2 ratio of the total signal intensity of a normal sample to that of the reference genome. Used for TCGA-OV, GBM, LUSC projects (TCGA Analysis of Allele Specific Copy Number).
Paired_LogR 2 Log2 ratio of the total signal intensity of a tumor sample to that of its paired normal sample. Used for TCGA-OV, GBM, LUSC projects (TCGA Analysis of Allele Specific Copy Number).
pairedcn 2 Tumor-normal paired copy number variation array intensity files for TCGA-LAML. From Washington University in St. Louis.
QA 1 Case-level QA metrics for microarray data. Used for TCGA-OV, GBM, LUSC projects.
qc 1 QC data for microarrays. Available as JPGs (actual photographs of microarray intensities) and PDFs (microarray reports) for CGH array data. Available as XML coordinate files and project-level CSVs that contain aliquot fluorescence conjugation levels.
raw 2 Copy number value estimations calculated by summing allele-specific values. From Broad Institute CNV analyses. TCGA level 2 data.
seg 3 Contains results for copy number segmentation. Used for TCGA-OV, GBM, LUSC projects (TCGA Analysis of Allele Specific Copy Number).
segmentation 2 Copy number variation segmentation files measured using Illumina WGS, in contrast to CNV data files measured with arrays.
segmented 3 Segmentation files for TCGA-LAML array data. From Washington University in St. Louis. TCGA level 3 data.
segnormal 3 Contains segmented log2 ratio data from normal samples. Used for TCGA-OV, GBM, LUSC projects (TCGA Analysis of Allele Specific Copy Number).
sif 2 Simple interaction format files. Synonymous with filter “Data Format: SIF”.
tangent 2 Copy number values calculated with tangent normalization using BirdSuite. Note that these files are used for GDC Data Portal harmonization. TCGA level 2 data.
Unpaired_LogR 2 Log2 ratio of the signal intensity of a tumor sample to that of a reference pool of normal samples. Used for TCGA-OV, GBM projects (TCGA Analysis of Allele Specific Copy Number).
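Several of the allele-specific copy number tags above are defined as simple quantities. The sketch below encodes those definitions literally: B_Allele_Freq as the B allele's share of total intensity, Delta_B_Allele_Freq as the absolute tumor-minus-normal difference, and the LogR tags as log2 ratios of total intensity against a reference. Array software may compute B allele frequency from calibrated genotype clusters rather than raw intensities, so treat these functions as illustrations of the definitions, not a reproduction of the TCGA pipeline. The intensities are made up.

```python
import math


def b_allele_freq(a_intensity, b_intensity):
    """Share of total signal intensity produced by the B allele."""
    total = a_intensity + b_intensity
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return b_intensity / total


def delta_b_allele_freq(tumor_baf, normal_baf):
    """Absolute difference between tumor and paired-normal B allele frequency."""
    return abs(tumor_baf - normal_baf)


def log2_ratio(sample_intensity, reference_intensity):
    """Log2 of a total-intensity ratio.

    Paired_LogR:   tumor vs. its paired normal
    Normal_LogR:   normal vs. the reference
    Unpaired_LogR: tumor vs. a pooled normal reference
    """
    if sample_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(sample_intensity / reference_intensity)


# Hypothetical intensities for one SNP in a tumor/normal pair.
tumor_baf = b_allele_freq(a_intensity=200.0, b_intensity=800.0)   # 0.8
normal_baf = b_allele_freq(a_intensity=500.0, b_intensity=500.0)  # 0.5
print(delta_b_allele_freq(tumor_baf, normal_baf))
print(log2_ratio(1000.0, 500.0))  # paired tumor/normal total intensity
```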


Gene Expression Data


Tag TCGA Data Level Tag description
DGE 1 + 2 Differential gene expression data generated using DMAGE (Deep Multiplex Analysis of Gene Expression). Available for the TCGA-OV project.
exon 3 Exon-level expression data. Includes exon quantification (raw counts and RPKM) and junction quantification (raw counts).
FIRMA 3 Subset of gene expression data: Finding Isoforms using Robust Multichip Analysis. Available for TCGA-OV, GBM, LUSC projects.
gene 3 Gene-level expression data. This includes data from both mRNA (raw count and RSEM) and miRNA (raw count and RPKM). Total RNA-Seq data is also included (raw count and RSEM) in which all types of RNA were quantified. All expression data is available as normalized or unnormalized. Some exon array data files are also included in this tag.
indel 2 VCF files that only include insertion-deletion mutations.
isoform 3 Gene expression isoform data. This includes data from both mRNA (raw count and RSEM) and miRNA (raw count and RPKM) expression. Total RNA-Seq data is also included (raw count and RSEM) in which all types of RNA were quantified. All expression data is available as normalized or unnormalized.
junction 3 Exon junction quantification data. This includes only data from RNA-Seq or total RNA-Seq. Expression levels are measured in raw counts.
normalized 3 Normalized expression levels measured in RSEM. Includes gene and isoform level data.
snv 2 Simple nucleotide variation files measured with DNA-Seq or RNA-Seq. These are available as VCFs or MAFs. This tag does not include VCF and MAF files from copy number variation or Bisulfite-Seq studies.
sv 2 Structural rearrangement files. Available in FASTA and VCF format. The VCF files contain structural variants identified from RNA-Seq data. FASTA files contain contig sequences that are referenced by the VCF files. Available for TCGA-OV, STAD, ESCA, LAML.
unnormalized 3 Raw expression levels measured in RSEM. Includes gene and isoform level data.
v1 3 RNA-SeqV1 expression data. Includes gene, exon and junction expression data. No miRNA-Seq data is included.
v2 3 RNA-SeqV2 expression data. Includes gene, exon, isoform, and junction expression data. No miRNA-Seq data is included.
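Several expression tags report RPKM alongside raw counts. Under the standard definition, RPKM scales a feature's read count by the feature length in kilobases and by the total number of mapped reads in millions. The sketch below applies that textbook formula; which read total and feature lengths the TCGA pipelines used is not stated here and should be checked in the pipeline documentation. The example numbers are hypothetical.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and mapped-read total must be positive")
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical exon: 100 reads on a 1,000 bp exon in a library of 1,000,000 mapped reads.
print(rpkm(read_count=100, feature_length_bp=1000, total_mapped_reads=1_000_000))  # 100.0
```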

Methylation Data


Tag TCGA Data Level Tag description
auxiliary 2 Methylation array auxiliary data in SDF format. These include project-level methylation array QC metrics.
bisulfite 2 + 3 DNA methylation VCF and BED files. Includes bisulfite sequence alignments (VCF) and methylation percentages (BED). Subset of tag: meth.
meth 1 + 2 + 3 Raw methylation array data and methylation alignments (Bisulfite-Seq). Includes raw and normalized array data, QC metrics, and methylation beta values. Methylation array data can be found under Data Category: Raw microarray data.
qc 2 QC data for microarrays. Available as JPGs (actual photographs of microarray intensities) and PDFs (microarray reports) for CGH array data. Available as XML coordinate files and project-level CSVs that contain aliquot fluorescence conjugation levels.
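The meth tag includes methylation beta values. For Illumina methylation arrays, a beta value is commonly computed from the methylated (M) and unmethylated (U) probe intensities as M / (M + U + offset), with an offset of 100 to stabilize low-intensity probes. The sketch below uses that convention; the default offset is an assumption and may differ from the one used to produce the TCGA files. The intensities are made up.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Fraction of signal from the methylated probe; in [0, 1) when offset > 0."""
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    # The offset (assumed 100) damps the ratio when both intensities are small.
    return methylated / (methylated + unmethylated + offset)


print(beta_value(methylated=4500.0, unmethylated=400.0))  # 0.9
```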


miRNA Data


Tag TCGA Data Level Tag description
gene 3 Gene-level expression data. This includes data from both mRNA (raw count and RSEM) and miRNA (raw count and RPKM). Total RNA-Seq data is also included (raw count and RSEM) in which all types of RNA were quantified. All expression data is available as normalized or unnormalized. Some exon array data files are also included in this tag.
hg19 3 Copy number segmentation files that use probes mapped to the hg19 genome build.
isoform 3 Gene expression isoform data. This includes data from both mRNA (raw count and RSEM) and miRNA (raw count and RPKM) expression. Total RNA-Seq data is also included (raw count and RSEM) in which all types of RNA were quantified. All expression data is available as normalized or unnormalized.
miRNA 1 + 2 + 3 miRNA gene expression data and all available raw miRNA microarray data. Includes expression levels measured at the gene and isoform level.


Other Data


Tag TCGA Data Level Tag description
cov 1 ABI sequence trace files. Available only for project TCGA-GBM.
coverage 2 Wiggle track files. Synonymous with Data Format: WIG.
indel 2 VCF files that only include insertion-deletion mutations.
msi 1 Capillary sequencer files associated with microsatellite instability. Available in FSA format or as tab-delimited processed results.
snv 2 Simple nucleotide variation files measured with DNA-Seq or RNA-Seq. These are available as VCFs or MAFs. This tag does not include VCF and MAF files from copy number variation or Bisulfite-Seq studies.
somatic 2 This tag includes all MAF files, except for those generated with capillary sequencing.
tr 1 Trace ID-to-sample relationship files in tab-delimited format. Available for TCGA-GBM, OV.
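The snv and indel tags separate VCF files by variant type. As a rough illustration of that distinction, the sketch below classifies each REF/ALT pair in VCF data lines by allele length, following the column layout of the VCF specification (REF in the fourth column, comma-separated ALT alleles in the fifth). This is a generic heuristic, not the procedure the GDC used to assign tags, and the example records are made up.

```python
from collections import Counter


def classify_allele(ref, alt):
    """Classify one REF/ALT pair using only allele lengths."""
    if alt.startswith("<") or "[" in alt or "]" in alt or alt == "*":
        return "symbolic"  # structural or spanning-deletion notation
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    if len(ref) != len(alt):
        return "indel"
    return "mnv"  # equal-length multi-base substitution


def count_variant_types(vcf_lines):
    """Tally variant classes over the data lines of a VCF file."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")  # REF and ALT columns
        for alt in alts:
            if alt != ".":  # "." means no alternate allele
                counts[classify_allele(ref, alt)] += 1
    return counts


example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\tPASS\t.",
    "1\t200\t.\tAT\tA\t.\tPASS\t.",
    "1\t300\t.\tC\tCGT,T\t.\tPASS\t.",
]
print(count_variant_types(example))
```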

Clinical Tag Fields