GTEx (Genotype-Tissue Expression) is an NIH-funded biobank for studying how genetic variation affects RNA expression levels in normal tissue samples.
GTEx sample IDs follow the format GTEX-XXXXX-YYYY-SM-ZZZZZ, where XXXXX is the patient (donor) ID, YYYY is the tissue code, and ZZZZZ identifies the sample aliquot used for sequencing.
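A minimal sketch of splitting a GTEx sample ID into the three fields described above, assuming IDs always have exactly five hyphen-separated parts with the literal "GTEX" and "SM" tokens. The function name `parse_gtex_sample_id` is hypothetical, and field widths are deliberately not checked, since real IDs may not match the placeholder widths exactly. The example ID is illustrative.

```python
def parse_gtex_sample_id(sample_id: str) -> dict:
    """Split a GTEX-XXXXX-YYYY-SM-ZZZZZ sample ID into donor, tissue, and aliquot.

    Assumption: exactly five hyphen-separated fields; widths are not enforced.
    """
    parts = sample_id.split("-")
    if len(parts) != 5 or parts[0] != "GTEX" or parts[3] != "SM":
        raise ValueError(f"not a GTEX-XXXXX-YYYY-SM-ZZZZZ sample ID: {sample_id!r}")
    return {
        "donor": parts[1],    # XXXXX: patient (donor) ID
        "tissue": parts[2],   # YYYY: tissue code
        "aliquot": parts[4],  # ZZZZZ: aliquot used for sequencing
    }


print(parse_gtex_sample_id("GTEX-1117F-0226-SM-5GZZ7"))
# {'donor': '1117F', 'tissue': '0226', 'aliquot': '5GZZ7'}
```

Raising on malformed IDs (rather than returning partial fields) makes it obvious when non-GTEx samples leak into a GTEx-only step.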
Enrollment concluded in 2015, but sample collection is still ongoing; the goal is to have ~960 donors by the end of the project.
TARGET (Therapeutically Applicable Research to Generate Effective Treatments) is the data source for childhood cancers, including Neuroblastoma, Acute Myeloid Leukemia, Wilms Tumor, Rhabdoid Tumor of the Kidney, and Clear Cell Sarcoma of the Kidney.
TCGA (The Cancer Genome Atlas) is a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), involving 20 institutions across the US and Canada, to collect cancerous tissue and adjacent normal tissue.
Sample IDs begin with “TCGA”.
Sample collection ended in 2013, by which point 20,000 tissue samples spanning 33 cancer types had been obtained from over 11,000 patients.
We retained only samples with complete phenotype metadata (e.g., tissue of origin or cancer type).
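The metadata filter can be sketched as below, assuming phenotype metadata is held as one dict per sample. The helper name `filter_complete`, the column name `cancer_type`, and the definition of "missing" (absent, None, or a blank string) are illustrative assumptions, not the notes' actual pipeline; real metadata may also use placeholders such as "NA" that would need to be added.

```python
def filter_complete(records, required_fields):
    """Keep only records whose required phenotype fields are all present.

    Assumption: a field counts as missing if it is absent, None, or a
    blank string; placeholders such as "NA" are NOT treated as missing here.
    """
    def is_present(value):
        return value is not None and str(value).strip() != ""

    return [r for r in records if all(is_present(r.get(f)) for f in required_fields)]


# Made-up records; "cancer_type" is a hypothetical column name.
samples = [
    {"sample": "sample_1", "cancer_type": "BRCA"},
    {"sample": "sample_2", "cancer_type": ""},
    {"sample": "sample_3"},
]
kept = filter_complete(samples, ["cancer_type"])
print([r["sample"] for r in kept])  # ['sample_1']
```

Passing the required fields explicitly lets the same filter check tissue of origin for normal samples and cancer type for tumor samples.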
Dan R. Robinson et al. Integrative clinical genomics of metastatic cancer. Nature 548, 297-303 (2017).
Sample IDs begin with “SRR”.
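Because each source in these notes has a distinguishing ID prefix (GTEX-, TCGA, SRR), a sample's origin can be recovered from its ID. A minimal sketch, assuming these prefixes are unique within the combined dataset; the function name `sample_source` is hypothetical. "SRR" is the generic prefix for NCBI Sequence Read Archive run accessions, so this rule only disambiguates samples inside this particular collection. TARGET and St. Jude prefixes are not given in the notes, so such IDs fall through to "unknown".

```python
# Prefix rules taken from these notes; valid only within this combined dataset.
PREFIX_TO_SOURCE = (
    ("GTEX-", "GTEx"),
    ("TCGA", "TCGA"),
    ("SRR", "Robinson et al. 2017 (metastatic)"),
)


def sample_source(sample_id: str) -> str:
    """Return the source dataset implied by the sample ID prefix, or "unknown"."""
    for prefix, source in PREFIX_TO_SOURCE:
        if sample_id.startswith(prefix):
            return source
    return "unknown"  # e.g. TARGET or St. Jude IDs, whose prefixes are not listed


# Placeholder IDs for illustration only.
for sid in ["GTEX-1117F-0226-SM-5GZZ7", "TCGA-XX-XXXX", "SRR0000000", "other_0001"]:
    print(sid, "->", sample_source(sid))
```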
The cohort comprises metastatic cancer samples from 500 adult patients.
The data contain 101 unique cancers; the two most common metastatic cancers were prostate adenocarcinoma and breast invasive carcinoma.
Counts were computed with UC Santa Cruz's Toil RNA-seq pipeline.
The data included both poly(A) and hybrid-capture RNA-seq runs. To ensure consistent comparisons, we retained only the poly(A) counts.
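Restricting the data to poly(A) runs amounts to selecting the poly(A) columns of the counts table. A minimal sketch, assuming per-run metadata with a library-type field and counts stored as gene -> {run_id: count}; the field name `library_type`, the values "polyA"/"hybrid", and the helper names are hypothetical, and the real metadata may label protocols differently.

```python
def polya_runs(run_metadata, library_field="library_type", polya_value="polyA"):
    """Return the IDs of runs whose library preparation is poly(A) selection."""
    return [run for run, meta in run_metadata.items() if meta.get(library_field) == polya_value]


def subset_columns(counts, columns):
    """Restrict a gene -> {run_id: count} table to the given run IDs."""
    return {gene: {run: row[run] for run in columns} for gene, row in counts.items()}


# Hypothetical metadata and counts for two runs.
run_metadata = {
    "run_1": {"library_type": "polyA"},
    "run_2": {"library_type": "hybrid"},
}
counts = {
    "GENE_A": {"run_1": 10, "run_2": 7},
    "GENE_B": {"run_1": 0, "run_2": 3},
}
print(subset_columns(counts, polya_runs(run_metadata)))
# {'GENE_A': {'run_1': 10}, 'GENE_B': {'run_1': 0}}
```

Filtering on metadata rather than on run IDs keeps the choice of protocol explicit and easy to audit.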
St. Jude Children's Research Hospital: n = 66 (29 F, 37 M)
This dataset consists of High Grade Glioma samples: 22 Diffuse Intrinsic Pontine Glioma samples and 44 Non-Brainstem High Grade Glioma samples.
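Both breakdowns of the St. Jude cohort (by sex and by glioma subtype) should account for all 66 samples. A quick sanity check using only the counts stated in these notes; the helper name `breakdowns_consistent` is hypothetical, and the labels DIPG (Diffuse Intrinsic Pontine Glioma) and HGG (High Grade Glioma) are abbreviations introduced here for brevity.

```python
def breakdowns_consistent(n_total, *breakdowns):
    """True if every breakdown (category -> count) sums to the cohort size."""
    return all(sum(b.values()) == n_total for b in breakdowns)


# Counts as stated in these notes for the St. Jude cohort.
n_total = 66
by_sex = {"F": 29, "M": 37}
by_subtype = {"DIPG": 22, "non-brainstem HGG": 44}

print(breakdowns_consistent(n_total, by_sex, by_subtype))  # True
print(f"DIPG share: {by_subtype['DIPG'] / n_total:.1%}")    # DIPG share: 33.3%
```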