10.6.1 Exercise 1
The main objective of this exercise is to find differentially methylated cytosines between two groups of samples: IDH-mut (AML patients with IDH mutations) vs. NBM (normal bone marrow samples).
- Download methylation call files from GEO. These files are readable by methylKit using default methRead() arguments.
- Find differentially methylated cytosines. Use only chr1 and chr2 if you need to save time. You can subset the files after downloading them, either in R or in Unix. The files are for the hg18 assembly of the human genome.
- Describe the differential methylation trend. What is the main effect?
- Annotate the differentially methylated cytosines (DMCs). Do they overlap promoters, introns, or exons?
- Which genes are nearest to the DMCs? Can you do gene set analysis, either in R or via web-based tools?
Example code for reading a file:
library(methylKit)
m=methRead("~/Downloads/GSM919982_NBM_1_myCpG.txt.gz",sample.id = "NBM_1",assembly="hg18")
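The steps of this exercise can be sketched end to end as below. This is a minimal illustration under stated assumptions, not a reference solution: the small example files bundled with methylKit stand in for the GEO downloads, the group labels and the cutoffs (25% difference, q-value 0.01) are illustrative choices, and the gene annotation is the hg18 RefSeq BED file that ships with methylKit. For the real data you would pass the downloaded IDH-mut and NBM files to methRead() instead.

```r
library(methylKit)
library(genomation)

# Assumption: bundled methylKit example files stand in for the GEO downloads.
ext <- function(f) system.file("extdata", f, package = "methylKit")
files <- list(ext("test1.myCpG.txt"), ext("test2.myCpG.txt"),
              ext("control1.myCpG.txt"), ext("control2.myCpG.txt"))

# treatment: 1 = case group (IDH-mut in the exercise), 0 = control (NBM)
myobj <- methRead(files,
                  sample.id = list("test1", "test2", "ctrl1", "ctrl2"),
                  assembly  = "hg18",
                  treatment = c(1, 1, 0, 0))

# Keep only cytosines covered in every sample, then test each one
meth   <- methylKit::unite(myobj)
myDiff <- calculateDiffMeth(meth)

# Illustrative cutoffs: at least 25% methylation difference, q-value < 0.01
dmc <- getMethylDiff(myDiff, difference = 25, qvalue = 0.01)

# Main trend: how many DMCs gain (hyper) or lose (hypo) methylation?
trend <- table(ifelse(getData(dmc)$meth.diff > 0, "hyper", "hypo"))
print(trend)

# Annotate DMCs with promoter/exon/intron context
gene.obj <- readTranscriptFeatures(ext("refseq.hg18.bed.txt"))
ann <- annotateWithGeneParts(as(dmc, "GRanges"), gene.obj)
print(getTargetAnnotationStats(ann, percentage = TRUE, precedence = TRUE))

# Nearest TSS (and hence nearest gene) for each DMC
tss <- getAssociationWithTSS(ann)
head(tss)
```

The feature.name column of the TSS table holds transcript identifiers, which could serve as the starting point for a gene set analysis after mapping them to gene symbols.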
10.6.2 Exercise 2
The main objective of this exercise is to learn how to do methylome segmentation and the downstream analysis for annotation and data integration.
- Download the human embryonic stem cell (H1 cell line) methylation bigWig files from the Roadmap Epigenomics website. It may take a while to understand how the website is structured and which bigWig file to use. That is part of the exercise. The files you will download are for the hg19 assembly unless stated otherwise.
- Do segmentation on the hESC methylome. You can restrict the analysis to chr1 if it takes too much time.
- Annotate the segments. What kind of gene-based features (promoter/exon/intron) does each segment class overlap with?
- For each segment type, annotate the segments with chromHMM annotations from the Roadmap Epigenome database, available [here](https://egg2.wustl.edu/roadmap/web_portal/chr_state_learning.html#core_15state). The specific file you should use is [here](https://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/E003_15_coreMarks_mnemonics.bed.gz). This is a BED file with chromHMM annotations. chromHMM annotations are parts of the genome identified by a hidden Markov model-based machine-learning algorithm. The segments correspond to active promoters, enhancers, active transcription, insulators, etc. The chromHMM model uses histone modification ChIP-seq and potentially other ChIP-seq data sets to annotate the genome.
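The segmentation and chromHMM steps above might be organized along the following lines. This is a hedged sketch rather than a worked solution: segment_chr1() and state_overlap_table() are hypothetical helper names, the minSeg value is an illustrative choice, and the toy ranges at the end exist only to show the shape of the overlap table. It assumes the bigWig stores one methylation score per position and that the chromHMM state mnemonics end up in the name column when the BED file is read with rtracklayer::import().

```r
library(GenomicRanges)

# Hypothetical helper: read chr1 from a methylation bigWig and segment it.
# bw_file is the path to whichever Roadmap bigWig you chose to download.
segment_chr1 <- function(bw_file) {
  bw   <- rtracklayer::BigWigFile(bw_file)
  len  <- seqlengths(seqinfo(bw))[["chr1"]]
  chr1 <- GRanges("chr1", IRanges(1, len))
  mbw  <- rtracklayer::import(bw, which = chr1)  # GRanges with a 'score' column
  # minSeg = 10 is an illustrative minimum segment length, not a recommendation
  methylKit::methSeg(mbw, diagnostic.plot = TRUE, minSeg = 10)
}

# Hypothetical helper: count overlaps between segment classes and chromHMM states.
# segs needs a 'seg.group' column (as returned by methSeg), states a 'name' column.
state_overlap_table <- function(segs, states) {
  hits <- findOverlaps(segs, states)
  table(segment_class = segs$seg.group[queryHits(hits)],
        chromHMM      = states$name[subjectHits(hits)])
}

# Toy data (made up) to illustrate the output format
toy_segs <- GRanges("chr1",
                    IRanges(start = c(1, 201, 401), end = c(100, 300, 500)),
                    seg.group = c("1", "2", "1"))
toy_states <- GRanges("chr1",
                      IRanges(start = c(50, 251), end = c(250, 450)),
                      name = c("1_TssA", "7_Enh"))
tab <- state_overlap_table(toy_segs, toy_states)
print(tab)
```

With real data, the call would look like state_overlap_table(segment_chr1(bw_file), rtracklayer::import(chmm_file)), where chmm_file points to the downloaded mnemonics BED file. Counting overlaps is the simplest summary; weighting by the number of overlapping bases is a reasonable alternative when segments differ greatly in length.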