10.8 Exercises

10.8.1 Differential methylation

The main objective of this exercise is to find differentially methylated cytosines between two groups of samples: IDH-mut (AML patients with IDH mutations) vs. NBM (normal bone marrow samples).

Download the methylation call files from GEO. These files are readable by methylKit using the default methRead arguments. [Difficulty: Beginner]

samples	Link
IDH1_rep1	link
IDH1_rep2	link
NBM_rep1	link
NBM_rep2	link

Example code for reading a file:

library(methylKit)
m=methRead("~/Downloads/GSM919982_NBM_1_myCpG.txt.gz",
           sample.id = "NBM_rep1",assembly="hg18")
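For the group comparison, all four files need to end up in a single object with a treatment vector. The following is a minimal sketch of one way to do that, not the only one. The helper name `read_idh_nbm` and the file names in the usage comment are hypothetical placeholders (only the NBM_1 file name comes from the example above), and the coding 1 = IDH-mut, 0 = NBM is an assumption you are free to change.

```r
# Sketch: read several methylation call files into one methylRawList.
# read_idh_nbm is a hypothetical helper, not part of methylKit.
read_idh_nbm <- function(files, sample_ids, treatment, assembly = "hg18") {
  # Each file needs exactly one sample id and one treatment label.
  stopifnot(length(files) == length(sample_ids),
            length(files) == length(treatment))
  methylKit::methRead(
    location  = as.list(files),
    sample.id = as.list(sample_ids),
    assembly  = assembly,
    treatment = treatment  # assumed coding: 1 = IDH-mut, 0 = NBM
  )
}

# Usage (placeholder paths; substitute the files you downloaded):
# files <- c("~/Downloads/IDH1_rep1_myCpG.txt.gz",   # hypothetical name
#            "~/Downloads/IDH1_rep2_myCpG.txt.gz",   # hypothetical name
#            "~/Downloads/GSM919982_NBM_1_myCpG.txt.gz",
#            "~/Downloads/NBM_rep2_myCpG.txt.gz")    # hypothetical name
# meth_list <- read_idh_nbm(files,
#                           c("IDH1_rep1", "IDH1_rep2", "NBM_rep1", "NBM_rep2"),
#                           treatment = c(1, 1, 0, 0))
```

With this coding, positive methylation differences reported later by methylKit mean higher methylation in the IDH-mut samples.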

Find differentially methylated cytosines. Use only chr1 and chr2 if you need to save time. You can subset the files after you download them, either in R or in Unix. The files are for the hg18 assembly of the human genome. [Difficulty: Beginner]
Describe the general differential methylation trend. What is the main effect for most CpGs? [Difficulty: Intermediate]
Annotate differentially methylated cytosines (DMCs) as promoter/intron/exon. [Difficulty: Beginner]
Which genes are the nearest to DMCs? [Difficulty: Intermediate]
Can you do gene set analysis either in R or via web-based tools? [Difficulty: Advanced]
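The steps above can be approached in several ways. The sketch below shows one possible skeleton using methylKit and genomation. The function names `find_dmcs`, `describe_trend` and `annotate_dmcs` are hypothetical. The thresholds (coverage of at least 10 reads, 25% methylation difference, q-value of 0.01) are assumptions rather than requirements of the exercise. The `refseq_bed` argument is assumed to be a BED12 file of hg18 transcripts that you supply yourself.

```r
# Sketch: filter, merge, test, and split DMCs by direction.
# meth_list is assumed to be a methylRawList with treatment 1 = IDH-mut, 0 = NBM.
find_dmcs <- function(meth_list, difference = 25, qvalue = 0.01) {
  filtered <- methylKit::filterByCoverage(meth_list, lo.count = 10,
                                          lo.perc = NULL, hi.count = NULL,
                                          hi.perc = 99.9)
  normalized <- methylKit::normalizeCoverage(filtered)
  # Keep only CpGs covered in every sample.
  meth <- methylKit::unite(normalized, destrand = FALSE)
  diff_all <- methylKit::calculateDiffMeth(meth)
  list(
    hyper = methylKit::getMethylDiff(diff_all, difference = difference,
                                     qvalue = qvalue, type = "hyper"),
    hypo  = methylKit::getMethylDiff(diff_all, difference = difference,
                                     qvalue = qvalue, type = "hypo"),
    all   = methylKit::getMethylDiff(diff_all, difference = difference,
                                     qvalue = qvalue, type = "all")
  )
}

# Compare the number of hyper- and hypomethylated CpGs to describe the trend.
describe_trend <- function(n_hyper, n_hypo) {
  if (n_hyper > n_hypo) {
    "mostly hypermethylated in the treatment group"
  } else if (n_hyper < n_hypo) {
    "mostly hypomethylated in the treatment group"
  } else {
    "balanced"
  }
}

# Annotate DMCs with gene parts and find the nearest TSS for each DMC.
annotate_dmcs <- function(dmcs, refseq_bed) {
  gene_parts <- genomation::readTranscriptFeatures(refseq_bed)
  dmc_gr <- methods::as(dmcs, "GRanges")
  ann <- genomation::annotateWithGeneParts(dmc_gr, gene_parts)
  list(
    feature_stats = genomation::getTargetAnnotationStats(ann,
                                                         percentage = TRUE,
                                                         precedence = TRUE),
    nearest_tss = genomation::getAssociationWithTSS(ann)
  )
}

# Usage (assumes meth_list and a BED12 file exist):
# dmcs <- find_dmcs(meth_list)
# describe_trend(nrow(methylKit::getData(dmcs$hyper)),
#                nrow(methylKit::getData(dmcs$hypo)))
# ann <- annotate_dmcs(dmcs$all, "refseq.hg18.bed.txt")  # placeholder path
```

The nearest-TSS table returned by `getAssociationWithTSS` contains the transcript names closest to each DMC, which could serve as the input gene list for a gene set analysis.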

10.8.2 Methylome segmentation

The main objective of this exercise is to learn how to do methylome segmentation and the downstream analysis for annotation and data integration.

Download the human embryonic stem cell (H1 cell line) methylation bigWig files from the Roadmap Epigenomics website. It may take a while to understand how the website is structured and which bigWig file to use; that is part of the exercise. The files you will download are for the hg19 assembly unless stated otherwise. [Difficulty: Beginner]
Segment the hESC methylome. You can use only chr1 if using the whole genome takes too much time. [Difficulty: Intermediate]
Annotate the segments with the kinds of gene-based features that each segment class overlaps (promoter/exon/intron). [Difficulty: Beginner]
For each segment type, annotate the segments with chromHMM annotations from the Roadmap Epigenome database available here. The specific file you should use is here. This is a BED file with chromHMM annotations. chromHMM annotations are regions of the genome labeled by a hidden-Markov-model-based machine learning algorithm. The annotated regions correspond to active promoters, enhancers, active transcription, insulators, etc. The chromHMM model uses histone modification ChIP-seq and potentially other ChIP-seq data sets to annotate the genome. [Difficulty: Advanced]
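A possible outline of the segmentation and chromHMM steps is sketched below, assuming the bigWig file stores one methylation score per CpG and that both files use hg19 coordinates. The function names `segment_methylome` and `overlap_with_chromhmm` are hypothetical, as are the file paths in the usage comment. The default `chrom_end` is the hg19 length of chr1, and the `maxInt` and `minSeg` values are assumptions passed on to the underlying segmentation.

```r
# Sketch: import one chromosome from a bigWig file and segment it with methSeg.
segment_methylome <- function(bigwig_file, chrom = "chr1",
                              chrom_end = 249250621L) {
  stopifnot(file.exists(bigwig_file))
  region <- GenomicRanges::GRanges(chrom, IRanges::IRanges(1, chrom_end))
  # Only the requested region is read; scores land in the "score" column.
  mbw <- rtracklayer::import(bigwig_file, which = region)
  methylKit::methSeg(mbw, diagnostic.plot = FALSE, maxInt = 100, minSeg = 10)
}

# Count overlaps between segment classes and chromHMM states.
# Assumes the BED "name" column holds the chromHMM state label.
overlap_with_chromhmm <- function(segments, chromhmm_bed) {
  states <- rtracklayer::import(chromhmm_bed, format = "BED")
  hits <- GenomicRanges::findOverlaps(segments, states)
  table(
    segment_class  = segments$seg.group[S4Vectors::queryHits(hits)],
    chromhmm_state = states$name[S4Vectors::subjectHits(hits)]
  )
}

# Usage (placeholder paths):
# segs <- segment_methylome("~/Downloads/H1_methylation.bigwig")
# overlap_with_chromhmm(segs, "~/Downloads/H1_chromHMM.bed.gz")
```

The table counts overlapping segment and state pairs rather than overlapping base pairs, so long segments and short segments contribute equally; weighting by overlap width is a reasonable refinement. For the gene-based annotation, the segments could be split by `seg.group` and annotated with genomation in the same way as DMCs.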