10.6 Annotation of DMRs/DMCs and segments

The regions of interest obtained through differential methylation or segmentation analysis often need to be integrated with genome annotation datasets. Without this integration, the results are hard to interpret in biological terms. The most common annotation task is to see where regions of interest fall relative to genes, gene parts, and regulatory regions: Do they mostly occupy promoter, intronic, or exonic regions? Do they overlap with repeats? Do they overlap with other epigenomic marks or long-range regulatory regions? These questions are not specific to methylation; nearly all regions of interest obtained via genome-wide studies raise them. Consequently, multiple software tools already produce such annotations. One is the Bioconductor package genomation (Akalin, Franke, Vlahoviček, et al. 2015). It can annotate DMRs/DMCs, and it can also integrate methylation proportions over the genome with other quantitative information to produce meta-gene plots or heatmaps. Below, we read a BED file of transcripts and use it to annotate DMCs as promoter, exon, intron, or intergenic. The genomation::readTranscriptFeatures() function reads a BED12 file and calculates the coordinates of promoters, exons, and introns; genomation::annotateWithGeneParts() then uses that information for the annotation.

library(genomation)

# read the gene BED file
transcriptBED=system.file("extdata", "refseq.hg18.bed.txt",
                          package = "methylKit")
gene.obj=readTranscriptFeatures(transcriptBED)
#
# annotate differentially methylated CpGs with 
# promoter/exon/intron using annotation data
#
annotateWithGeneParts(as(all.diff,"GRanges"),gene.obj)
##   promoter       exon     intron intergenic 
##      28.24      15.27      33.59      58.02 
##   promoter       exon     intron intergenic 
##      28.24       0.00      13.74      58.02 
## promoter     exon   intron 
##     0.29     0.03     0.17 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       5     815   49918   52410   94644  313528

The printed summary has four parts, in order: (1) the percentage of DMCs overlapping promoters, exons, or introns, or none of them (intergenic); because one DMC can overlap several feature types, these values can sum to more than 100; (2) the same percentages with promoter > exon > intron precedence, so that each DMC is counted only once; (3) the percentage of annotated promoters, exons, and introns that overlap at least one DMC; and (4) a summary of the distances, in base pairs, from the DMCs to the nearest TSS.

Similarly, we can read the CpG island annotation and annotate our differentially methylated bases/regions with them.

# read the CpG islands and their flanking regions;
# name the islands "CpGi" and the flanks "shores"
cpg.file=system.file("extdata", "cpgi.hg18.bed.txt",
                     package = "methylKit")
cpg.obj=readFeatureFlank(cpg.file,
                         feature.flank.name=c("CpGi","shores"))
## Warning: 'GenomicRangesList' is deprecated.
## Use 'GRangesList(..., compress=FALSE)' instead.
## See help("Deprecated")
#
# convert methylDiff object to GRanges and annotate
diffCpGann=annotateWithFeatureFlank(as(all.diff,"GRanges"),
                                    cpg.obj$CpGi,cpg.obj$shores,
                                    feature.name="CpGi",flank.name="shores")
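
The object returned by annotateWithFeatureFlank() can be summarized rather than inspected row by row. The following is a minimal sketch on made-up ranges, so that it runs without the objects created earlier; the coordinates and the names toy.dmc, toy.cpgi, toy.shores and toy.ann are illustrative assumptions, not part of the dataset above. With the real data, diffCpGann would take the place of toy.ann.

```r
library(GenomicRanges)
library(genomation)

# one hypothetical CpG island and its two 2-kb shores
toy.cpgi=GRanges("chr1", IRanges(start=10001, end=11000))
toy.shores=GRanges("chr1", IRanges(start=c(8001, 11001),
                                   end=c(10000, 13000)))

# four hypothetical DMCs: one in the island, two in shores, one elsewhere
toy.dmc=GRanges("chr1", IRanges(start=c(10500, 9000, 12000, 50000),
                                width=1))

toy.ann=annotateWithFeatureFlank(toy.dmc, toy.cpgi, toy.shores,
                                 feature.name="CpGi", flank.name="shores")

# percentage of DMCs per category, counting each DMC once
# (island takes precedence over shore)
stats=getTargetAnnotationStats(toy.ann, percentage=TRUE, precedence=TRUE)
print(stats)

# the same summary as a pie chart
plotTargetAnnotation(toy.ann, col=c("green","gray","white"),
                     main="differential methylation annotation")
```

With precedence=TRUE the categories are mutually exclusive, so the percentages add up to 100.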

Besides these, DMRs/DMCs might be associated with changes in gene regulation. It can therefore be useful to overlap them with known transcription factor binding sites, motifs, or histone modifications. These analyses are simply overlap operations. You can use the genomation::annotateWithFeature() function or any other approach shown in Chapter 6, and you can also do motif discovery with the methods shown in Chapter 9.
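
As an illustration of such an overlap operation, here is a small sketch that uses GenomicRanges directly. The DMRs and the transcription factor peaks are invented for the example (toy.dmr and toy.peaks are hypothetical names and coordinates); in practice the peaks would come from a ChIP-seq peak file and the DMRs from the analysis above.

```r
library(GenomicRanges)

# hypothetical DMRs
toy.dmr=GRanges("chr1", IRanges(start=c(100, 500, 2000),
                                end=c(200, 600, 2100)))
# hypothetical transcription factor binding peaks
toy.peaks=GRanges("chr1", IRanges(start=c(150, 1000, 2050, 2090),
                                  end=c(180, 1100, 2060, 2095)))

# TRUE for each DMR that overlaps at least one peak
toy.dmr$has.peak=overlapsAny(toy.dmr, toy.peaks)

# number of peaks overlapping each DMR
toy.dmr$n.peaks=countOverlaps(toy.dmr, toy.peaks)

# all DMR-peak pairs, as indices into the two objects
hits=findOverlaps(toy.dmr, toy.peaks)

# percentage of DMRs that overlap a peak
pct.with.peak=100*mean(toy.dmr$has.peak)

print(toy.dmr)
```

Storing the result as metadata columns keeps the overlap information next to the regions, which is convenient for later filtering.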

10.6.1 Further annotation with genes or gene sets

The next obvious step in annotating your DMRs/DMCs is figuring out which genes they are associated with. Knowing these genes gives a better idea of the biological implications of the methylation changes. Once you have your gene set, you can do gene set analysis as shown in Chapter 8 or in Chapter 11. There are also packages such as rGREAT that can simultaneously associate DMRs, or any other regions of interest, with genes and do gene set analysis.
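
One simple way to associate regions with genes is to assign each region to the gene with the nearest TSS. The sketch below does this with GenomicRanges on invented data; toy.dmr, toy.tss, the gene names and the 1000-bp cutoff are all assumptions made for illustration, and nearest-TSS assignment is only one of several possible association rules.

```r
library(GenomicRanges)

# hypothetical DMRs
toy.dmr=GRanges("chr1", IRanges(start=c(1200, 8000),
                                end=c(1300, 8100)))
# hypothetical TSS positions with gene names
toy.tss=GRanges("chr1", IRanges(start=c(1000, 5000, 9000), width=1),
                strand=c("+", "-", "+"),
                gene=c("geneA", "geneB", "geneC"))

# nearest TSS for each DMR, regardless of strand
near=distanceToNearest(toy.dmr, toy.tss, ignore.strand=TRUE)

# DMRs without any TSS on their chromosome stay NA
toy.dmr$nearest.gene=NA_character_
toy.dmr$dist.to.tss=NA_integer_
toy.dmr$nearest.gene[queryHits(near)]=toy.tss$gene[subjectHits(near)]
toy.dmr$dist.to.tss[queryHits(near)]=mcols(near)$distance

# genes with a DMR within an arbitrary 1000 bp of their TSS
genes.near=unique(toy.dmr$nearest.gene[which(toy.dmr$dist.to.tss <= 1000)])
print(genes.near)
```

The resulting gene identifiers can then serve as the input gene set for the gene set analysis mentioned above.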

References

Akalin, Franke, Vlahoviček, Mason, and Schübeler. 2015. “Genomation: A Toolkit to Summarize, Annotate and Visualize Genomic Intervals.” Bioinformatics 31 (7): 1127–9.