9.8 What to do next?

A common first step once you have your peaks is to find out what kinds of genes they might be associated with. This is very similar to the gene set analysis we introduced for RNA-seq in Chapter 8, and the same tools, such as the gProfileR package, can be applied to the genes associated with the peaks. However, associating peaks with genes is not always trivial because of long-range gene regulation: many enhancers regulate genes that are far away, and their targets are not always the nearest gene. Despite this limitation, assigning peaks to their nearest genes is common practice in ChIP-seq analysis. We introduced how to find the nearest genes in Chapter 6. Other R packages perform the association to genes and the gene set analysis in a single workflow. One such package is rGREAT from Bioconductor, which relies on a web-based tool called GREAT.
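As a reminder of what nearest-gene assignment looks like in practice, here is a minimal sketch using GenomicRanges. The peak and gene coordinates, and the names `geneA`, `geneB` and `geneC`, are made up for illustration; in a real analysis the peaks would come from your peak caller and the genes from an annotation package. The sketch assigns each peak to the closest transcription start site (TSS), which is only one of several reasonable conventions.

```r
library(GenomicRanges)

# Toy peaks (hypothetical coordinates, for illustration only)
peaks <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(1000, 50000, 7000), width = 200)
)

# Toy gene annotation (hypothetical genes)
genes <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(5000, 60000, 1000),
                     end   = c(9000, 65000, 3000)),
  strand   = c("+", "-", "+"),
  gene_id  = c("geneA", "geneB", "geneC")
)

# Reduce each gene to its TSS; resize() is strand-aware, so for
# genes on the minus strand the TSS is the end coordinate
tss <- resize(genes, width = 1, fix = "start")

# For every peak, find the nearest TSS regardless of strand
hits <- distanceToNearest(peaks, tss, ignore.strand = TRUE)

# Collect the assignment in a data frame
peak2gene <- data.frame(
  peak_index = queryHits(hits),
  gene_id    = tss$gene_id[subjectHits(hits)],
  distance   = mcols(hits)$distance
)
peak2gene
```

The `gene_id` column can then be passed to a gene set analysis tool. The `distance` column is worth inspecting before doing so, because peaks that are very far from any TSS are the ones most likely to be assigned to the wrong gene.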

Knowing every location in the genome bound by a protein can provide a lot of mechanistic information. However, it is often hard to draw biologically relevant conclusions from a single ChIP-seq experiment (e.g. if we want to explain how our protein causes a disease, it is hard to guess which of the tens of thousands of binding sites is relevant for the phenotype). Therefore, it is customary to integrate the results with data that are already available for the system of interest: ChIP-seq of other proteins, genome-wide measurements of expression, or assays of 3D genome structure.

The choice of downstream analysis is guided by the biological question of interest. Often we want to compare our samples to other available ChIP-seq experiments. It is possible to look at the pairwise differences between samples using differential peak calling (Zhang, Lin, Johnson, et al. 2014; Lun and Smyth 2014; Allhoff, Seré, Chauvistré, et al. 2014; Allhoff, Seré, Pires, et al. 2016). This procedure is analogous to differential expression analysis, except that it results in sets of coordinates that are differentially bound between two biological conditions. We can then search for a specific DNA binding motif in such regions, or correlate changes in binding with changes in gene expression. As the number of ChIP experiments increases, pairwise comparisons become combinatorially complex. In this case we can segment the genome into multiple classes, where each class corresponds to a combination of bound transcription factors. Genome segmentation is usually done using probabilistic models such as hidden Markov models (Ernst and Kellis 2012; Hoffman, Buske, Wang, et al. 2012), or machine learning algorithms (Mortazavi, Pepke, Jansen, et al. 2013).
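To make the idea of "classes as combinations of bound factors" concrete, the following sketch labels genomic regions by which peak sets overlap them. This is a deliberately naive, overlap-based illustration and not how the hidden Markov model or self-organizing map methods cited above work; those tools model the signal probabilistically. The factor names `TF1`, `TF2` and `TF3` and all coordinates are hypothetical.

```r
library(GenomicRanges)

# Toy peak sets for three hypothetical transcription factors
peak_sets <- list(
  TF1 = GRanges("chr1", IRanges(start = c(100, 500), width = 100)),
  TF2 = GRanges("chr1", IRanges(start = c(150, 900), width = 100)),
  TF3 = GRanges("chr1", IRanges(start = 520, width = 50))
)

# Pool all peaks and split them into non-overlapping pieces, so that
# each piece is covered by one fixed combination of factors
all_peaks <- unlist(GRangesList(peak_sets), use.names = FALSE)
regions   <- disjoin(all_peaks)

# Binary occupancy matrix: rows are regions, columns are factors
occupancy <- sapply(peak_sets, function(p) overlapsAny(regions, p))

# Label each region with the combination of factors bound there
regions$class <- apply(
  occupancy, 1,
  function(bound) paste(colnames(occupancy)[bound], collapse = "+")
)

# Number of regions in each class
table(regions$class)
```

With n factors there are up to 2^n - 1 non-empty combinations, which is one reason dedicated segmentation methods learn a smaller number of recurrent states instead of enumerating every combination.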

References

Allhoff, Seré, Chauvistré, Lin, Zenke, and Costa. 2014. “Detecting Differential Peaks in ChIP-Seq Signals with ODIN.” Bioinformatics 30 (24): 3467–75. https://doi.org/10.1093/bioinformatics/btu722.

Allhoff, Seré, Pires, Zenke, and Costa. 2016. “Differential Peak Calling of ChIP-Seq Signals with Replicates with THOR.” Nucleic Acids Res 44 (20): e153. https://doi.org/10.1093/nar/gkw680.

Ernst, and Kellis. 2012. “ChromHMM: Automating Chromatin-State Discovery and Characterization.” Nat Methods 9 (3): 215–16. https://doi.org/10.1038/nmeth.1906.

Hoffman, Buske, Wang, Weng, Bilmes, and Noble. 2012. “Unsupervised Pattern Discovery in Human Chromatin Structure Through Genomic Segmentation.” Nat Methods 9 (5): 473–76. https://doi.org/10.1038/nmeth.1937.

Lun, and Smyth. 2014. “De Novo Detection of Differentially Bound Regions for ChIP-Seq Data Using Peaks and Windows: Controlling Error Rates Correctly.” Nucleic Acids Res 42 (11): e95. https://doi.org/10.1093/nar/gku351.

Mortazavi, Pepke, Jansen, Marinov, Ernst, Kellis, Hardison, Myers, and Wold. 2013. “Integrating and Mining the Chromatin Landscape of Cell-Type Specificity Using Self-Organizing Maps.” Genome Res 23 (12): 2136–48. https://doi.org/10.1101/gr.158261.113.

Zhang, Lin, Johnson, Rozek, and Sartor. 2014. “PePr: A Peak-Calling Prioritization Pipeline to Identify Consistent or Differential Peaks from Replicated ChIP-Seq Data.” Bioinformatics 30 (18): 2568–75. https://doi.org/10.1093/bioinformatics/btu372.