Structure of the book

The book is designed with the idea that a practical and conceptual understanding of data analysis methods is as important as, if not more important than, a theoretical understanding, such as the detailed derivation of equations in statistics or machine learning. That is why we first give a conceptual explanation of each method and then present the essential parts of the mathematical formulas for a more detailed understanding. In this spirit, we always show and explain the code for a particular data analysis task. We also give additional references, such as books, websites, video lectures and scientific papers, for readers who wish to gain a deeper theoretical understanding of data analysis methods or concepts.

Chapter 1: “Introduction to Genomics” introduces the basic concepts in genome biology and genomics. Understanding these concepts is important for computational genomics.

Chapter 2: “Introduction to R for Genomic Data Analysis” provides the basic R skills necessary to follow the book, as well as the common data analysis paradigms that we observe in genomic data analysis. Chapter 3: “Statistics for Genomics”, Chapter 4: “Exploratory Data Analysis with Unsupervised Machine Learning” and Chapter 5: “Predictive Modeling with Supervised Machine Learning” introduce the quantitative skills needed to analyze high-dimensional genomics data.

Chapter 6: “Operations on Genomic Intervals and Genome Arithmetic” introduces the fundamental tools for dealing with genomic intervals and their relationships to each other over the genome. In addition, the chapter introduces a variety of genomic data visualization methods. The skills introduced in this chapter are key to working with processed genomic data, which is available through public databases such as Ensembl and the UCSC browser.

The next chapters deal with specific analyses of high-throughput sequencing data and with integrating different kinds of datasets. Chapter 7: “Quality Check, Processing and Alignment of High-throughput Sequencing Reads” introduces the quality checks that need to be done on sequencing reads and different ways to process them further. Chapters 8, 9 and 10 deal with RNA-seq analysis, ChIP-seq analysis and BS-seq analysis, respectively. The last chapter, Chapter 11: “Multi-omics Analysis”, deals with methods for integrating multiple omics datasets.

Most chapters have exercises that reinforce some of the important points introduced in the chapters. The exercises are classified into beginner, intermediate and advanced categories. If you are well versed in a certain subject, you might want to skip the beginner-level exercises.

To sum up, this book is a comprehensive guide to computational genomics. Some sections are included for completeness and for the sake of the wide interdisciplinary audience, so not all sections will be equally useful to every reader.