Chapter 4 Exploratory Data Analysis with Unsupervised Machine Learning

In this chapter, we will use unsupervised machine learning techniques to explore genomics data. Data exploration usually has several goals. Generally, we want to understand how the variables in our data set relate to each other and how the samples defined by those variables relate to each other. This information can be used to generate hypotheses, find outlier samples, or identify sample groups that need more data points. We will focus on two main classes of techniques: “clustering” and “dimension reduction”. We will show how to apply these techniques and how to visualize their results using R. Because these techniques are fundamental to data analysis, we will see more of their use cases in Chapters 8, 9, 10 and 11.