## 7.4 Mapping/aligning reads to the genome

Another class of algorithms builds genome indices by creating a Burrows-Wheeler transformation of the genome. This in essence creates a compact and searchable data structure for all reads. Although details are out of the scope of this section, these alignment tools provide faster alignment and use less memory. BWA(H. Li and Durbin 2009a), Bowtie1/2(Langmead and Salzberg 2012a) and SOAP(R. Li, Yu, Li, et al. 2009) are examples of such algorithms.

The read mapping in R can be done with the gmapR (Barr, Wu, and Lawrence 2019), QuasR (Gaidatzis, Lerch, Hahne, et al. 2015), Rsubread (Liao, Smyth, and Shi 2013), and systemPipeR (Backman and Girke 2016) packages. We will demonstrate read mapping with QuasR which uses the Rbowtie package, which wraps the Bowtie aligner. Below, we show how to map reads from a ChIP-seq experiment using QuasR/bowtie.

We will use the qAlign() function which requires two mandatory arguments: 1) a genome file in either fasta format or as a BSgenome package and 2) a sample file which is a text file and contains file paths to fastq files and sample names. In the case below, sample file looks like this:

FileName    SampleName
chip_1_1.fq.bz2 Sample1
chip_2_1.fq.bz2 Sample2
library(QuasR)

# copy example data to current working directory
file.copy(system.file(package="QuasR", "extdata"), ".", recursive=TRUE)

# genome file in fasta format
genomeFile <- "extdata/hg19sub.fa"

# text file containing sample names and fastq file paths
sampleFile <- "extdata/samples_chip_single.txt"

# create alignments
proj <- qAlign(sampleFile, genomeFile)

It is good to explain what is going on here as the qAlign() function makes things look simple. This function is designed to be easy. For example, it creates a genome index automatically if it does not exist, and will look for existing indices before it creates one. We provided only two arguments, a text file containing sample names and fastq file paths and a reference genome file. In fact, this function also has many knobs and you can change its behavior by supplying different arguments in order to affect the behavior of Bowtie. For example, you can supply parameters to Bowtie using the alignmentParameter argument. However the qAlign() function is optimized for different types of alignment problems and selects alignment parameters automatically. It is designed to work with alignment and quantification tasks for RNA-seq, ChIP-seq, small-RNA sequencing, Bisulfite sequencing (DNA methylation) and allele-specific analysis. If you want to change the default bowtie parameters, only do it for simple alignment problems such as ChIP-seq and RNA-seq.

Want to know more ?

### References

Backman, and Girke. 2016. “systemPipeR: NGS Workflow and Report Generation Environment.” BMC Bioinformatics 17 (1). https://doi.org/10.1186/s12859-016-1241-0.

Barr, Wu, and Lawrence. 2019. GmapR: An R Interface to the Gmap/Gsnap/Gstruct Suite.

Gaidatzis, Lerch, Hahne, and Stadler. 2015. “QuasR: Quantification and Annotation of Short Reads in R.” Bioinformatics 31 (7): 1130–2. https://doi.org/10.1093/bioinformatics/btu781.

Langmead, and Salzberg. 2012a. “Fast Gapped-Read Alignment with Bowtie 2.” Nature Methods 9 (4): 357.

Liao, Smyth, and Shi. 2013. “The Subread Aligner: Fast, Accurate and Scalable Read Mapping by Seed-and-Vote.” Nucleic Acids Research 41 (10): e108–e108. https://doi.org/10.1093/nar/gkt214.

Li, and Durbin. 2009a. “Fast and Accurate Short Read Alignment with Burrows–Wheeler Transform.” Bioinformatics 25 (14): 1754–60.

Li, Yu, Li, Lam, Yiu, Kristiansen, and Wang. 2009. “SOAP2: An Improved Ultrafast Tool for Short Read Alignment.” Bioinformatics 25 (15): 1966–7.