## 2.10 Exercises

### 2.10.1 Computations in R

- Sum 2 and 3 using the `+` operator. [Difficulty: **Beginner**]
- Take the square root of 36 using `sqrt()`. [Difficulty: **Beginner**]
- Take the log10 of 1000 using the `log10()` function. [Difficulty: **Beginner**]
- Take the log2 of 32 using the `log2()` function. [Difficulty: **Beginner**]
- Assign the sum of 2, 3 and 4 to variable `x`. [Difficulty: **Beginner**]
- Find the absolute value of the expression `5 - 145` using the `abs()` function. [Difficulty: **Beginner**]
- Calculate the square root of 625, divide it by 5, and assign it to variable `x`. Ex: `y= log10(1000)/5` takes the log10 of 1000, divides it by 5, and assigns the value to variable `y`. [Difficulty: **Beginner**]
- Multiply the value you get from the previous exercise by 10000 and assign it to variable `x`. Ex: `y=y*5` multiplies `y` by 5 and assigns the value to `y`. **KEY CONCEPT:** Results of computations or arbitrary values can be stored in variables; we can re-use those variables later on and over-write them with new values. [Difficulty: **Beginner**]
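The key concept can be sketched as follows. This is a minimal illustration with arbitrary values; the variable name `z` is an assumption chosen so that it does not clash with the `x` and `y` used in the exercises.

```r
# Store the result of a computation in a variable.
z = sqrt(49) / 7   # sqrt(49) is 7, so z is 1
z

# Re-use the variable and over-write it with a new value.
z = z * 100        # z is now 100
z
```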

### 2.10.2 Data structures in R

- Make a vector of 1,2,3,5 and 10 using `c()`, and assign it to the `vec` variable. Ex: `vec1=c(1,3,4)` makes a vector out of 1,3,4. [Difficulty: **Beginner**]
- Check the length of your vector with `length()`. Ex: `length(vec1)` should return 3. [Difficulty: **Beginner**]
- Make a vector of all numbers between 2 and 15. Ex: `vec=1:6` makes a vector of numbers between 1 and 6, and assigns it to the `vec` variable. [Difficulty: **Beginner**]
- Make a vector of 4s repeated 10 times using the `rep()` function. Ex: `rep(x=2,times=5)` makes a vector of 2s repeated 5 times. [Difficulty: **Beginner**]
- Make a logical vector with TRUE, FALSE values of length 4, use `c()`. Ex: `c(TRUE,FALSE)`. [Difficulty: **Beginner**]
- Make a character vector of the gene names PAX6, ZIC2, OCT4 and SOX2. Ex: `avec=c("a","b","c")` makes a character vector of a, b and c. [Difficulty: **Beginner**]
- Subset the vector using `[]` notation, and get the 5th and 6th elements. Ex: `vec1[1]` gets the first element. `vec1[c(1,3)]` gets the 1st and 3rd elements. [Difficulty: **Beginner**]
- You can also subset any vector using a logical vector in `[]`. Run the following:

```
myvec=1:5
# the length of the logical vector
# should be equal to length(myvec)
myvec[c(TRUE,TRUE,FALSE,FALSE,FALSE)]
myvec[c(TRUE,FALSE,FALSE,FALSE,TRUE)]
```

[Difficulty: **Beginner**]

- The `==`, `>`, `<`, `>=`, and `<=` operators create logical vectors. See the results of the following operations:

[Difficulty: **Beginner**]
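As a minimal sketch of such comparisons, assuming the `myvec` vector defined above (the particular thresholds are illustrative choices), each operator returns one TRUE/FALSE value per element:

```r
myvec = 1:5
myvec > 3    # FALSE FALSE FALSE  TRUE  TRUE
myvec == 4   # FALSE FALSE FALSE  TRUE FALSE
myvec <= 2   #  TRUE  TRUE FALSE FALSE FALSE
myvec != 4   #  TRUE  TRUE  TRUE FALSE  TRUE
```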

- Use the `>` operator in `myvec[ ]` to get elements larger than 2 in `myvec`, which is described above. [Difficulty: **Beginner**]
- Make a 5x3 matrix (5 rows, 3 columns) using `matrix()`. Ex: `matrix(1:6,nrow=3,ncol=2)` makes a 3x2 matrix using numbers between 1 and 6. [Difficulty: **Beginner**]
- What happens when you use `byrow = TRUE` in your `matrix()` call as an additional argument? Ex: `mat=matrix(1:6,nrow=3,ncol=2,byrow = TRUE)`. [Difficulty: **Beginner**]
- Extract the first 3 columns and first 3 rows of your matrix using `[]` notation. [Difficulty: **Beginner**]
- Extract the last two rows of the matrix you created earlier. Ex: `mat[2:3,]` or `mat[c(2,3),]` extracts the 2nd and 3rd rows. [Difficulty: **Beginner**]
- Extract the first two columns and run `class()` on the result. [Difficulty: **Beginner**]
- Extract the first column and run `class()` on the result, compare with the above exercise. [Difficulty: **Beginner**]
- Make a data frame with 3 columns and 5 rows. Make sure the first column is a sequence of numbers 1:5, and the second column is a character vector. Ex: `df=data.frame(col1=1:3,col2=c("a","b","c"),col3=3:1) # 3x3 data frame`. Remember you need to make a data frame with 5 rows and 3 columns. [Difficulty: **Beginner**]
- Extract the first two columns and first two rows. **HINT:** Use the same notation as matrices. [Difficulty: **Beginner**]
- Extract the last two rows of the data frame you made. **HINT:** Same notation as matrices. [Difficulty: **Beginner**]
- Extract the last two columns using the column names of the data frame you made. [Difficulty: **Beginner**]
- Extract the second column using the column names. You can use `[]` or `$` as in lists; use both in two different answers. [Difficulty: **Beginner**]
- Extract rows where the 1st column is larger than 3. **HINT:** You can get a logical vector using the `>` operator, and logical vectors can be used in `[]` when subsetting. [Difficulty: **Beginner**]
- Extract rows where the 1st column is larger than or equal to 3. [Difficulty: **Beginner**]
- Convert a data frame to a matrix. **HINT:** Use `as.matrix()`. Observe what happens to numeric values in the data frame. [Difficulty: **Beginner**]
- Make a list using the `list()` function. Your list should have 4 elements; the one below has 2. Ex: `mylist= list(a=c(1,2,3),b=c("apple","orange"))`. [Difficulty: **Beginner**]
- Select the 1st element of the list you made using `$` notation. Ex: `mylist$a` selects the first element, named “a”. [Difficulty: **Beginner**]
- Select the 4th element of the list you made earlier using `$` notation. [Difficulty: **Beginner**]
- Select the 1st element of your list using `[ ]` notation. Ex: `mylist[1]` selects the first element, named “a”, and you get a list with one element. `mylist["a"]` selects the same element by name, and you again get a list with one element. [Difficulty: **Beginner**]
- Select the 4th element of your list using `[ ]` notation. [Difficulty: **Beginner**]
- Make a factor using `factor()`, with 5 elements. Ex: `fa=factor(c("a","a","b"))`. [Difficulty: **Beginner**]
- Convert a character vector to a factor using `as.factor()`. First, make a character vector using `c()`, then use `as.factor()`. [Difficulty: **Intermediate**]
- Convert the factor you made above to a character using `as.character()`. [Difficulty: **Beginner**]
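The difference between `[ ]` and `$` on lists, noted in the examples above, can be sketched as follows. The list `fruitlist` and its element names are hypothetical, and the `[[ ]]` form is shown only as an additional base R way to reach the element itself.

```r
# Hypothetical list used only for illustration.
fruitlist = list(n = c(1, 2, 3), f = c("apple", "orange"))

sub1 = fruitlist[1]    # single brackets: a list of length 1
sub2 = fruitlist$n     # dollar sign: the numeric vector itself
sub3 = fruitlist[[1]]  # double brackets: the same vector, selected by position

class(sub1)            # "list"
class(sub2)            # "numeric"
identical(sub2, sub3)  # TRUE
```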

### 2.10.3 Reading in and writing data out in R

- Read CpG island (CpGi) data from the compGenomRData package file `CpGi.table.hg18.txt`. This is a tab-separated file. Store it in a variable called `cpgi`. Use

```
cpgFilePath=system.file("extdata",
                        "CpGi.table.hg18.txt",
                        package="compGenomRData")
```

to get the file path within the installed `compGenomRData` package. [Difficulty: **Beginner**]

- Use `head()` on `cpgi` to see the first few rows. [Difficulty: **Beginner**]
- Why doesn’t the following work? See the `sep` argument in `help(read.table)`. [Difficulty: **Beginner**]

```
cpgtFilePath=system.file("extdata",
                         "CpGi.table.hg18.txt",
                         package="compGenomRData")
cpgtFilePath
cpgiSepComma=read.table(cpgtFilePath,header=TRUE,sep=",")
head(cpgiSepComma)
```

- What happens when you set `stringsAsFactors=FALSE` in `read.table()`? [Difficulty: **Beginner**]

```
cpgiHF=read.table("intro2R_data/data/CpGi.table.hg18.txt",
                  header=FALSE,sep="\t",
                  stringsAsFactors=FALSE)
```

- Read only the first 10 rows of the CpGi table. [Difficulty: **Beginner/Intermediate**]
- Use `cpgFilePath=system.file("extdata","CpGi.table.hg18.txt",package="compGenomRData")` to get the file path, then use `read.table()` with argument `header=FALSE`. Use `head()` to see the results. [Difficulty: **Beginner**]
- Write CpG islands to a text file called “my.cpgi.file.txt”. Write the file to your home folder; you can use `file="~/my.cpgi.file.txt"` in Linux. `~/` denotes the home folder. [Difficulty: **Beginner**]
- Same as above but this time make sure to use the `quote=FALSE`, `sep="\t"` and `row.names=FALSE` arguments. Save the file to “my.cpgi.file2.txt” and compare it with “my.cpgi.file.txt”. [Difficulty: **Beginner**]
- Write out the first 10 rows of the `cpgi` data frame. **HINT:** Use the subsetting for data frames we learned before. [Difficulty: **Beginner**]
- Write the first 3 columns of the `cpgi` data frame. [Difficulty: **Beginner**]
- Write CpG islands only on chr1. **HINT:** Use subsetting with `[]`, feed a logical vector using the `==` operator. [Difficulty: **Beginner/Intermediate**]
- Read two other data sets “rn4.refseq.bed” and “rn4.refseq2name.txt” with `header=FALSE`, and assign them to `df1` and `df2` respectively. They are again included in the compGenomRData package, and you can use the `system.file()` function to get the file paths. [Difficulty: **Beginner**]
- Use `head()` to see what is inside the data frames above. [Difficulty: **Beginner**]
- Merge data sets using `merge()` and assign the results to a variable named ‘new.df’, and use `head()` to see the results. [Difficulty: **Intermediate**]
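The writing, reading, and merging steps above can be tried first on a small table. The sketch below uses toy data frames and a temporary file; all object and column names (`toy`, `info`, `chrom`, `start`, `len`) and the values are hypothetical and are not part of the compGenomRData files.

```r
# Toy data frame standing in for a larger table.
toy = data.frame(chrom = c("chr1", "chr2", "chr1"),
                 start = c(100, 200, 300),
                 stringsAsFactors = FALSE)

# Write to a temporary file as tab-separated text without quotes or row names.
outFile = tempfile(fileext = ".txt")
write.table(toy, file = outFile, quote = FALSE, sep = "\t", row.names = FALSE)

# Read it back; header=TRUE because write.table() wrote the column names.
toy2 = read.table(outFile, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# merge() joins two data frames on their shared column, here "chrom".
info = data.frame(chrom = c("chr1", "chr2"),
                  len = c(247, 243),
                  stringsAsFactors = FALSE)
merged = merge(toy2, info)
head(merged)
```

When the two data frames do not share a column name, as can happen with files read using `header=FALSE`, the `by.x` and `by.y` arguments of `merge()` name the matching columns explicitly.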

### 2.10.4 Plotting in R

Please run the following code snippet for the rest of the exercises.
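A minimal stand-in for this setup, assuming the exercises only need two numeric vectors named `x1` and `y1` with a roughly linear relationship; the seed, vector length, and noise level below are assumptions:

```r
# Assumed exercise data: y1 is 1..100 and x1 is the same sequence plus noise.
set.seed(1001)  # fix the random seed so results are reproducible
x1 = 1:100 + rnorm(100, mean = 0, sd = 15)
y1 = 1:100
```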

- Make a scatter plot using the `x1` and `y1` vectors generated above. [Difficulty: **Beginner**]
- Use the `main` argument to give a title to `plot()` as in `plot(x,y,main="title")`. [Difficulty: **Beginner**]
- Use the `xlab` argument to set a label for the x-axis. Use the `ylab` argument to set a label for the y-axis. [Difficulty: **Beginner**]
- Once you have the plot, run the following expression in the R console and see what it does: `mtext(side=3,text="hi there")`. **HINT:** `mtext` stands for margin text. [Difficulty: **Beginner**]
- See what `mtext(side=2,text="hi there")` does. Check your plot after execution. [Difficulty: **Beginner**]
- Use `mtext()` and `paste()` to put a margin text on the plot. You can use `paste()` as the ‘text’ argument in `mtext()`. **HINT:** `mtext(side=3,text=paste(...))`. See how `paste()` is used below. [Difficulty: **Beginner/Intermediate**]

`## [1] "Text here"`

`## [1] "Text here"`
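A minimal sketch of `paste()` calls that produce output like the lines above; the variable names `myText` and `label` and the value 0.5 are illustrative assumptions:

```r
# paste() joins its arguments into one string, separated by a space by default.
paste("Text", "here")         # "Text here"

# The result can be stored in a variable and re-used, e.g. as a plot label.
myText = paste("Text", "here")
myText                        # "Text here"

# Numbers are converted to text automatically.
label = paste("value:", 0.5)  # "value: 0.5"
label
```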

- `cor()` calculates the correlation between two vectors. Pearson correlation is a measure of the linear correlation (dependence) between two variables X and Y. Try using the `cor()` function on the `x1` and `y1` variables. [Difficulty: **Intermediate**]
- Try to use `mtext()`, `cor()` and `paste()` to display the correlation coefficient on your scatter plot. [Difficulty: **Intermediate**]
- Change the colors of your plot using the `col` argument. Ex: `plot(x,y,col="red")`. [Difficulty: **Beginner**]
- Use `pch=19` as an argument in your `plot()` command. [Difficulty: **Beginner**]
- Use `pch=18` as an argument to your `plot()` command. [Difficulty: **Beginner**]
- Make a histogram of `x1` with the `hist()` function. A histogram is a graphical representation of the data distribution. [Difficulty: **Beginner**]
- You can change colors with ‘col’, add labels with ‘xlab’, ‘ylab’, and add a title with the ‘main’ argument. Try all these in a histogram. [Difficulty: **Beginner**]
- Make a boxplot of `y1` with `boxplot()`. [Difficulty: **Beginner**]
- Make boxplots of the `x1` and `y1` vectors in the same plot. [Difficulty: **Beginner**]
- In boxplot, use the `horizontal = TRUE` argument. [Difficulty: **Beginner**]
- Make multiple plots with `par(mfrow=c(2,1))`: [Difficulty: **Beginner/Intermediate**]
    - run `par(mfrow=c(2,1))`
    - make a boxplot
    - make a histogram
- Do the same as above but this time with `par(mfrow=c(1,2))`. [Difficulty: **Beginner/Intermediate**]
- Save your plot using the “Export” button in RStudio. [Difficulty: **Beginner**]
- You can make a scatter plot showing the density of points rather than points themselves. If you use points it looks like this:

If you use the `smoothScatter()` function, you get the densities.

Now, plot with the `colramp=heat.colors` argument and then use a custom color scale using the following argument: `colramp = colorRampPalette(c("white","blue", "green","yellow","red"))`. [Difficulty: **Beginner/Intermediate**]

### 2.10.5 Functions and control structures (for, if/else, etc.)

Read CpG island data as shown below for the rest of the exercises.

```
cpgtFilePath=system.file("extdata",
                         "CpGi.table.hg18.txt",
                         package="compGenomRData")
cpgi=read.table(cpgtFilePath,header=TRUE,sep="\t")
head(cpgi)
```

```
##   chrom chromStart chromEnd     name length cpgNum gcNum perCpg perGc obsExp
## 1  chr1      18598    19673 CpG: 116   1075    116   787   21.6  73.2   0.83
## 2  chr1     124987   125426  CpG: 30    439     30   295   13.7  67.2   0.64
## 3  chr1     317653   318092  CpG: 29    439     29   295   13.2  67.2   0.62
## 4  chr1     427014   428027  CpG: 84   1013     84   734   16.6  72.5   0.64
## 5  chr1     439136   440407  CpG: 99   1271     99   777   15.6  61.1   0.84
## 6  chr1     523082   523977  CpG: 94    895     94   570   21.0  63.7   1.04
```

- Check values in the perGc column using a histogram. The ‘perGc’ column in the data stands for GC percent, that is, the percentage of C+G nucleotides. [Difficulty: **Beginner**]
- Make a boxplot for the ‘perGc’ column. [Difficulty: **Beginner**]
- Use an if/else structure to decide if the given GC percent is high, low or medium. The classes are: low < 60, high > 75, medium between 60 and 75. Use the greater than or less than operators, `<` or `>`. Fill in the values in the code below, where it is written ‘YOU_FILL_IN’. [Difficulty: **Intermediate**]

```
GCper=65
# check if GC value is lower than 60,
# assign "low" to result
if('YOU_FILL_IN'){
  result="low"
  cat("low")
}else if('YOU_FILL_IN'){ # check if GC value is higher than 75,
  # assign "high" to result
  result="high"
  cat("high")
}else{ # if those two conditions fail then it must be "medium"
  result="medium"
}
result
```

- Write a function that takes a value of GC percent and decides if it is low, high, or medium: low < 60, high > 75, medium is between 60 and 75. Fill in the values in the code below, where it is written ‘YOU_FILL_IN’. [Difficulty: **Intermediate/Advanced**]

```
GCclass<-function(my.gc){
YOU_FILL_IN
return(result)
}
GCclass(10) # should return "low"
GCclass(90) # should return "high"
GCclass(65) # should return "medium"
```

- Use a for loop to get GC percentage classes for `gcValues` below. Use the function you wrote above. [Difficulty: **Intermediate/Advanced**]

```
gcValues=c(10,50,70,65,90)
for( i in YOU_FILL_IN){
YOU_FILL_IN
}
```

- Use `lapply` to get GC percentage classes for `gcValues`. [Difficulty: **Intermediate/Advanced**]

- Use `sapply` to get GC percentage classes for `gcValues`. [Difficulty: **Intermediate**]
- Is there a way to decide on the GC percentage class of a given vector of `GCpercentages` without using an if/else structure and loops? If so, how can you do it? **HINT:** Subsetting using the `<` and `>` operators. [Difficulty: **Intermediate**]
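The looping constructs used in these exercises differ mainly in what they return. The sketch below illustrates this on an unrelated task; the helper `parity()` and the vector `nums` are hypothetical and are not part of the exercises.

```r
# Hypothetical helper: label a number as "even" or "odd".
parity = function(n){
  if(n %% 2 == 0){
    result = "even"
  }else{
    result = "odd"
  }
  return(result)
}

nums = c(1, 2, 3, 4)

# for loop: fill a pre-allocated character vector one element at a time.
loopRes = character(length(nums))
for(i in 1:length(nums)){
  loopRes[i] = parity(nums[i])
}

listRes = lapply(nums, parity)  # a list with one element per input
vecRes  = sapply(nums, parity)  # simplified to a character vector
```

Here `lapply()` always returns a list, while `sapply()` tries to simplify the result, which gives a character vector when every call returns a single string.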