Watch the video series for Session 2
Watch the video lecture associated with Session 2 (find the link on the website and here). This is a series of short videos that walk you through some of the basic syntax in R. In particular, we ask that you watch the videos in that link entitled:
Once you’re done, go through the following exercises to get more familiar with the R syntax.
Exercise 0: Swirl courses
The purpose of this first lab is to familiarize yourself with the R language. To this end, we will use one of the Swirl courses (which we installed during the discussion session).
Installation:
library(swirl)
install_course_github("swirldev", "R_Programming_E")
swirl()
This will initiate a series of prompts:
- Type 1 for “1: R Programming: The basics of programming in R” when prompted to install a course.
- Type 1 for “1: R Programming” when prompted to choose a course.
Go through lessons 1 through 11.
Once you’re done, prepare the following exercises which we will go over during the discussion.
Exercise 1: Vectors
- Generate and print a vector of 10 random numbers between 5 and 500.
- Generate a random vector Z of 1000 letters (from “a” to “z”). Hint: the variable letters is already defined in R.
- Print a summary of Z in the form of a frequency table.
- Print the list of letters that appear an even number of times in Z.
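As a warm-up, the following minimal sketch shows sampling with replacement and building a frequency table on a different toy example (simulated die rolls, which are not part of the exercise). The names rolls, counts and odd_faces are illustrative assumptions, not names used elsewhere in this lab.

```r
set.seed(1)  # make the random draws reproducible

# Draw 60 values from 1..6 with replacement (simulated die rolls)
rolls <- sample(1:6, size = 60, replace = TRUE)

# table() counts how many times each distinct value occurs
counts <- table(rolls)
print(counts)

# names() recovers the values; logical indexing keeps those with odd counts
odd_faces <- names(counts)[counts %% 2 == 1]
print(odd_faces)
```

The same pattern (sample, then table, then filter on the counts) should carry over to character vectors.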
Exercise 2: Matrices
- Create the following 5 by 5 matrix and store it as variable X.
[,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 3 8 13 18 23
[4,] 4 9 14 19 24
[5,] 5 10 15 20 25
- Create a matrix Y by adding independent Gaussian noise (random numbers) with mean 0 and standard deviation 1 to each entry of X. Hint: use the rnorm function.
- Find the inverse of Y. Hint: use the solve function.
- Show numerically that the matrix product of Y and its inverse is the identity matrix. Hint: use the %*% matrix multiplication operator.
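To see how solve and %*% fit together before tackling the 5 by 5 case, here is a small sketch on a 2 by 2 matrix with made-up values. The names A, A_inv and product are illustrative. Because of floating-point rounding, a comparison with a tolerance is likely to be more reliable than an exact equality check.

```r
# A small invertible 2 x 2 matrix (illustrative values, filled column by column)
A <- matrix(c(2, 1, 1, 3), nrow = 2)

# solve(A) with a single argument returns the inverse of A
A_inv <- solve(A)

# %*% is matrix multiplication; * would multiply entry by entry
product <- A %*% A_inv

# Floating-point rounding means the product may differ from the identity
# by tiny amounts, so round for display and compare with a tolerance
print(round(product, 10))
print(all.equal(product, diag(2)))
```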
Exercise 3: Data frames
- Create the following data frame and name it “exams”.
set.seed(123)
data.frame(
  student = c("Alice", "Sarah", "Harry", "Ron", "Kate"),
  score = sample(80:100, 5),
  letter = sample(c("A", "B"), 5, replace = TRUE),
  late = sample(c(TRUE, FALSE), 5, replace = TRUE)
)
- Compute the mean score for this exam and print it.
- Find the student with the highest score and print the corresponding row of “exams”. Hint: use the function which.max().
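The sketch below illustrates column selection, mean() and which.max() on an unrelated toy data frame. The data frame cities and its values are invented for illustration only.

```r
# Toy data frame (made-up values, unrelated to the exams data)
cities <- data.frame(
  name = c("Oslo", "Lima", "Cairo"),
  temp = c(4, 19, 28)
)

# mean() of a single column, selected with $
print(mean(cities$temp))

# which.max() returns the position of the (first) largest value
hottest <- which.max(cities$temp)

# Use that position as a row index; the empty column slot keeps all columns
print(cities[hottest, ])
```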
Exercise 4: Control Flow
Part 1
Use a “for” loop to:
- Print all the letters of the Latin alphabet.
- Print the numbers 10 to 100 that are divisible by 7.
- Print the numbers 1 to 100 that are divisible by 5 but not by 3.
Part 2
- Find all numbers not greater than 10,000 that are divisible by 5, 7 and 11 and print them.
- Print for each of the numbers x = 2, ..., 20, all numbers that divide x (all factors) excluding 1 and x. Hence, for 18, it should print 2 3 6 9.
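Divisibility tests in R are usually written with the remainder operator %%. The following minimal sketch uses a different condition from the ones in the exercise (even numbers that are not multiples of 4). The vector kept is an illustrative name used only to collect the results.

```r
kept <- c()  # collects the numbers that pass the test

for (i in 1:20) {
  # i %% 2 == 0 means "i leaves remainder 0 when divided by 2";
  # && combines two single TRUE/FALSE conditions
  if (i %% 2 == 0 && i %% 4 != 0) {
    print(i)
    kept <- c(kept, i)
  }
}
```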
Exercise 5: Functions
Part 1
Create a function that returns the number of times a given integer is contained in a given vector of integers. The function should have two arguments: one for the vector and the other for the scalar.
Then, generate a random vector of 100 integers (in the range 1-20) and use the function to count the number of times the number 12 appears in that vector.
Part 2
Write a function that takes in a data.frame as an input, prints out the column names, and returns its dimensions.
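As a reminder of the function syntax, here is a small sketch with two hypothetical helpers, count_above() and report_length(), which are not part of the exercise. The first returns a value computed from two arguments; the second shows the difference between printing something and returning something.

```r
# count_above() is a hypothetical helper: how many entries of v exceed threshold?
count_above <- function(v, threshold) {
  # v > threshold is a logical vector; sum() counts its TRUE values
  sum(v > threshold)
}

print(count_above(c(3, 8, 1, 9, 5), 4))

# report_length() prints a message as a side effect and returns the range of v;
# the value of the last expression is what the function returns
report_length <- function(v) {
  cat("Length:", length(v), "\n")
  range(v)
}

r <- report_length(c(3, 8, 1, 9, 5))
print(r)
```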
Exercise 6: Apply family functions
Part 1
Below we print the first six rows of the built-in dataset mtcars, from the 1974 Motor Trend US magazine, which comprises information on the fuel consumption and 10 aspects of automobile design and performance for 32 selected car models.
head(mtcars)
Use the apply() function to find the standard deviation and the 0.8-quantile of each automobile characteristic.
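The sketch below shows how apply() works column by column on a small made-up matrix, and how extra arguments are forwarded to the applied function. The names m, col_means and col_medians are illustrative, and the statistics used here (mean and median) are deliberately different from the ones the exercise asks for.

```r
# Toy 3 x 2 matrix with named columns (made-up values)
m <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 3,
            dimnames = list(NULL, c("a", "b")))

# MARGIN = 2 applies the function to each column (MARGIN = 1 would use rows)
col_means <- apply(m, 2, mean)
print(col_means)

# Extra arguments after the function are passed on to it,
# here probs = 0.5 for quantile() (the median)
col_medians <- apply(m, 2, quantile, probs = 0.5)
print(col_medians)
```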
Part 2
Below is a vector of dates in the year 2018. Hint: you might find the ceiling function useful.
set.seed(1234)
y2018 <- seq(as.Date("2018-01-01", format = "%Y-%m-%d"),
             as.Date("2018-12-31", format = "%Y-%m-%d"),
             "days")
length(y2018)
# A random sample of 10 dates from 2018
y2018_sample <- sample(y2018, size = 10)
y2018_sample
Use an apply family function to return the number of weeks left from each day in y2018_sample to the New Year, 2019-01-01.
Note: you can calculate the difference between Date objects.
as.Date("2019-01-01", format = "%Y-%m-%d") - as.Date("2018-01-01", format = "%Y-%m-%d")
Time difference of 365 days
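The following minimal sketch shows how a Date difference can be turned into a plain number, rounded up with ceiling(), and computed element by element with sapply(). The dates and the names start, end, some_dates and days_to_end are invented for illustration and are not taken from y2018_sample.

```r
# Two illustrative dates
start <- as.Date("2018-03-01")
end <- as.Date("2018-03-11")

# Subtracting Dates gives a difftime; as.numeric() extracts the plain number
days_apart <- as.numeric(end - start, units = "days")
print(days_apart)

# ceiling() rounds up, so 10 days spans 2 blocks of 7 days
print(ceiling(days_apart / 7))

# sapply() calls the function once per element and collects the results
some_dates <- as.Date(c("2018-03-01", "2018-03-05"))
days_to_end <- sapply(some_dates,
                      function(d) as.numeric(end - d, units = "days"))
print(days_to_end)
```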
