23 + sin(pi/2)
## [1] 24
abs(-10) + (17-3)^4
## [1] 38426
4 * exp(10) + sqrt(2)
## [1] 88107.28
Intuitive arithmetic operators: addition (+), subtraction (-), multiplication (*), division (/), exponentiation (^), modulus (%%)
Built-in constants: pi, LETTERS, letters, month.abb, month.name
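A minimal sketch of the operators and constants listed above. Integer division (%/%) is not part of the list and is only shown next to the modulus for contrast.

```r
# Arithmetic operators
7 + 2    # addition
7 - 2    # subtraction
7 * 2    # multiplication
7 / 2    # division: 3.5
7 ^ 2    # exponentiation: 49
7 %% 2   # modulus (remainder of the division): 1
7 %/% 2  # integer division (not listed above, shown for contrast): 3

# Built-in constants
pi
LETTERS[1:3]     # first three upper-case letters
letters[24:26]   # last three lower-case letters
month.abb[1:3]   # abbreviated month names
month.name[12]   # full month name
```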
Variable assignment can be done using the following operators: =, <-, ->:
# Assignment using equal operator.
var.1 = 34759
# Assignment using leftward operator.
var.2 <-"learn R"
# Assignment using rightward operator.
TRUE -> var.3
The values of the variables can be printed with the print() function or with cat().
print(var.1)
## [1] 34759
cat("var.2 is ", var.2)
## var.2 is learn R
cat("var.3 is ", var.3 ,"\n")
## var.3 is TRUE
Variable names must start with a letter (or a dot not followed by a digit), and can only contain letters, digits, dots (.) and underscores (_):
a <- 0
first.variable <- 1
SecondVariable <- 2
variable_2 <- 1 + first.variable
very_long_name.3 <- 4
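One way to check a candidate name is sketched below. It relies on base R's make.names(), which rewrites a string into a syntactically valid name, so a string that comes back unchanged was already valid. The vectors `candidates` and `is_valid` are hypothetical names used only for this illustration.

```r
# Candidate names: the last two break the naming rules
# (one starts with a digit, one contains a space).
candidates <- c("first.variable", "SecondVariable", "variable_2",
                "2nd_variable", "my variable")

# A name is valid if make.names() leaves it untouched.
is_valid <- make.names(candidates) == candidates
names(is_valid) <- candidates
is_valid

# What R would turn the invalid names into.
make.names(candidates[!is_valid])
```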
Some words are reserved in R and cannot be used as object names:
Inf and -Inf, which respectively stand for positive and negative infinity. R will return these when the value is too big, e.g. 2^1024.
NULL denotes a null object. Often used as an undeclared function argument.
NA represents a missing value (“Not Available”).
NaN means “Not a Number”. R will return this when a computation is undefined, e.g. 0/0.
Values in R are limited to only 6 atomic classes:
logical: TRUE/FALSE or T/F
numeric: 12.4, 30, 2, 1009, 3.141593
integer: 2L, 34L, -21L, 0L
complex: 3 + 2i, -10 - 4i
character: 'a', '23.5', "good", "Hello world!", "TRUE"
raw: as.raw(2), charToRaw("Hello")
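As a quick check, class() reports the atomic class of each kind of value listed above. This is a small sketch; the list `values` is a hypothetical name holding one example per class.

```r
# One example value per atomic class, in the order listed above.
values <- list(TRUE, 12.4, 2L, 3 + 2i, "good", as.raw(2))

# Apply class() to every element and collect the results.
sapply(values, class)
```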
Objects can have different structures based on atomic class and dimensions:
Dimensions | Homogeneous | Heterogeneous |
---|---|---|
1d | vector | list |
2d | matrix | data.frame |
nd | array | |
R also supports more complicated objects built upon these.
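The table can be made concrete with one object per cell. The following is only an illustrative sketch; all `my_*` names are hypothetical.

```r
my_vector <- c(1, 2, 3)                                   # 1d, homogeneous
my_list   <- list(1, "a", TRUE)                           # 1d, heterogeneous
my_matrix <- matrix(1:6, nrow = 2)                        # 2d, homogeneous
my_df     <- data.frame(id = 1:2, name = c("a", "b"))     # 2d, heterogeneous
my_array  <- array(1:24, dim = c(2, 3, 4))                # nd, homogeneous

dim(my_vector)  # NULL: plain vectors and lists carry no dim attribute
dim(my_matrix)  # 2 3
dim(my_df)      # 2 2
dim(my_array)   # 2 3 4
```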
R is a dynamically typed language, which means that the same variable can be reassigned to values of different data types over the course of a program.
x <- "Hello"
cat("The class of x is", class(x),"\n")
## The class of x is character
x <- 34.5
cat(" Now the class of x is ", class(x),"\n")
## Now the class of x is numeric
x <- 27L
cat(" Next the class of x becomes ", class(x),"\n")
## Next the class of x becomes integer
You can see what variables are currently available in the workspace by calling ls():
print(ls())
## [1] "a" "first.variable" "SecondVariable" "var.1" "var.2" "var.3" "variable_2" "very_long_name.3" "x"
Elements of a vector can be accessed using indexing, with square brackets, [].
Unlike in many languages, in R indexing starts with 1.
Using negative integer indices drops the corresponding elements of the vector.
Logical indexing (TRUE/FALSE) is allowed.
days <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")
(today <- days[5])
## [1] "Thurs"
# Accessing vector elements using position.
(weekend.days <- days[c(1, 7)])
## [1] "Sun" "Sat"
# Accessing vector elements using negative indexing.
(week.days <- days[c(-1,-7)])
## [1] "Mon" "Tue" "Wed" "Thurs" "Fri"
# Accessing vector elements using logical indexing.
(birthday <- days[c(F, F, F, F, T, F, F)])
## [1] "Thurs"
# Comparisons (==,!=,>,>=,<,<=)
1 == 2
## [1] FALSE
# Check whether number is even
# (%% is the modulus)
(5 %% 2) == 0
## [1] FALSE
# Logical indexing
x <- seq(1,10)
x[(x%%2) == 0]
## [1] 2 4 6 8 10
# Element-wise comparison
c(1,2,3) > c(3,2,1)
## [1] FALSE FALSE TRUE
# Check whether numbers are even,
# one by one
(seq(1,4) %% 2) == 0
## [1] FALSE TRUE FALSE TRUE
# Logical indexing
x <- seq(1,10)
x[x>=5]
## [1] 5 6 7 8 9 10
Two vectors of the same length can be added, subtracted, multiplied or divided element by element. Vectors can be concatenated with the combine function c().
# Create two vectors.
v1 <- c(1,4,7,3,8,15)
v2 <- c(12,9,4,11,0,8)
# Vector addition.
(vec.sum <- v1+v2)
## [1] 13 13 11 14 8 23
# Vector subtraction.
(vec.difference <- v1-v2)
## [1] -11 -5 3 -8 8 7
# Vector multiplication.
(vec.product <- v1*v2)
## [1] 12 36 28 33 0 120
# Vector division.
(vec.ratio <- v1/v2)
## [1] 0.08333333 0.44444444 1.75000000 0.27272727 Inf 1.87500000
# Vector concatenation
vec.concat <- c(v1, v2)
# Size of vector
length(vec.concat)
## [1] 12
# Element-wise multiplication
v1 <- c(1,2,3,4,5,6,7,8,9,10)
v1 * 2
## [1] 2 4 6 8 10 12 14 16 18 20
# Recycling: the shorter vector is repeated to match the longer one
v1 * c(1,2)
## [1] 1 4 3 8 5 12 7 16 9 20
v1 + c(3, 7, 10)
## [1] 4 9 13 7 12 16 10 15 19 13
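Note: the last expression triggers a warning, because the length of the longer vector (10) is not a multiple of the length of the shorter one (3). A warning is not an error: your code continued to run, but perhaps it did not work as you intended.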
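To see that a warning leaves the result intact, it can be intercepted with withCallingHandlers(). This is a minimal sketch: `v` and `result` are names chosen for illustration, and printing the message via message() is just one possible way to handle it.

```r
v <- 1:10

# Length 10 is a multiple of 2: recycling happens silently.
v * c(1, 2)

# Length 10 is not a multiple of 3: R still recycles, but warns.
result <- withCallingHandlers(
  v + c(3, 7, 10),
  warning = function(w) {
    # Report the warning text, then let the computation carry on.
    message("Warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# The computation finished despite the warning.
result
```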
Matrices in R are objects with homogeneous elements (of the same type), arranged in a 2D rectangular layout. A matrix can be created with the matrix() function:
matrix(data, nrow, ncol, byrow, dimnames)
where:
data is the input vector with elements of the matrix.
nrow is the number of rows to be created.
ncol is the number of columns to be created.
byrow is a logical value. If FALSE (the default) the matrix is filled by columns, otherwise the matrix is filled by rows.
dimnames is NULL or a list of length 2 giving the row and column names respectively.
# Elements are arranged sequentially by column.
(N <- matrix(seq(1,20), nrow = 4, byrow = FALSE))
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 5 9 13 17
## [2,] 2 6 10 14 18
## [3,] 3 7 11 15 19
## [4,] 4 8 12 16 20
# Elements are arranged sequentially by row.
(M <- matrix(seq(1,20), nrow = 5, byrow = TRUE))
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 5 6 7 8
## [3,] 9 10 11 12
## [4,] 13 14 15 16
## [5,] 17 18 19 20
# Define the column and row names.
rownames <- c("row1", "row2", "row3")
colnames <- c("col1", "col2", "col3", "col4", "col5")
(P <- matrix(c(5:19), nrow = 3, byrow = TRUE,
dimnames = list(rownames, colnames)))
## col1 col2 col3 col4 col5
## row1 5 6 7 8 9
## row2 10 11 12 13 14
## row3 15 16 17 18 19
P[2, 5] # the element in 2nd row and 5th column.
## [1] 14
P[2, ] # the 2nd row.
## col1 col2 col3 col4 col5
## 10 11 12 13 14
P[, 3] # the 3rd column.
## row1 row2 row3
## 7 12 17
P[c(3,2), ] # the 3rd and 2nd row.
## col1 col2 col3 col4 col5
## row3 15 16 17 18 19
## row2 10 11 12 13 14
P[, c(3, 1)] # the 3rd and 1st column.
## col3 col1
## row1 7 5
## row2 12 10
## row3 17 15
P[1:2, 3:5] # rows 1 to 2, columns 3 to 5.
## col3 col4 col5
## row1 7 8 9
## row2 12 13 14
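Note that selecting a single row or column, as in P[2, ] above, returns a plain vector rather than a matrix. The sketch below rebuilds the same matrix so that it runs on its own and shows the drop = FALSE argument of `[`, which keeps the matrix shape; `row_vec` and `row_mat` are hypothetical names.

```r
P <- matrix(5:19, nrow = 3, byrow = TRUE,
            dimnames = list(c("row1", "row2", "row3"), paste0("col", 1:5)))

row_vec <- P[2, ]                # dimensions dropped: a named vector
row_mat <- P[2, , drop = FALSE]  # dimensions kept: a 1 x 5 matrix

dim(row_vec)  # NULL
dim(row_mat)  # 1 5

# Row and column names can also be used as indices.
P["row2", "col5"]
```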
Matrix addition and subtraction need matrices of the same dimensions:
# Create two 2x3 matrices.
(A <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2))
## [,1] [,2] [,3]
## [1,] 3 -1 2
## [2,] 9 4 6
(B <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2))
## [,1] [,2] [,3]
## [1,] 5 0 3
## [2,] 2 9 4
A + B # Element-wise sum; (A - B) difference
## [,1] [,2] [,3]
## [1,] 8 -1 5
## [2,] 11 13 10
A * B # Element-wise multiplication
## [,1] [,2] [,3]
## [1,] 15 0 6
## [2,] 18 36 24
A / B # Element-wise division
## [,1] [,2] [,3]
## [1,] 0.6 -Inf 0.6666667
## [2,] 4.5 0.4444444 1.5000000
t(A) # Matrix transpose
## [,1] [,2]
## [1,] 3 9
## [2,] -1 4
## [3,] 2 6
True matrix multiplication A x B, with \(A \in \mathbb{R}^{m \times p}\) and \(B \in \mathbb{R}^{p \times n}\):
\[ (AB)_{ij} = \sum_{k = 1}^p A_{ik}B_{kj} \]
# A is (2 x 3) and t(B) is (3 x 2)
A %*% t(B) # (2 x 2)-matrix
## [,1] [,2]
## [1,] 21 5
## [2,] 63 78
# t(A) is (3 x 2) and B is (2 x 3)
t(A) %*% B # (3 x 3)-matrix
## [,1] [,2] [,3]
## [1,] 33 81 45
## [2,] 3 36 13
## [3,] 22 54 30
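The summation formula can be checked against %*% with an explicit triple loop. This is a sketch for illustration only: `naive_matmul` is a hypothetical helper, far slower than the built-in operator, and not something to use in practice.

```r
A  <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)  # 2 x 3
B  <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)   # 2 x 3
Bt <- t(B)                                    # 3 x 2

# Hypothetical helper: product of X (m x p) and Y (p x n), entry by entry.
naive_matmul <- function(X, Y) {
  stopifnot(ncol(X) == nrow(Y))               # inner dimensions must agree
  out <- matrix(0, nrow = nrow(X), ncol = ncol(Y))
  for (i in seq_len(nrow(X))) {
    for (j in seq_len(ncol(Y))) {
      for (k in seq_len(ncol(X))) {
        # Accumulate the k-th term of the sum for entry (i, j).
        out[i, j] <- out[i, j] + X[i, k] * Y[k, j]
      }
    }
  }
  out
}

# Both approaches give the same (2 x 2) matrix.
all.equal(naive_matmul(A, Bt), A %*% Bt)
```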
More on matrix algebra here
Arrays can have more than two dimensions and are created with array().
row.names <- c("ROW1","ROW2","ROW3", "ROW4")
column.names <- c("COL1","COL2","COL3")
matrix.names <- c("Matrix1","Matrix2")
(arr <- array(
seq(1, 24), dim = c(4,3,2),
dimnames = list(row.names, column.names,
matrix.names)))
## , , Matrix1
##
## COL1 COL2 COL3
## ROW1 1 5 9
## ROW2 2 6 10
## ROW3 3 7 11
## ROW4 4 8 12
##
## , , Matrix2
##
## COL1 COL2 COL3
## ROW1 13 17 21
## ROW2 14 18 22
## ROW3 15 19 23
## ROW4 16 20 24
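Array elements are indexed with one position per dimension, in the same way as matrices. The sketch below uses an unnamed copy of the same values; `a3` is a hypothetical name.

```r
a3 <- array(seq(1, 24), dim = c(4, 3, 2))

dim(a3)       # 4 3 2
a3[2, 3, 1]   # row 2, column 3 of the first matrix: 10
a3[, , 2]     # the whole second matrix (4 x 3)
a3[4, , ]     # row 4 of every matrix, returned as a 3 x 2 matrix
```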
Lists can contain elements of different types, e.g. numbers, strings, vectors and/or other lists. A list is created using the list() function.
# Unnamed list
v <- c("Jan","Feb","Mar")
M <- matrix(c(1,2,3,4),nrow=2)
lst <- list("green", 12.3)
(u.list <- list(v, M, lst))
## [[1]]
## [1] "Jan" "Feb" "Mar"
##
## [[2]]
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
##
## [[3]]
## [[3]][[1]]
## [1] "green"
##
## [[3]][[2]]
## [1] 12.3
# Access 2nd element
u.list[[2]]
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
# Named list
(n.list <- list(
first = "Jane", last = "Doe",
gender = "Female", yearOfBirth = 1990))
## $first
## [1] "Jane"
##
## $last
## [1] "Doe"
##
## $gender
## [1] "Female"
##
## $yearOfBirth
## [1] 1990
# Access 3rd element
n.list[[3]]
## [1] "Female"
# Access "yearOfBirth" element
n.list$yearOfBirth
## [1] 1990
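The examples above use double brackets and $, which both return the element itself. Single brackets behave differently: they return a shorter list. A small sketch of the difference, where `sub_list` and `element` are hypothetical names:

```r
n.list <- list(first = "Jane", last = "Doe",
               gender = "Female", yearOfBirth = 1990)

sub_list <- n.list["first"]     # single brackets: a list of length 1
element  <- n.list[["first"]]   # double brackets: the string itself

class(sub_list)  # "list"
class(element)   # "character"

# Single brackets can select several elements at once.
n.list[c("first", "last")]
```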
A data frame is a table or a 2D array-like structure, whose columns can each hold a different data type but must all have the same length:
# Create the data frame.
employees <- data.frame(
row.names = c("E1", "E2", "E3","E4", "E5"),
name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
stringsAsFactors = FALSE )
# Print the data frame.
employees
## name salary start_date
## E1 Rick 623.30 2012-01-01
## E2 Dan 515.20 2013-09-23
## E3 Michelle 611.00 2014-11-15
## E4 Ryan 729.00 2014-05-11
## E5 Gary 843.25 2015-03-27
# Get the structure of the data frame.
str(employees)
## 'data.frame': 5 obs. of 3 variables:
## $ name : chr "Rick" "Dan" "Michelle" "Ryan" ...
## $ salary : num 623 515 611 729 843
## $ start_date: Date, format: "2012-01-01" "2013-09-23" "2014-11-15" "2014-05-11" ...
# Print first few rows of the data frame.
head(employees, 2)
## name salary start_date
## E1 Rick 623.3 2012-01-01
## E2 Dan 515.2 2013-09-23
# Print statistical summary of the data frame.
summary(employees)
## name salary start_date
## Length:5 Min. :515.2 Min. :2012-01-01
## Class :character 1st Qu.:611.0 1st Qu.:2013-09-23
## Mode :character Median :623.3 Median :2014-05-11
## Mean :664.4 Mean :2014-01-14
## 3rd Qu.:729.0 3rd Qu.:2014-11-15
## Max. :843.2 Max. :2015-03-27
# using column names.
employees$name
employees[, c("name", "salary")]
# # or using integer indexing
# employees[, 1]
# employees[, c(1, 2)]
## [1] "Rick" "Dan" "Michelle" "Ryan" "Gary"
## name salary
## E1 Rick 623.30
## E2 Dan 515.20
## E3 Michelle 611.00
## E4 Ryan 729.00
## E5 Gary 843.25
# using row names.
employees["E1",]
employees[c("E2", "E3"), ]
# using integer indexing
employees[1, ]
employees[c(2, 3), ]
## name salary start_date
## E1 Rick 623.3 2012-01-01
## name salary start_date
## E2 Dan 515.2 2013-09-23
## E3 Michelle 611.0 2014-11-15
# Add the "dept" column.
employees$dept <-
c("IT","Operations","IT","HR","Finance")
employees
## name salary start_date dept
## E1 Rick 623.30 2012-01-01 IT
## E2 Dan 515.20 2013-09-23 Operations
## E3 Michelle 611.00 2014-11-15 IT
## E4 Ryan 729.00 2014-05-11 HR
## E5 Gary 843.25 2015-03-27 Finance
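Logical indexing, shown earlier for vectors, also works on the rows of a data frame. The sketch below uses a reduced copy of the table under the hypothetical name `staff`, so it runs on its own; the salary threshold values are arbitrary.

```r
staff <- data.frame(
  row.names = c("E1", "E2", "E3", "E4", "E5"),
  name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  dept = c("IT", "Operations", "IT", "HR", "Finance"),
  stringsAsFactors = FALSE)

# Keep the rows for which the condition is TRUE.
(well.paid <- staff[staff$salary > 620, ])

# Combine conditions with & (and) or | (or), and select columns by name.
(it.staff <- staff[staff$dept == "IT" & staff$salary > 600,
                   c("name", "salary")])
```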
Rows can be added to an existing data frame with the rbind() function:
# Create the second data frame
new.employees <- data.frame(
row.names = paste0("E", 6:8),
name = c("Rasmi","Pranab","Tusar"),
salary = c(578.0,722.5,632.8),
start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
dept = c("IT","Operations","Finance"),
stringsAsFactors = FALSE )
# Concatenate two data frames.
(all.employees <- rbind(employees, new.employees))
## name salary start_date dept
## E1 Rick 623.30 2012-01-01 IT
## E2 Dan 515.20 2013-09-23 Operations
## E3 Michelle 611.00 2014-11-15 IT
## E4 Ryan 729.00 2014-05-11 HR
## E5 Gary 843.25 2015-03-27 Finance
## E6 Rasmi 578.00 2013-05-21 IT
## E7 Pranab 722.50 2013-07-30 Operations
## E8 Tusar 632.80 2014-06-17 Finance
Factors are used to categorize the data and store it as levels. They are useful for variables which take on a limited number of unique values.
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
is.factor(days)
## [1] FALSE
class(days) # Indeed these are strings of characters
## [1] "character"
If the levels are not specified, R orders the levels of a character vector alphabetically.
( days <- factor(days) ) # Convert to factors
## [1] Mon Tue Wed Thu Fri Sat Sun
## Levels: Fri Mon Sat Sun Thu Tue Wed
is.factor(days)
## [1] TRUE
days.sample <- sample(days, 5)
days.sample
## [1] Thu Sun Tue Mon Fri
## Levels: Fri Mon Sat Sun Thu Tue Wed
# Create factor with given levels
(days.sample <- factor(days.sample, levels = days))
## [1] Thu Sun Tue Mon Fri
## Levels: Mon Tue Wed Thu Fri Sat Sun
# Create factor with ordered levels
(days.sample <- factor(days.sample, levels = days, ordered = TRUE))
## [1] Thu Sun Tue Mon Fri
## Levels: Mon < Tue < Wed < Thu < Fri < Sat < Sun
Note that factor labels are not the same as levels.
day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
(days <- factor(days, levels = days, labels = day_names))
## [1] Monday Tuesday Wednesday Thursday Friday Saturday Sunday
## Levels: Monday Tuesday Wednesday Thursday Friday Saturday Sunday
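Internally a factor stores integer codes that point into its levels, and an ordered factor additionally supports comparisons. A minimal sketch with a hypothetical `sizes` factor:

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)

levels(sizes)      # the levels, in the order given
as.integer(sizes)  # the underlying integer codes: 1 3 2 1
sizes < "large"    # comparisons follow the level order: TRUE FALSE TRUE TRUE
table(sizes)       # counts per level
```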
R makes it easy to work with dates.
# Define a sequence of dates
x <- seq(from=as.Date("2018-01-01"),to=as.Date("2018-05-31"), by=1)
table(months(x))
##
## April February January March May
## 30 28 31 31 31
Sys.Date() # What day is it?
## [1] "2020-06-22"
Sys.time() # What time is it?
## [1] "2020-06-22 10:31:15 PDT"
# Days from today to 2019-01-01 (negative, since that date has passed).
as.Date('2019-01-01') - Sys.Date()
## Time difference of -538 days
Type ?strptime for a list of possible date formats.
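As a brief sketch of how these format codes are used: as.Date() takes a format argument for parsing, and format() uses the same codes for printing. The date itself and the name `d` are arbitrary choices for the illustration; only numeric codes are used so that the result does not depend on the locale.

```r
# Parse a date written as day/month/year.
d <- as.Date("22/06/2020", format = "%d/%m/%Y")
d                        # printed in the default ISO format, 2020-06-22

# Print the same date in other layouts.
format(d, "%Y%m%d")      # "20200622"
format(d, "%d.%m.%y")    # "22.06.20"

# Date arithmetic works in days.
d + 30                                   # 2020-07-22
as.numeric(as.Date("2021-01-01") - d)    # 193
```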
You can generate a random sample from the elements of a vector using the sample() function.
v <- seq(1, 10)
sample(v, 5) # Sampling without replacement
## [1] 3 6 7 10 2
month.name
## [1] "January" "February" "March" "April" "May" "June" "July" "August" "September" "October" "November" "December"
sample(month.name, 10, replace = TRUE) # Sampling with replacement
## [1] "June" "June" "June" "February" "March" "February" "September" "September" "October" "April"
Tables – the contents of a discrete vector can be easily summarized in a table.
x <- sample(v, 1000, replace=TRUE) # Random sample
table(x)
## x
## 1 2 3 4 5 6 7 8 9 10
## 87 94 102 100 100 90 101 99 117 110
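Because sampling is random, the numbers above change on every run. If reproducible results are needed, the random number generator can be seeded with set.seed() first. A minimal sketch, where the seed value 42 and the names `first`, `second` and `counts` are arbitrary:

```r
set.seed(42)                  # fix the state of the random number generator
first <- sample(1:10, 5)

set.seed(42)                  # same seed, so the same draw is repeated
second <- sample(1:10, 5)

identical(first, second)      # TRUE

# The counts in a table always add up to the sample size.
counts <- table(sample(1:10, 1000, replace = TRUE))
sum(counts)                   # 1000
```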