Vector Indexing

Consider the vector x <- c(12,15,8,11,24). How do we create a vector of the differences between adjacent elements?

x <- c(12, 15, 8, 11, 24)
x[-1] - x[-length(x)]
## [1]  3 -7  3 13

Prefer vectorized operations to explicit loops where possible.
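
As a rough comparison, the same differences can be computed three ways: with an explicit loop, with the vectorized expression above, and with the built-in diff(). The names d_loop, d_vec and d_builtin are made up for this sketch.

```r
x <- c(12, 15, 8, 11, 24)

# Loop version: fill a result vector one element at a time
d_loop <- numeric(length(x) - 1)
for (k in seq_len(length(x) - 1)) {
  d_loop[k] <- x[k + 1] - x[k]
}

# Vectorized version: drop the first element, drop the last, subtract
d_vec <- x[-1] - x[-length(x)]

# Built-in shortcut for the same computation
d_builtin <- diff(x)

d_loop
## [1]  3 -7  3 13
identical(d_loop, d_vec)
## [1] TRUE
identical(d_vec, d_builtin)
## [1] TRUE
```

The vectorized form is shorter and has no index arithmetic to get wrong.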

Vector Subsetting/Filtering

What's going on here?

x <- c(12, 15, 8, 11, 24)
x[x < 10]
## [1] 8
x < 10
## [1] FALSE FALSE  TRUE FALSE FALSE

We can index our vector with a logical vector of the same length.

Subsetting with logicals

x <- c(12, 15, 8, 11, 24)
i <- c(FALSE, FALSE, TRUE, FALSE, FALSE)  # T and F are ordinary variables that can be reassigned
x[i]
## [1] 8
which(x < 10)
## [1] 3
x[x < 10] <- 10
x
## [1] 12 15 10 11 24
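
Logical indexing and which() usually give the same subset, but they differ when the vector contains missing values. A small sketch, using a hypothetical vector y that is not part of the example above:

```r
y <- c(12, NA, 8, 11)   # hypothetical vector with a missing value

y < 10
## [1] FALSE    NA  TRUE FALSE
y[y < 10]          # an NA in the logical index yields an NA in the result
## [1] NA  8
y[which(y < 10)]   # which() keeps only the positions that are TRUE
## [1] 8
```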

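Useful fact!

In arithmetic, TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values and mean() gives their proportion.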

i <- c(FALSE, FALSE, TRUE, FALSE, FALSE)
sum(i)
## [1] 1
mean(i)
## [1] 0.2
x <- c(12, 15, 8, 11, 24)
mean(x > 11)
## [1] 0.6

Logical operators

< less than

<= less than or equal

> greater than

>= greater than or equal

== exactly equal to

!= not equal to

!x not x (logical negation)

x | y x or y

x & y x and y
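
A short sketch of how these operators combine element by element on a vector. The conditions are chosen for illustration only.

```r
x <- c(12, 15, 8, 11, 24)

x >= 10 & x <= 15      # both conditions must hold
## [1]  TRUE  TRUE FALSE  TRUE FALSE
x < 10 | x > 20        # at least one condition holds
## [1] FALSE FALSE  TRUE FALSE  TRUE
!(x > 11)              # negation flips each element
## [1] FALSE FALSE  TRUE  TRUE FALSE
x[x >= 10 & x <= 15]   # use the combined condition as an index
## [1] 12 15 11
```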

How can you subset the following vector to exclude both elements less than 10 and all even elements? (hint: %%)

x <- c(12, 15, 8, 11, 24)

Activity: Vector Generation and Subsetting

From Vectors to Matrices

Consider the vector

v <- letters
length(v)
## [1] 26

We can turn this into a matrix.

m <- matrix(v, nrow = 2)
length(m)
## [1] 26
dim(m)
## [1]  2 13

m <- matrix(v, nrow = 2)
m
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] "a"  "c"  "e"  "g"  "i"  "k"  "m"  "o"  "q"  "s"   "u"   "w"   "y"  
## [2,] "b"  "d"  "f"  "h"  "j"  "l"  "n"  "p"  "r"  "t"   "v"   "x"   "z"

By default, R fills the matrix column by column: the first column is filled top to bottom, then the second, and so on. Which argument can we tweak to change this? (?matrix)

matrix(v, nrow = 2, byrow = TRUE)
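
The difference is easier to see on a short vector. A minimal sketch, where s is a hypothetical stand-in for v:

```r
s <- 1:6   # small hypothetical vector, easier to read than letters

matrix(s, nrow = 2)                 # default: fill down each column
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
matrix(s, nrow = 2, byrow = TRUE)   # fill across each row instead
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
```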

Matrix Subsetting

Square brackets again, but with [rows, columns].

m[2, 3]
## [1] "f"
m[2, 2:4]
## [1] "d" "f" "h"
m[, 3]
## [1] "e" "f"

Leaving an index blank means "all": m[, 3] selects every row of column 3.
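
A few more cases, as a sketch. Note that selecting a single row or column returns a plain vector by default; the drop = FALSE argument of `[` keeps the result a matrix.

```r
m <- matrix(letters, nrow = 2)

dim(m[, 1:3])               # all rows, first three columns
## [1] 2 3
m[1, 1:4]                   # row 1, first four columns, returned as a vector
## [1] "a" "c" "e" "g"
dim(m[, 3])                 # a single column loses its dimensions
## NULL
dim(m[, 3, drop = FALSE])   # drop = FALSE keeps it a 2 x 1 matrix
## [1] 2 1
```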

Data structures in R

  1. Vector: homogeneous groups of integers, numerics, logicals, characters. Scalars are just vectors of length one.

  2. Matrix: a vector with an additional dimension attribute (also homogeneous).

  3. Dataframe: a table of columns of the same length, where different columns may hold different types (each column is itself homogeneous).
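
The three structures can be sketched in a few lines. The objects v and df below are hypothetical examples, not part of the course data.

```r
# Vector: mixing types forces everything to one common type
c(1, "a", TRUE)
## [1] "1"    "a"    "TRUE"

# Matrix: a vector plus a dim attribute
v <- 1:6
dim(v) <- c(2, 3)
is.matrix(v)
## [1] TRUE

# Data frame: equal-length columns, each with its own type
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 ok = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)
unname(sapply(df, class))
## [1] "integer"   "character" "logical"
```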

Dataframe: nc

download.file("http://www.openintro.org/stat/data/nc.RData", destfile = "nc.RData")
load("nc.RData")
  • What are the dimensions?
  • What mode of data is in each column? (head())
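
The same kind of inspection can be tried first on a data frame that ships with R. This sketch uses the built-in iris data purely as a stand-in; the results for nc will differ.

```r
# iris is a built-in data set, used here only as a stand-in for nc
dim(iris)             # number of rows, then number of columns
## [1] 150   5
head(iris, 3)         # first three rows
sapply(iris, class)   # class of each column
str(iris)             # compact summary: dimensions, column types, first values
```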