download.file("http://www.openintro.org/stat/data/nc.RData", destfile = "nc.RData")
load("nc.RData")
- What are the dimensions?
- What mode of data is in each column? (head())
class(nc)
## [1] "data.frame"
dim(nc)
## [1] 1000 13
length(nc)
## [1] 13
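The relationship between these functions can be checked on a small hand-made data frame. This is only a sketch: `toy` and its values are hypothetical and stand in for `nc` so that the snippet runs without the download.

```r
# Hypothetical toy data frame standing in for nc (values are made up)
toy <- data.frame(
  mage   = c(22, 31, 27, 40),
  gender = c("female", "male", "female", "male"),
  premie = c("full term", "premie", "full term", "full term"),
  stringsAsFactors = FALSE
)

class(toy)          # "data.frame"
dim(toy)            # rows first, then columns: 4 3
nrow(toy)           # 4
ncol(toy)           # 3
length(toy)         # a data frame is a list of columns, so this equals ncol(toy): 3
sapply(toy, class)  # class of each column: numeric, character, character
```

`length()` counts columns rather than rows because each column of a data frame is one element of an underlying list.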
- What attributes() does this dataframe have?
$ is an operator used to access a column (vector) from a dataframe.
head(nc$fage)
## [1] NA NA 19 21 NA NA
length(nc$fage)
## [1] 1000
The column can be subsetted just like any other vector.
nc$fage[3]
## [1] 19
You can also subset the whole dataframe like a matrix. What do you think the following commands do?
nc[1:10, 1:2]
nc[1, ]
nc[nc$gender == "female", "premie"]
Note that the last command could also be written using vector subsetting.
nc$premie[nc$gender == "female"]
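To see what each form returns, the same patterns can be tried on a small data frame. The sketch below uses a hypothetical `toy` data frame with made-up values in place of `nc`.

```r
# Hypothetical toy data frame standing in for nc (values are made up)
toy <- data.frame(
  mage   = c(22, 31, 27, 40),
  gender = c("female", "male", "female", "male"),
  premie = c("full term", "premie", "full term", "full term"),
  stringsAsFactors = FALSE
)

toy[1:2, 1:2]  # rows 1 to 2 of columns 1 to 2: still a data frame
toy[1, ]       # first row, all columns: a one-row data frame

# Selecting a single column by name drops the result to a plain vector
by_matrix <- toy[toy$gender == "female", "premie"]

# The same values, extracted with $ and vector subsetting
by_vector <- toy$premie[toy$gender == "female"]

identical(by_matrix, by_vector)  # TRUE
```

Leaving the row or column position empty, as in `toy[1, ]`, means "keep everything" along that dimension.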
Subset by index: specify inside the square brackets exactly which elements you want.
nc$gender[1:10]
Subset by logical: specify a condition inside the brackets using logical operators; it must evaluate to a TRUE/FALSE vector of the same length as the vector being subsetted.
nc$gender[nc$premie == "premie"]
Note that the vector that you're using to specify the condition can be different from the one you're subsetting, but it has to be of the same length.
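A short sketch of logical subsetting, using hypothetical vectors with made-up values rather than the columns of `nc`. It also shows one thing to watch for when the condition vector contains missing values, as `fage` does: an NA in the condition produces an NA in the result.

```r
# Hypothetical vectors of equal length (values are made up)
mage   <- c(22, 31, 27, 40)
fage   <- c(NA, 35, 19, 41)
gender <- c("female", "male", "female", "male")

# The condition is built from mage but used to subset gender
is_over_25 <- mage > 25
is_over_25           # FALSE TRUE TRUE TRUE
gender[is_over_25]   # "male" "female" "male"

# A missing value in the condition yields NA in the result
with_na <- gender[fage > 30]          # NA "male" "male"

# which() keeps only the positions where the condition is TRUE
without_na <- gender[which(fage > 30)]  # "male" "male"
```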
There are many statistical functions that take a vector as an argument (mean(), sum(), median(), sd(), max(), min(), etc.) that can also be used to subset.
nc[nc$mage == max(nc$mage), ]
##      fage mage     mature weeks    premie visits     marital gained weight
## 1000   45   50 mature mom    39 full term     14 not married     23   7.13
##      lowbirthweight gender     habit whitemom
## 1000        not low female nonsmoker    white
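The same pattern works with any of these summary functions. The sketch below uses a hypothetical `toy` data frame with made-up values. It also assumes a column with missing values, as `fage` has in `nc`, to show that the summary then needs `na.rm = TRUE` before it is useful in a condition.

```r
# Hypothetical toy data frame standing in for nc (values are made up)
toy <- data.frame(
  fage = c(NA, 35, 19, 41),
  mage = c(22, 31, 27, 40)
)

oldest     <- toy[toy$mage == max(toy$mage), ]   # row 4, where mage is 40
above_mean <- toy[toy$mage > mean(toy$mage), ]   # mean is 30, so rows 2 and 4

max(toy$fage)                # NA, because fage contains a missing value
max(toy$fage, na.rm = TRUE)  # 41

# which() drops the NA produced by comparing the missing value
oldest_father <- toy[which(toy$fage == max(toy$fage, na.rm = TRUE)), ]  # row 4
```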
Fix the following subsetting errors using data from nc.
nc[nc$visits = 4, ]
nc[-1:4, ]
nc[nc$visits <= 5]
nc[nc$visits == 4 | 6, ]