Data Taxonomy

Numerical descriptors

One Variable

  • Shape: uni- or multi-modal? right- or left-skewed?
  • Center: mean(), median(), mode (table())
  • Spread: sd(), var(), IQR()

For a one-stop shop: summary().
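
As a minimal sketch of the descriptors above, the snippet below applies them to a small made-up vector. The name `heights` and its values are invented for illustration and are not course data.

```r
# Hypothetical data: invented values for illustration only
heights <- c(160, 165, 165, 170, 172, 180, 195)

# Center
center_mean   <- mean(heights)
center_median <- median(heights)
counts        <- table(heights)                         # frequency of each value
center_mode   <- names(counts)[counts == max(counts)]   # most frequent value(s), as text

# Spread
spread_sd  <- sd(heights)
spread_var <- var(heights)    # the variance is the squared standard deviation
spread_iqr <- IQR(heights)    # R is case-sensitive: IQR(), not iqr()

# One-stop shop: min, quartiles, median, mean, max
summary(heights)
```

Base R has no built-in function for the statistical mode, which is why the sketch reads it off the output of table().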

7 Billion: Are you typical?

  1. What variables do we have data on? What type of data is each?
  2. What numerical descriptors are referenced in the video?
  3. Write out pseudo-R-code that could have generated the statistic about the average height of the Dutch (that is, invent your own dataframe and column names, but use R syntax).

Graphical descriptors

Categorical Data: bar chart (base)

barplot(table(nc$premie))
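
The call above works in two steps: table() counts each category, and barplot() draws one bar per count. A rough sketch of those steps, using a hypothetical vector `status` with invented values as a stand-in for a categorical column:

```r
# Hypothetical stand-in for a categorical column; values are invented
status <- c("full term", "premie", "full term", "full term", "premie", NA)

counts <- table(status)        # counts per category; NA is dropped by default
props  <- prop.table(counts)   # the same table as proportions

barplot(counts)                # bar heights are the counts
```

Passing `props` instead of `counts` to barplot() would put proportions on the vertical axis.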

Graphical descriptors

Categorical Data: bar chart (ggplot2)

library(ggplot2)  # load once per session; needed for every ggplot2 slide
ggplot(nc, aes(premie)) + geom_bar()

Graphical descriptors

Categorical Data: pie chart (base)

pie(table(nc$premie))

Graphical descriptors

Categorical Data: pie chart (base)

Pie charts = confusion

Graphical descriptors

Numerical Data: histogram (base)

hist(nc$weight)

Graphical descriptors

Numerical Data: histogram (ggplot2)

ggplot(nc, aes(weight)) + geom_histogram()

Graphical descriptors

Numerical Data: density plot (base)

plot(density(nc$weight))

Graphical descriptors

Numerical Data: density plot (ggplot2)

ggplot(nc, aes(weight)) + geom_density()

Graphical descriptors

Numerical Data: boxplot (base)

boxplot(nc$weight)
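
A boxplot is drawn from a five-number summary plus any points flagged as outliers. As a sketch of what sits behind the picture, the snippet below computes those pieces for a hypothetical vector `w` (invented values, not the `nc` data):

```r
# Hypothetical data: invented values with one unusually large observation
w <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

five <- fivenum(w)         # min, lower hinge, median, upper hinge, max
bs   <- boxplot.stats(w)   # the numbers boxplot() uses by default

bs$stats   # whisker ends, box edges, and median
bs$out     # points beyond 1.5 box lengths from the box, drawn individually
```

The whiskers stop at the most extreme observations that are not flagged, so the upper whisker here ends below the largest value.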

Graphical descriptors

Numerical Data: boxplot (ggplot2)

ggplot(nc, aes(y = weight, x = factor(1))) + geom_boxplot()

Numerical descriptors

Two (numerical) Variables

  1. Shape: linear, quadratic
  2. Direction: positive or negative (in slope or curvature)
  3. Strength: how tightly clustered?

A measure of the strength of a linear relationship: r, the correlation coefficient (cor()).
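
A small sketch of how r behaves, using invented vectors (all names and values here are hypothetical). It illustrates that r captures only the linear part of a relationship, and that cor() needs to be told how to handle missing values:

```r
# Hypothetical data: invented values for illustration only
x      <- c(1, 2, 3, 4, 5)
y_pos  <- 2 * x + 1      # exact increasing line
y_neg  <- -3 * x + 10    # exact decreasing line
y_quad <- (x - 3)^2      # perfect quadratic relationship, but no linear trend

r_pos  <- cor(x, y_pos)
r_neg  <- cor(x, y_neg)
r_quad <- cor(x, y_quad)

# With a missing value, cor() returns NA unless incomplete pairs are dropped
x_na <- c(x, NA)
y_na <- c(y_pos, 7)
r_default  <- cor(x_na, y_na)
r_complete <- cor(x_na, y_na, use = "complete.obs")
```

A value of r near zero therefore does not rule out a strong curved relationship, which is one reason to look at the scatterplot as well.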

Graphical descriptors

Scatterplot (base)

plot(x = nc$fage, y = nc$mage)

Graphical descriptors

Scatterplot (base)

plot(mage ~ fage, data = nc)

Graphical descriptors

Scatterplot (ggplot2)

ggplot(nc, aes(x = fage, y = mage)) + geom_point()

ggplot2 sandbox

library(mosaic)  # mplot() is provided by the mosaic package
mplot(nc)