Without using your book, derive expressions for \(\hat{\beta}_0\) and \(\hat{\beta}_1\) by solving for them in the normal equations:

\[ \begin{aligned} \sum_{i=1}^n y_i &= n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i y_i &= \hat{\beta}_0 \sum_{i=1}^n x_i + \hat{\beta}_1 \sum_{i=1}^n x_i^2 \end{aligned} \]

note: \(\frac{1}{n}\sum_{i=1}^n x_i\) can be rewritten \(\bar{x}\).
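One way to check a hand derivation is to note that the normal equations form a 2 x 2 linear system in \(\hat{\beta}_0\) and \(\hat{\beta}_1\), which R can solve numerically. The sketch below uses simulated data made up for illustration (not the poverty data); the names A, b, and beta_hat are assumptions of this example.

```r
# Simulated data, made up for illustration only
set.seed(1)
x <- runif(30, min = 5, max = 20)
y <- 95 - 0.9 * x + rnorm(30, sd = 2)
n <- length(x)

# Normal equations in matrix form: A %*% c(beta0, beta1) = b
A <- matrix(c(n,      sum(x),
              sum(x), sum(x^2)), nrow = 2, byrow = TRUE)
b <- c(sum(y), sum(x * y))

# Solve the 2 x 2 system for the two estimates
beta_hat <- solve(A, b)

# Should agree with lm() up to floating-point error
all.equal(unname(beta_hat), unname(coef(lm(y ~ x))))
```

Plugging the same x and y into your derived formulas should reproduce beta_hat.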

Poverty and Graduation


Which line?

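# The OpenIntro workspace mlb11.RData is loaded here because it also provides plot_ss();
# the poverty data frame is assumed to be in the workspace already.
download.file("http://www.openintro.org/stat/data/mlb11.RData",
              destfile = "mlb11.RData")
load("mlb11.RData")
plot_ss(poverty$Poverty, poverty$Graduates)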
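A minimal sketch of the "which line?" comparison, assuming candidate lines are scored by their sum of squared residuals. It uses the built-in cars data as a stand-in for the poverty data, and ss_resid() is a hypothetical helper written for this example, not part of plot_ss().

```r
# Stand-in data: the built-in cars data set, not the poverty data
x <- cars$speed
y <- cars$dist

# Hypothetical helper: sum of squared residuals for the line y = b0 + b1 * x
ss_resid <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# An eyeballed candidate line versus the least squares line
ss_guess <- ss_resid(b0 = -10, b1 = 3.5)
fit <- coef(lm(y ~ x))
ss_ls <- ss_resid(fit[1], fit[2])

# No line can have a smaller sum of squares than the least squares line
ss_ls <= ss_guess
```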

Linear models in R

The workhorse function: lm()

m1 <- lm(Graduates ~ Poverty, data = poverty)

The formula notation y ~ x is read: "I'd like to express y as a function of x". The call to lm() returns a rich object of class lm.

class(m1)
## [1] "lm"
names(m1)
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"

Linear models in R

You can extract info about your model using

  1. Components: reference the components listed by names() using the $ operator, e.g. m1$coefficients.

  2. Summary: the most useful information can be displayed using the summary() command.

  3. Print: print your model object to get the basic coefficient estimates.

summary(m1)

## 
## Call:
## lm(formula = Graduates ~ Poverty, data = poverty)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.954 -1.820  0.544  1.515  6.199 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   96.202      1.343   71.65  < 2e-16 ***
## Poverty       -0.898      0.114   -7.86  3.1e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.5 on 49 degrees of freedom
## Multiple R-squared:  0.558,  Adjusted R-squared:  0.549 
## F-statistic: 61.8 on 1 and 49 DF,  p-value: 3.11e-10
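The numbers in a summary printout can also be pulled out programmatically. The sketch below illustrates the three extraction routes on a stand-in model fit to the built-in cars data (an assumption of this example, since the poverty data are not bundled with R).

```r
# Stand-in model on the built-in cars data (not the poverty model m1)
m <- lm(dist ~ speed, data = cars)

# 1. Component access with $, and the equivalent extractor function
m$coefficients
coef(m)
m$df.residual        # residual degrees of freedom

# 2. The summary object is itself a list with its own components
s <- summary(m)
coef(s)              # matrix: Estimate, Std. Error, t value, Pr(>|t|)
s$r.squared          # Multiple R-squared
s$sigma              # residual standard error

# 3. Printing the model shows the call and the coefficient estimates
print(m)
```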

Slope interpretation

The slope, \(\beta_1\), tells you that a one-unit increase in \(x\) is associated with a change of \(\beta_1\) units in \(y\), on average.

A one percentage point increase in a state's poverty rate is associated with a 0.8979 percentage point decrease in the state's graduation rate, on average.

Why the "on average"?

\[ \hat{E}(Y \mid X = x) = \hat{\beta}_0 + \hat{\beta}_1 x \]

note: the sign of \(\beta_1\) is often more interesting than its magnitude (which is affected by scaling).
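Both points can be checked numerically. This sketch, again using the built-in cars data as a stand-in, shows that fitted means one unit of x apart differ by exactly the estimated slope, and that rescaling x changes the slope's magnitude but not its sign. The names b1, m10, and b1_10 are assumptions of this example.

```r
# Stand-in model on the built-in cars data
m <- lm(dist ~ speed, data = cars)
b1 <- unname(coef(m)["speed"])

# Fitted means at x = 10 and x = 11 differ by the slope
p <- predict(m, newdata = data.frame(speed = c(10, 11)))
all.equal(unname(p[2] - p[1]), b1)

# Rescale x by a factor of 10: the slope shrinks by the same factor
m10 <- lm(dist ~ I(speed * 10), data = cars)
b1_10 <- unname(coef(m10)[2])
all.equal(b1_10, b1 / 10)

# The sign of the slope is unchanged by the rescaling
sign(b1_10) == sign(b1)
```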

Intercept interpretation

Mathematically: The expected value of \(y\) when \(x\) is zero.

Contextually: Sometimes, the "start-up" value of \(y\).

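Does the intercept have meaning in the poverty vs grad rate example?

Not really. No state has a poverty rate of zero, so the intercept is an extrapolation beyond the observed data.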
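A small sketch of an intercept without a sensible contextual reading, using the built-in cars data as a stand-in (stopping distance modeled by speed). The names b0 and p0 are assumptions of this example.

```r
# Stand-in model on the built-in cars data
m <- lm(dist ~ speed, data = cars)
b0 <- unname(coef(m)["(Intercept)"])

# Mathematically, the intercept is the fitted mean of y at x = 0
p0 <- unname(predict(m, newdata = data.frame(speed = 0)))
all.equal(p0, b0)

# x = 0 lies outside the observed speeds, so this is an extrapolation
range(cars$speed)

# The fitted intercept is negative: a negative stopping distance has no physical meaning
b0 < 0
```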

Activity 3: SLR on quakes

Recall the last question from homework 1: would you expect a relationship between the magnitude of an earthquake and the number of stations that detect it?