Intervals for \(\hat{y}^*\) and \(Y^*\)

A note on notation

  • Capital letters are random variables (e.g. \(X\) and \(Y\)), while lower case letters are the values those variables have taken.
  • A subscript on \(x\) or \(y\) refers to one of the observations, e.g. \(x_i\), \(x_n\).
  • A hat means it's an estimate, e.g. \(\hat{y}_i\), \(\hat{\beta}_1\).
  • An asterisk means it's a new specific value that's not in the data set, e.g. \(x^*\).

  • What does \(\hat{y}^*\) mean?
  • What does \(Y^*\) mean?

Considering a single data set

[Figure: scatterplot of the simulated data, f_data versus x]

Interval for \(\hat{y}^*\)

What value would we predict for a new \(x^*\)?

\[ \hat{y}^* = \hat{\beta}_0 + \hat{\beta}_1 * x^* \]

[Figure: scatterplot with the fitted regression line and the predicted value \(\hat{y}^*\) at \(x^*\)]

Interval for \(\hat{y}^*\)

How much uncertainty do we have in that prediction?

\[ \hat{y}^* = \hat{\beta}_0 + \hat{\beta}_1 * x^* \]

Two sources of uncertainty:

  1. estimating \(\beta_0\)
  2. estimating \(\beta_1\)

We can calculate \(SE(\hat{y}^*)\):

\[ SE(\hat{y}^*) = S \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}} \]

where \(S\) is the estimate of \(\sigma\) and \(SXX = \sum_{i=1}^{n} (x_i - \bar{x})^2\).

Interval for \(\hat{y}^*\)

We know that the standardized \(\hat{y}^*\) follows a \(t\) distribution with \(n - 2\) degrees of freedom, so we can form a CI:

\[ \hat{y}^* \pm t * SE(\hat{y}^*) \]

m1 <- lm(f_data ~ x)              # fit the simple linear regression
x_star <- 24
m1$coef[1] + m1$coef[2] * x_star  # point prediction by hand
## (Intercept) 
##       28.46
predict(m1, data.frame(x = x_star), interval = "confidence")
##     fit   lwr   upr
## 1 28.46 27.35 29.56
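
We can verify these numbers by hand. A minimal sketch, assuming the simulated x and f_data from the final slide are in the workspace:

n   <- length(x)             # sample size (60 here)
s   <- summary(m1)$sigma     # estimate of sigma
sxx <- sum((x - mean(x))^2)  # SXX
se_fit <- s * sqrt(1/n + (x_star - mean(x))^2 / sxx)
y_hat  <- unname(m1$coef[1] + m1$coef[2] * x_star)
y_hat + c(-1, 1) * qt(0.975, df = n - 2) * se_fit  # matches lwr and upr above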

Interval for \(\hat{y}^*\)

[Figure: fitted line with the 95% confidence interval at \(x^* = 24\)]

Consider the SE term:

\[ SE(\hat{y}^*) = S \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}} \]

For what values of \(x^*\) would you expect the interval for \(\hat{y}^*\) to be the narrowest?

Since the quadratic term vanishes at \(x^* = \bar{x}\), the interval is narrowest there and widens as \(x^*\) moves away from \(\bar{x}\) in either direction. Plotting the SE over a grid of \(x^*\) values traces out the familiar curved shape of the confidence band.
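
A minimal sketch of that picture, reusing s, sxx, and n from the hand computation above:

x_grid  <- seq(min(x), max(x), length.out = 200)
se_grid <- s * sqrt(1/n + (x_grid - mean(x))^2 / sxx)
plot(x_grid, se_grid, type = "l", xlab = "x*", ylab = "SE of the fitted value")
abline(v = mean(x), lty = 2)  # the SE (and hence the CI) is narrowest at x-bar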

Prediction interval for \(Y^*\)

\(Y^*\) represents the actual value of the response that you might observe at a new \(x^*\). It comes not from the estimated mean function:

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 * x \]

But from the estimated data generating function:

\[ Y = \hat{\beta}_0 + \hat{\beta}_1 * x + e\]

Which has three sources of uncertainty:

  1. estimating \(\beta_0\).
  2. estimating \(\beta_1\).
  3. the random error \(e\).
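
A quick simulation can make source 3 concrete. This sketch uses the true parameter values from the data generating code at the end of these slides (\(\beta_0 = 12\), \(\beta_1 = 0.7\), \(\sigma = 2\)) to compare the spread of \(\hat{y}^*\), which carries only sources 1 and 2, with the spread of \(Y^*\), which adds the random error:

set.seed(42)
sims <- replicate(2000, {
  x_sim <- rnorm(60, mean = 20, sd = 3)
  y_sim <- 12 + 0.7 * x_sim + rnorm(60, sd = 2)
  fit   <- lm(y_sim ~ x_sim)
  y_hat_star <- coef(fit)[1] + coef(fit)[2] * 24  # sources 1 and 2 only
  y_star     <- 12 + 0.7 * 24 + rnorm(1, sd = 2)  # adds source 3
  c(y_hat_star, y_star)
})
apply(sims, 1, sd)  # the spread of Y* is much larger than that of y-hat*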

Prediction interval for \(Y^*\)

The SE for the CI:

\[ SE(\hat{y}^*) = S \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}} \]

gains an extra term for the PI:

\[ SE(Y^*) = S \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}} \]

Prediction interval for \(Y^*\)

What is the 95% prediction interval for \(x^* = 24\)?

\[ \hat{y}^* \pm t * SE(Y^*) \]

m1 <- lm(f_data ~ x)              # same fit as before
x_star <- 24
m1$coef[1] + m1$coef[2] * x_star  # same point prediction as before
## (Intercept) 
##       28.46
predict(m1, data.frame(x = x_star), interval = "prediction")
##     fit   lwr  upr
## 1 28.46 23.81 33.1
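
Again we can verify by hand, reusing s, sxx, y_hat, and n from the earlier computation. Note the extra 1 under the square root:

se_pred <- s * sqrt(1 + 1/n + (x_star - mean(x))^2 / sxx)
y_hat + c(-1, 1) * qt(0.975, df = n - 2) * se_pred  # matches lwr and upr above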

Prediction interval for \(Y^*\)

[Figure: fitted line with the 95% prediction interval at \(x^* = 24\)]

Comparing intervals

[Figure: fitted line with both the 95% confidence band and the wider 95% prediction band]
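
The same comparison can be reconstructed in code. A minimal sketch, assuming x, f_data, and m1 from earlier:

x_grid <- data.frame(x = seq(min(x), max(x), length.out = 200))
ci <- predict(m1, x_grid, interval = "confidence")
pi <- predict(m1, x_grid, interval = "prediction")
plot(x, f_data, xlab = "x", ylab = "y")
abline(m1)
lines(x_grid$x, ci[, "lwr"], lty = 2); lines(x_grid$x, ci[, "upr"], lty = 2)
lines(x_grid$x, pi[, "lwr"], lty = 3); lines(x_grid$x, pi[, "upr"], lty = 3)
legend("topleft", legend = c("95% CI", "95% PI"), lty = c(2, 3))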

Boardwork to revisit representing data with smooth functions

Regression Assumptions

  1. \(Y\) is related to \(x\) by a simple linear regression model. \[ E(Y|X = x) = \beta_0 + \beta_1 * x \]

  2. The errors \(e_1, e_2, \ldots, e_n\) are independent of one another.

  3. The errors have a common variance \(\sigma^2\).

  4. The errors are normally distributed: \(e \sim N(0, \sigma^2)\)
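
These assumptions can be checked informally with residual plots. A minimal sketch using base R graphics, assuming the fitted model m1 from earlier:

plot(fitted(m1), resid(m1),
     xlab = "fitted values", ylab = "residuals")  # constant variance?
abline(h = 0, lty = 2)
qqnorm(resid(m1)); qqline(resid(m1))              # normality?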

Said another way…

\[ Y \mid X = x \sim N(\beta_0 + \beta_1 * x, \sigma^2) \]

Regression is a smooth functional summary of the structure of the conditional distribution of \(Y \mid X\).

Simulating from the conditional density function

n <- 60       # sample size
beta_0 <- 12  # true intercept
beta_1 <- .7  # true slope
sigma <- 2    # true error standard deviation
x <- rnorm(n, mean = 20, sd = 3) # draw predictor values
f_mean <- beta_0 + beta_1 * x # mean function
f_data <- f_mean + rnorm(n, mean = 0, sd = sigma) # data generating function
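
With simulated data in hand, a short usage sketch: fit the model and compare the estimates to the true parameter values above.

m1 <- lm(f_data ~ x)  # fit the model to the simulated data
coef(m1)              # compare to beta_0 = 12 and beta_1 = 0.7
summary(m1)$sigma     # compare to sigma = 2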