Intervals for \(\hat{y}^*\) and \(Y^*\)

A note on notation

  • Capital letters are random variables (e.g. \(X\) and \(Y\)), while lower case letters are the values those variables have taken.
  • A subscript on \(x\) or \(y\) refers to one of the observations, e.g. \(x_i\), \(x_n\).
  • A hat means it's an estimate, e.g. \(\hat{y}_i\), \(\hat{\beta}_1\).
  • An asterisk means it's a new specific value that's not in the data set, e.g. \(x^*\).

  • What does \(\hat{y}^*\) mean?
  • What does \(Y^*\) mean?

Considering a single data set

[Figure: scatterplot of the simulated data, f_data versus x]

Interval for \(\hat{y}^*\)

What value would we predict for a new \(x^*\)?

\[ \hat{y}^* = \hat{\beta}_0 + \hat{\beta}_1 * x^* \]

[Figure: scatterplot with the fitted regression line and the predicted value \(\hat{y}^*\) at \(x^*\)]

Interval for \(\hat{y}^*\)

How much uncertainty do we have in that prediction?

\[ \hat{y}^* = \hat{\beta}_0 + \hat{\beta}_1 * x^* \]

Two sources of uncertainty:

  1. estimating \(\beta_0\)
  2. estimating \(\beta_1\)

We can calculate \(SE(\hat{y}^*)\):

\[ SE(\hat{y}^*) = S \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}} \]

where \(S\) is the estimate of \(\sigma\) and \(SXX = \sum_{i=1}^{n} (x_i - \bar{x})^2\).

Interval for \(\hat{y}^*\)

We know that the standardized \(\hat{y}^*\) follows a \(t\) distribution with \(n - 2\) degrees of freedom, so we can form a CI:

\[ \hat{y}^* \pm t * SE(\hat{y}^*) \]

m1 <- lm(f_data ~ x)              # fit the simple linear regression
x_star <- 24
m1$coef[1] + m1$coef[2] * x_star  # point prediction by hand
## (Intercept) 
##       28.46
predict(m1, data.frame(x = x_star), interval = "confidence")
##     fit   lwr   upr
## 1 28.46 27.35 29.56
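
We can verify these numbers by hand. A minimal sketch, assuming the simulated x and f_data from the final slide are in the workspace:

n   <- length(x)             # sample size (60 here)
s   <- summary(m1)$sigma     # estimate of sigma
sxx <- sum((x - mean(x))^2)  # SXX
se_fit <- s * sqrt(1/n + (x_star - mean(x))^2 / sxx)
y_hat  <- unname(m1$coef[1] + m1$coef[2] * x_star)
y_hat + c(-1, 1) * qt(0.975, df = n - 2) * se_fit  # matches lwr and upr above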

Interval for \(\hat{y}^*\)

[Figure: fitted line with the 95% confidence interval at \(x^* = 24\)]

Consider the SE term:

\[ SE(\hat{y}^*) = S \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}} \]

For what values of \(x^*\) would you expect the interval for \(\hat{y}^*\) to be the narrowest?

Since the quadratic term vanishes at \(x^* = \bar{x}\), the interval is narrowest there and widens as \(x^*\) moves away from \(\bar{x}\) in either direction. Plotting the SE over a grid of \(x^*\) values traces out the familiar curved shape of the confidence band.
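
A minimal sketch of that picture, reusing s, sxx, and n from the hand computation above:

x_grid  <- seq(min(x), max(x), length.out = 200)
se_grid <- s * sqrt(1/n + (x_grid - mean(x))^2 / sxx)
plot(x_grid, se_grid, type = "l", xlab = "x*", ylab = "SE of the fitted value")
abline(v = mean(x), lty = 2)  # the SE (and hence the CI) is narrowest at x-bar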

Prediction interval for \(Y^*\)

\(Y^*\) represents the actual value of the response that you might observe at a new \(x^*\). It comes not from the estimated mean function:

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 * x \]

But from the estimated data generating function:

\[ Y = \hat{\beta}_0 + \hat{\beta}_1 * x + e\]

Which has three sources of uncertainty:

  1. estimating \(\beta_0\).
  2. estimating \(\beta_1\).
  3. the random error \(e\).
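
A quick simulation can make source 3 concrete. This sketch uses the true parameter values from the data generating code at the end of these slides (\(\beta_0 = 12\), \(\beta_1 = 0.7\), \(\sigma = 2\)) to compare the spread of \(\hat{y}^*\), which carries only sources 1 and 2, with the spread of \(Y^*\), which adds the random error:

set.seed(42)
sims <- replicate(2000, {
  x_sim <- rnorm(60, mean = 20, sd = 3)
  y_sim <- 12 + 0.7 * x_sim + rnorm(60, sd = 2)
  fit   <- lm(y_sim ~ x_sim)
  y_hat_star <- coef(fit)[1] + coef(fit)[2] * 24  # sources 1 and 2 only
  y_star     <- 12 + 0.7 * 24 + rnorm(1, sd = 2)  # adds source 3
  c(y_hat_star, y_star)
})
apply(sims, 1, sd)  # the spread of Y* is much larger than that of y-hat*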

Prediction interval for \(Y^*\)

The SE for the CI:

\[ SE(\hat{y}^*) = S \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}} \]

gains an extra term for the PI:

\[ SE(Y^*) = S \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}} \]

Prediction interval for \(Y^*\)

What is the 95% prediction interval for \(x^* = 24\)?

\[ \hat{y}^* \pm t * SE(Y^*) \]

m1 <- lm(f_data ~ x)              # same fit as before
x_star <- 24
m1$coef[1] + m1$coef[2] * x_star  # same point prediction as before
## (Intercept) 
##       28.46
predict(m1, data.frame(x = x_star), interval = "prediction")
##     fit   lwr  upr
## 1 28.46 23.81 33.1
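
Again we can verify by hand, reusing s, sxx, y_hat, and n from the earlier computation. Note the extra 1 under the square root:

se_pred <- s * sqrt(1 + 1/n + (x_star - mean(x))^2 / sxx)
y_hat + c(-1, 1) * qt(0.975, df = n - 2) * se_pred  # matches lwr and upr above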

Prediction interval for \(Y^*\)

[Figure: fitted line with the 95% prediction interval at \(x^* = 24\)]

Comparing intervals

[Figure: fitted line with both the 95% confidence band and the wider 95% prediction band]
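
The same comparison can be reconstructed in code. A minimal sketch, assuming x, f_data, and m1 from earlier:

x_grid <- data.frame(x = seq(min(x), max(x), length.out = 200))
ci <- predict(m1, x_grid, interval = "confidence")
pi <- predict(m1, x_grid, interval = "prediction")
plot(x, f_data, xlab = "x", ylab = "y")
abline(m1)
lines(x_grid$x, ci[, "lwr"], lty = 2); lines(x_grid$x, ci[, "upr"], lty = 2)
lines(x_grid$x, pi[, "lwr"], lty = 3); lines(x_grid$x, pi[, "upr"], lty = 3)
legend("topleft", legend = c("95% CI", "95% PI"), lty = c(2, 3))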

Boardwork to revisit representing data with smooth functions

Regression Assumptions

  1. \(Y\) is related to \(x\) by a simple linear regression model. \[ E(Y|X = x) = \beta_0 + \beta_1 * x \]

  2. The errors \(e_1, e_2, \ldots, e_n\) are independent of one another.

  3. The errors have a common variance \(\sigma^2\).

  4. The errors are normally distributed: \(e \sim N(0, \sigma^2)\)
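
These assumptions can be checked informally with residual plots. A minimal sketch using base R graphics, assuming the fitted model m1 from earlier:

plot(fitted(m1), resid(m1),
     xlab = "fitted values", ylab = "residuals")  # constant variance?
abline(h = 0, lty = 2)
qqnorm(resid(m1)); qqline(resid(m1))              # normality?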

Said another way…

\[ Y \mid X = x \sim N(\beta_0 + \beta_1 * x, \sigma^2) \]

Regression is a smooth functional summary of the structure of the conditional distribution of \(Y \mid X\).

Simulating from the conditional density function

n <- 60       # sample size
beta_0 <- 12  # true intercept
beta_1 <- .7  # true slope
sigma <- 2    # true error standard deviation
x <- rnorm(n, mean = 20, sd = 3) # draw predictor values
f_mean <- beta_0 + beta_1 * x # mean function
f_data <- f_mean + rnorm(n, mean = 0, sd = sigma) # data generating function
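
With simulated data in hand, a short usage sketch: fit the model and compare the estimates to the true parameter values above.

m1 <- lm(f_data ~ x)  # fit the model to the simulated data
coef(m1)              # compare to beta_0 = 12 and beta_1 = 0.7
summary(m1)$sigma     # compare to sigma = 2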