Goal: use statistics calculated from data to make inferences about the nature of parameters.
In regression, the parameters of interest are the coefficients, e.g. \(\beta_0\) and \(\beta_1\).
Classical tools of inference:
A confidence interval expresses the amount of uncertainty we have in our estimate of a particular parameter. A general \(1 - \alpha\) confidence interval takes the form
\[ \hat{\theta} \pm t^{*} * SE(\hat{\theta}) \]
\(Y\) is related to \(x\) by a simple linear regression model. \[ E(Y|X) = \beta_0 + \beta_1 * x \]
The errors \(e_1, e_2, \ldots, e_n\) are independent of one another.
The errors have a common variance \(\sigma^2\).
The errors are normally distributed: \(e \sim N(0, \sigma^2)\)
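These assumptions can be checked informally with residual diagnostics. A minimal sketch, assuming a fitted `lm` object `m1` (the name used later in these notes):

```r
r <- resid(m1)

# Constant variance and no pattern: residuals vs. fitted values
plot(fitted(m1), r, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality: quantile-quantile plot of the residuals
qqnorm(r)
qqline(r)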
Let's assume the following model as true:
\[ E(Y|X) = 12 + .7 * x; e \sim N(0, 4) \]
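To build intuition, we can simulate data from this model in R. The sample size and \(x\) values below are arbitrary choices for illustration:

```r
set.seed(42)                      # for reproducibility
n <- 50
x <- runif(n, 0, 20)              # arbitrary predictor values
e <- rnorm(n, mean = 0, sd = 2)   # sigma^2 = 4, so sd = 2
y <- 12 + 0.7 * x + e             # E(Y|X) = 12 + .7x, plus noise
m1 <- lm(y ~ x)
coef(m1)                          # estimates should be near 12 and 0.7
```

Refitting with a different seed gives different estimates: the coefficients have a sampling distribution.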
Characteristics:
Our best guess of \(\beta_1\) is \(\hat{\beta}_1\). And since we have to estimate \(\sigma^2\) with \(\hat{\sigma}^2 = RSS/(n-2)\), the distribution isn't normal, but…
T with n - 2 degrees of freedom.
And we summarize that approximate sampling distribution using a CI:
\[ \hat{\beta}_1 \pm t_{\alpha/2, n-2} * SE(\hat{\beta}_1) \]
where
\[ SE(\hat{\beta}_1) = \hat{\sigma}/\sqrt{SXX} \]
with \(SXX = \sum_{i=1}^{n} (x_i - \bar{x})^2\).
```r
n      <- length(resid(m1))        # sample size
beta_1 <- m1$coef[2]               # estimated slope
alpha  <- 0.05
t_stat <- qt(1 - alpha/2, n - 2)   # critical t value
SE     <- summary(m1)$coef[2, 2]   # SE of the slope
moe    <- t_stat * SE              # margin of error
c(beta_1 - moe, beta_1 + moe)
confint(m1, "x")                   # to double check
```
We are 95% confident that the true slope between x and y lies between LB and UB.
Suppose we are interested in testing the claim that the slope is zero.
\[ H_0: \beta_1 = \beta_1^0 = 0 \\ H_A: \beta_1 \ne \beta_1^0 \]
We know that
\[ T = \frac{\hat{\beta}_1 - \beta_1^0}{SE(\hat{\beta}_1)} \]
Under \(H_0\), \(T\) is t distributed with \(n-2\) degrees of freedom, with \(SE(\hat{\beta}_1)\) calculated the same way as in the CI.
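Assuming a fitted model `m1` as before, the test statistic and two-sided p-value can be computed by hand and checked against `summary()`:

```r
beta_1 <- coef(m1)[2]
SE     <- summary(m1)$coef[2, 2]               # SE of the slope
t_obs  <- (beta_1 - 0) / SE                    # null value beta_1^0 = 0
p_val  <- 2 * pt(-abs(t_obs), df.residual(m1)) # two-sided tail probability
summary(m1)                                    # t value and Pr(>|t|) for x should match
```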
Often less interesting (but not always!). You use the t-distribution again but with a different \(SE\).