Last time:
This time:
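In simple linear regression (SLR), we use residual plots to assess whether the mean function is correctly specified (e.g., linear) and whether the errors have constant variance.
If this were an SLR model, we could conclude that the mean function looks fairly linear but that the errors appear to have increasing variance.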
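A minimal sketch of that SLR diagnosis in R. It assumes simulated data (not the data behind the plot above) with a linear mean and an error SD that grows with x; the names fit_slr, sd_low and sd_high are illustrative.

```r
# Hypothetical simulated data: linear mean, error SD that grows with x
set.seed(1)
n <- 500
x <- runif(n, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(n, mean = 0, sd = x)   # sd = x, so the spread increases

fit_slr <- lm(y ~ x)

# Residuals vs fitted: no trend (linear mean is fine) but a fan (variance grows)
plot(fitted(fit_slr), resid(fit_slr),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Rough numeric check: residual spread below vs above the median fitted value
low     <- fitted(fit_slr) < median(fitted(fit_slr))
sd_low  <- sd(resid(fit_slr)[low])
sd_high <- sd(resid(fit_slr)[!low])
c(sd_low = sd_low, sd_high = sd_high)
```

In SLR the fitted values are just a linear function of x, so the fan in this plot really does reflect the error variance; sd_high should come out noticeably larger than sd_low.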
We fit the model:
\[ y \sim x_1 + x_2 \]
But this is synthetic data generated from a model with constant variance.
Whaaaaaa?
In MLR, in general, you cannot take the pattern you see in the residuals vs. fitted plot to be the form of the misspecification.
The only conclusion you can draw is that something is misspecified.
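To see how this can happen, here is a hedged simulation sketch. The data-generating model is an assumption (the model behind the plot above isn't given): the true mean contains an x1 * x2 interaction and the errors have constant variance, but we fit the additive model y ~ x1 + x2. The residuals vs fitted plot should show a curved band whose width changes with the fitted values, even though nothing is wrong with the error variance.

```r
# Hypothetical example: true mean has an x1*x2 interaction, errors have
# constant SD = 1, but we fit the additive model y ~ x1 + x2
set.seed(2)
n  <- 500
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
y  <- x1 * x2 + rnorm(n, sd = 1)        # constant error variance

fit_add <- lm(y ~ x1 + x2)              # misspecified mean function

plot(fitted(fit_add), resid(fit_add),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# The residuals are dominated by the omitted term, not by the errors
omitted <- (x1 - mean(x1)) * (x2 - mean(x2))
cor(resid(fit_add), omitted)
```

The structure in the plot comes from the omitted interaction, which is why the residuals vs fitted plot only tells us that something is misspecified, not what.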
So now what?
The objective of constructing an added variable plot is to assess how much each variable adds to your model.
Consider the NYC restaurant data, where we'd like to build the model:
\[ Price \sim Food + Decor + Service + East \]
We can look at the marginal relationship between each predictor and the response with a series of simple scatterplots…
pairs(Price ~ Food + Decor + Service + East, data = nyc)
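An added variable plot tells you how much a given predictor \(x_i\) can explain the response after the other predictors have been taken into account. It plots the residuals from regressing the response on all the other predictors against the residuals from regressing \(x_i\) on those same predictors: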
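The two-step recipe generalizes to any numeric predictor in a fitted model. Below is a sketch of a hypothetical helper, av_residuals() (not part of car or base R), that works from the model's design matrix; it uses R's built-in mtcars data as a stand-in because the nyc data frame isn't reproduced here, and it assumes `var` is a column name of model.matrix(fit).

```r
# Hypothetical helper (not from the car package): the two residual vectors
# behind an added variable plot for predictor `var` in fitted model `fit`
av_residuals <- function(fit, var) {
  X <- model.matrix(fit)                        # design matrix, incl. intercept
  y <- model.response(model.frame(fit))         # response vector
  others <- X[, colnames(X) != var, drop = FALSE]
  # Residuals of y and of x_var after regressing each on the other columns
  res_y <- lm.fit(others, y)$residuals
  res_x <- lm.fit(others, X[, var])$residuals
  data.frame(res_x = res_x, res_y = res_y)
}

# Illustration on R's built-in mtcars data (a stand-in for the nyc data)
fit_cars <- lm(mpg ~ wt + hp + qsec, data = mtcars)
av_wt <- av_residuals(fit_cars, "wt")
plot(res_y ~ res_x, data = av_wt,
     xlab = "wt | others", ylab = "mpg | others")
```

Working from model.matrix() means the helper doesn't need to rebuild formulas by hand, at the cost of requiring the design-matrix column name (which differs from the variable name for factors).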
First, get the residuals from the model
\[ Price \sim Decor + Service + East \]
resY <- resid(lm(Price ~ Decor + Service + East, data = nyc))
Second, get the residuals from the model
\[ Food \sim Decor + Service + East \]
resX <- resid(lm(Food ~ Decor + Service + East, data = nyc))
Then plot them against each other…
plot(resY ~ resX)
library(car)  # provides avPlot() and avPlots()
m1 <- lm(Price ~ Food + Decor + Service + East, data = nyc)
avPlot(m1, variable = "Food")
If we fit a line through the AVP, the slope should look familiar…
AVPm1 <- lm(resY ~ resX)
AVPm1$coef
## (Intercept)        resX
##   5.074e-17   1.538e+00
m1$coef
## (Intercept)        Food       Decor     Service        East
##  -24.023800    1.538120    1.910087   -0.002727    2.068050
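This is not a coincidence: by the Frisch–Waugh–Lovell theorem, the slope of the AVP line equals the coefficient of that predictor in the full model, and the AVP fit's residuals equal the full model's residuals. The intercept is zero up to rounding (the 5.074e-17 above) because both residual vectors have mean zero. A quick check in R, using the built-in mtcars data as a stand-in for nyc:

```r
# Frisch–Waugh–Lovell check on R's built-in mtcars data (stand-in for nyc)
full <- lm(mpg ~ wt + hp + qsec, data = mtcars)

res_y <- resid(lm(mpg ~ hp + qsec, data = mtcars))   # mpg adjusted for others
res_x <- resid(lm(wt  ~ hp + qsec, data = mtcars))   # wt adjusted for others
avp   <- lm(res_y ~ res_x)

coef(avp)          # intercept ~ 0, slope equal to the wt coefficient below
coef(full)["wt"]
all.equal(unname(coef(avp)["res_x"]), unname(coef(full)["wt"]))
```

The standard errors will differ slightly, since lm() on the residuals doesn't know that the other predictors' degrees of freedom were already used.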
avPlots(m1)
In the data set LA, this scatterplot suggests two influential points, but are they influential in an MLR model?
influence(m1)$hat
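In MLR, leverage is the diagonal of the hat matrix \(H = X(X^TX)^{-1}X^T\), so a point can look unremarkable in every marginal scatterplot yet have high leverage in the combination of predictors. A sketch of the computation on R's built-in mtcars data (the LA data isn't reproduced here), using the common rule of thumb that flags \(h_{ii} > 2(p+1)/n\):

```r
# Leverage from the hat matrix H = X (X'X)^{-1} X', on R's built-in mtcars data
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

X <- model.matrix(fit)                          # n x (p + 1), includes intercept
H <- X %*% solve(crossprod(X)) %*% t(X)         # hat matrix
h <- diag(H)                                    # leverages h_ii

n <- nrow(X)
k <- ncol(X)                                    # p + 1 coefficients
cutoff <- 2 * k / n                             # rule of thumb 2(p+1)/n
names(h)[h > cutoff]                            # cars flagged as high leverage
```

The leverages match influence(fit)$hat and hatvalues(fit), and they always sum to the number of coefficients, p + 1.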