Activity #7 Part I
- Revisit the RailTrail data set from Activity 4.
- Consider two models: (a) an SLR model predicting ridership from temperature; (b) the same model with an added quadratic term.
- Discuss the relative merits of the two models.
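A hedged sketch of how the two Activity models might be fit, assuming the RailTrail data from the mosaicData package with ridership stored in `volume` and temperature in `avgtemp` (check `names(RailTrail)` in your copy, since the variable names here are assumptions):

```r
# Sketch only: variable names are assumptions -- verify with names(RailTrail)
library(mosaicData)

m_lin  <- lm(volume ~ avgtemp, data = RailTrail)                # SLR
m_quad <- lm(volume ~ avgtemp + I(avgtemp^2), data = RailTrail) # adds quadratic term

# One basis for comparison: adjusted R-squared penalizes the extra term
summary(m_lin)$adj.r.squared
summary(m_quad)$adj.r.squared
```

Note the `I()` wrapper: inside a formula, `^` has a special meaning, so the quadratic term must be protected with `I(avgtemp^2)`.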
For the rest of the week, we won't talk at all about assessing model validity (looking at residual plots). That step is absolutely vital, but we're putting it on hold until next week.
Consider a sample of 15 textbooks. How well can we predict weight by volume?
summary(m1)
## 
## Call:
## lm(formula = weight ~ volume, data = allbacks)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -190.0 -109.9   38.1  109.7  145.6 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 107.6793    88.3776    1.22     0.24    
## volume        0.7086     0.0975    7.27  6.3e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 124 on 13 degrees of freedom
## Multiple R-squared:  0.803,  Adjusted R-squared:  0.787 
## F-statistic: 52.9 on 1 and 13 DF,  p-value: 6.26e-06
allbacks[c(1, 2, 9, 10), ]
##    volume area weight cover
## 1     885  382    800    hb
## 2    1016  468    950    hb
## 9     953    0    700    pb
## 10    929    0    650    pb
We should be able to predict the weight better if we use both the volume and knowledge of the type of cover.
class(allbacks$cover)
## [1] "factor"
levels(allbacks$cover)
## [1] "hb" "pb"
m2 <- lm(weight ~ volume + cover, data = allbacks)
The categorical predictor must be stored as a factor: check with class(), coerce with as.factor() if needed.

summary(m2)$coef
##             Estimate Std. Error t value  Pr(>|t|)
## (Intercept)  197.963   59.19274   3.344 5.841e-03
## volume         0.718    0.06153  11.669 6.598e-08
## coverpb     -184.047   40.49420  -4.545 6.719e-04
\[ \widehat{weight} = 197.96 + 0.718 volume - 184.05 coverpb \]
The first level alphabetically (hb) becomes the reference level. The coverpb coefficient represents the average difference in weight between two books of the same volume but with different covers.

Volume-only model:

\[ \widehat{weight} = 107.70 + 0.71 volume \]

Volume-and-cover model, written as two parallel lines:

\[ \widehat{weight}_{hb} = 197.96 + 0.72 volume \\ \widehat{weight}_{pb} = 13.91 + 0.72 volume \]
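The two parallel lines come straight from the coefficient table; as a sketch:

```r
# Recovering the two lines from m2's coefficients
b <- coef(m2)
b["(Intercept)"]                 # hb intercept: 197.96
b["(Intercept)"] + b["coverpb"]  # pb intercept: 197.96 - 184.05 = 13.91
b["volume"]                      # shared slope: 0.72
```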
summary(m1)$r.squared
## [1] 0.8026
summary(m2)$r.squared
## [1] 0.9275
summary(m1)$adj.r.squared
## [1] 0.7875
summary(m2)$adj.r.squared
## [1] 0.9154
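One way to see the practical difference between the two models is to predict a new book's weight with each; a sketch (the 1000 cm\(^3\) paperback is a made-up example):

```r
# Hypothetical new book: 1000 cubic cm, paperback
newbook <- data.frame(volume = 1000, cover = "pb")
predict(m1, newbook)  # m1 ignores cover
predict(m2, newbook)  # m2 subtracts the coverpb coefficient for a paperback
```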
We've established that these data are best modeled with two intercepts, but should the two lines have their own slopes as well?
\[ \widehat{weight} = \beta_0 + \beta_1 volume + \beta_2 coverpb + \beta_3 volume \times coverpb \]
\[ \widehat{weight}_{hb} = 161.59 + 0.76 volume \\ \widehat{weight}_{pb} = 41.37 + 0.69 volume \]
m3 <- lm(weight ~ volume * cover, data = allbacks)
summary(m3)
## 
## Call:
## lm(formula = weight ~ volume * cover, data = allbacks)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -89.7  -32.1  -21.8   17.9  215.9 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    161.5865    86.5192    1.87    0.089 .  
## volume           0.7616     0.0972    7.84  7.9e-06 ***
## coverpb       -120.2141   115.6590   -1.04    0.321    
## volume:coverpb  -0.0757     0.1280   -0.59    0.566    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 80.4 on 11 degrees of freedom
## Multiple R-squared:  0.93,  Adjusted R-squared:  0.911 
## F-statistic: 48.5 on 3 and 11 DF,  p-value: 1.24e-06
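The paperback line's intercept and slope come from summing coefficients; a sketch:

```r
# pb intercept and slope are sums of the reference-level
# coefficients and the coverpb adjustments
b3 <- coef(m3)
b3["(Intercept)"] + b3["coverpb"]    # pb intercept: 161.59 - 120.21 = 41.37
b3["volume"] + b3["volume:coverpb"]  # pb slope: 0.7616 - 0.0757 = 0.69
```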
The interaction term is not significant, suggesting that the relationship between volume and weight may not differ in slope between the two classes of books.
Note that adding the interaction term also rendered the intercept insignificant.
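Since m2 is nested inside m3, a nested-model F test gives the same verdict here as the t test on the interaction; a sketch:

```r
# With a single added term, the F statistic equals the square of the
# t statistic for volume:coverpb, so the p-values agree
anova(m2, m3)
```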