The most influential critic in the world today happens to be a critic of wine. His name is Robert Parker.
The most influential critic in the world today happens to be a critic of wine. His name is Robert Parker.
Clive Coates, Master of Wine, is one of the world's leading wine authorities.
Parker is the wine writer who matters. Clives Coates is very serious and well-respected, but in terms of commercial impact his influence is zero.
wine <- read.csv("http://andrewpbray.github.io/data/Bordeaux.csv") dim(wine)
## [1] 72 9
names(wine)
## [1] "Wine" "Price" "ParkerPoints" ## [4] "CoatesPoints" "P95andAbove" "FirstGrowth" ## [7] "CultWine" "Pomerol" "VintageSuperstar"
Develop a regression model to estimate the percentage effect on price of a 1% increase in ParkerPoints and a 1% increase in CoatesPoins using a subset of or all seven predictors.
library(car) scatterplotMatrix(wine[, 2:4])
\[ \widehat{Price} \sim ParkerPoints + CoatesPoints + P95andAbove \\+ FirstGrowth + CultWine + Pomerol + VintageSuperstar \]
m1 <- lm(Price ~ . - Wine, data = wine)
plot(wine$Price ~ m1$fit, data = wine) abline(0, 1)
wine <- transform(wine, logPrice = log(Price), logParkerPoints = log(ParkerPoints), logCoatesPoints = log(CoatesPoints)) m2 <- lm(logPrice ~ . - Price - Wine - ParkerPoints - CoatesPoints, data = wine)
Recall: logging the predictors of interest as well as the response allows us to interpret the estimates slopes as the percentage increase in the \(Y\) associated with a 1% increase in the \(x\).
vif(m2)
## P95andAbove FirstGrowth CultWine Pomerol ## 4.013 1.625 1.188 1.124 ## VintageSuperstar logParkerPoints logCoatesPoints ## 1.139 5.825 1.410
One value exceeds the traditional cutoff of 5, but not by too much. Multicollinearity is a minor issue here.
We're confident we're working with a valid model.
It's also sensible to build a model with:
## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -51.1416 8.98557 -5.6915 3.389e-07 ## P95andAbove 0.1006 0.13697 0.7341 4.656e-01 ## FirstGrowth 0.8697 0.12524 6.9441 2.332e-09 ## CultWine 1.3532 0.14569 9.2878 1.784e-13 ## Pomerol 0.5364 0.09366 5.7275 2.946e-07 ## VintageSuperstar 0.6159 0.22067 2.7910 6.918e-03 ## logParkerPoints 11.5886 2.06763 5.6048 4.744e-07 ## logCoatesPoints 1.6205 0.61154 2.6499 1.013e-02
Note that all of the predictors are signficant at the .05 level (another reason to not worry too much about the borderline VIF), with the exception of P95andAbove
.
We could consider dropping it from the model to remove redudancy with logParkerPrice
.
m3 <- update(m2, . ~ . - P95andAbove) summary(m3)$coef
## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -56.4755 5.26798 -10.721 5.205e-16 ## FirstGrowth 0.8615 0.12430 6.931 2.299e-09 ## CultWine 1.3360 0.14330 9.323 1.339e-13 ## Pomerol 0.5362 0.09333 5.745 2.641e-07 ## VintageSuperstar 0.5947 0.21800 2.728 8.186e-03 ## logParkerPoints 12.7843 1.26915 10.073 6.662e-15 ## logCoatesPoints 1.6045 0.60898 2.635 1.052e-02
Since the parameter estimates barely budged, this is essentially the same model as the previous one: no need to re-check diagnostic plots.