nyc <- read.csv("http://andrewpbray.github.io/data/nyc.csv")
dim(nyc)
## [1] 168 7
nyc[1:3,]
##   Case          Restaurant Price Food Decor Service East
## 1    1 Daniella Ristorante    43   22    18      20    0
## 2    2  Tello's Ristorante    32   20    19      19    0
## 3    3          Biricchino    34   21    13      18    0
Let's look at the relationship between price, food rating, and decor rating.
\[ Price \sim Food + Decor \]
m1 <- lm(Price ~ Food + Decor, data = nyc)
summary(m1)
##
## Call:
## lm(formula = Price ~ Food + Decor, data = nyc)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -14.945  -3.766  -0.153   3.701  18.757
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -24.5002     4.7230  -5.187 6.19e-07 ***
## Food          1.6461     0.2615   6.294 2.68e-09 ***
## Decor         1.8820     0.1919   9.810  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.788 on 165 degrees of freedom
## Multiple R-squared:  0.6167, Adjusted R-squared:  0.6121
## F-statistic: 132.7 on 2 and 165 DF,  p-value: < 2.2e-16
The mean function is . . .
When you have two continuous predictors \(x_1\) and \(x_2\), the mean function is a plane.
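As a worked illustration, the plane can be written out using the estimates reported by summary(m1), rounded to two decimals. The hat notation \(\widehat{Price}\) is introduced here only for the fitted mean.

\[ \widehat{Price} = -24.50 + 1.65 \, Food + 1.88 \, Decor \]

For the first restaurant in the data (Food = 22, Decor = 18), the unrounded coefficients give

\[ \widehat{Price} = -24.5002 + 1.6461(22) + 1.8820(18) = 45.59 \]

compared with an observed Price of 43.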
Does the price depend on where the restaurant is located in Manhattan?
\[ Price \sim Food + Decor + East \]
m2 <- lm(Price ~ Food + Decor + East, data = nyc)
summary(m2)
##
## Call:
## lm(formula = Price ~ Food + Decor + East, data = nyc)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -14.0451  -3.8809   0.0389   3.3918  17.7557
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -24.0269     4.6727  -5.142 7.67e-07 ***
## Food          1.5363     0.2632   5.838 2.76e-08 ***
## Decor         1.9094     0.1900  10.049  < 2e-16 ***
## East          2.0670     0.9318   2.218   0.0279 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.72 on 164 degrees of freedom
## Multiple R-squared:  0.6279, Adjusted R-squared:  0.6211
## F-statistic: 92.24 on 3 and 164 DF,  p-value: < 2.2e-16
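Because East only takes the values 0 and 1 in the rows shown, one way to read model 2 is as two parallel planes that share the Food and Decor slopes and differ only in their intercept. Using the estimates from summary(m2), rounded to two decimals:

\[ East = 0: \quad \widehat{Price} = -24.03 + 1.54 \, Food + 1.91 \, Decor \]

\[ East = 1: \quad \widehat{Price} = (-24.03 + 2.07) + 1.54 \, Food + 1.91 \, Decor = -21.96 + 1.54 \, Food + 1.91 \, Decor \]

The vertical gap between the planes is the East coefficient, about 2.07, at every combination of Food and Decor.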
m3 <- lm(Price ~ Food + Decor + East + Decor:East, data = nyc)
summary(m3)
##
## Call:
## lm(formula = Price ~ Food + Decor + East + Decor:East, data = nyc)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -13.7855  -3.6649   0.3785   3.7292  17.6358
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -29.3971     6.3770  -4.610 8.10e-06 ***
## Food          1.6634     0.2822   5.895 2.09e-08 ***
## Decor         2.0695     0.2298   9.006 5.42e-16 ***
## East          9.6616     6.2184   1.554    0.122
## Decor:East   -0.4346     0.3518  -1.235    0.219
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.711 on 163 degrees of freedom
## Multiple R-squared:  0.6313, Adjusted R-squared:  0.6223
## F-statistic: 69.78 on 4 and 163 DF,  p-value: < 2.2e-16
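To see what the interaction term does, it may help to split model 3 into one plane per value of East. The sketch below is only illustrative: it hard-codes the coefficients from the summary(m3) output rather than refitting the model, and the helper name plane() is made up for this example.

```r
# Coefficients copied by hand from the summary(m3) output above
b <- c("(Intercept)" = -29.3971, Food = 1.6634, Decor = 2.0695,
       East = 9.6616, "Decor:East" = -0.4346)

# plane() is a hypothetical helper: intercept and slopes of the
# fitted plane for a given value of the East indicator (0 or 1)
plane <- function(east) {
  c(intercept = unname(b["(Intercept)"] + east * b["East"]),
    food      = unname(b["Food"]),
    decor     = unname(b["Decor"] + east * b["Decor:East"]))
}

# One row per location group
planes <- rbind(East0 = plane(0), East1 = plane(1))
print(planes)
```

With East = 1 the intercept shifts by the East coefficient and the Decor slope shifts by the Decor:East coefficient, from 2.0695 to 1.6349, so the two planes are no longer parallel. The Food slope is the same in both groups because the model has no Food:East term.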
The East term was significant in model 2, suggesting that there is a significant relationship between location and price. In model 3, that term became nonsignificant once we allowed the slope of Decor to vary with location, and that difference in slopes was also nonsignificant.

Load in the LA homes data set and fit the following model:
\[ logprice \sim logsqft + bed + city \]
What appears to be the reference level for city?
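As a hint on how reference levels work in R, here is a minimal sketch on made-up data. The data frame homes and the city names Alpha, Beta, and Gamma are placeholders, not values from the LA homes data. With R's default treatment contrasts, the first factor level is absorbed into the intercept and gets no coefficient of its own.

```r
# Toy data; city names and prices are invented placeholders
homes <- data.frame(
  logprice = c(13.1, 12.4, 13.8, 12.9, 13.5, 12.2),
  city     = factor(c("Beta", "Alpha", "Gamma", "Alpha", "Gamma", "Beta"))
)

# Levels are sorted alphabetically by default; the first is the reference
lv <- levels(homes$city)
print(lv)

# The reference level is the one missing from the coefficient names
fit <- lm(logprice ~ city, data = homes)
coef_names <- names(coef(fit))
print(coef_names)

# relevel() picks a different reference level without changing the data
homes$city2 <- relevel(homes$city, ref = "Gamma")
lv2 <- levels(homes$city2)
print(lv2)
```

If city is read in as a character column rather than a factor, lm() converts it to a factor internally, again with alphabetically sorted levels.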
In the context of this problem, what is suggested by the sign of the coefficient for bed? Does this make sense to you?
Calculate the vector \(\hat{\beta}\) using the matrix formulation of the least squares estimates (useful functions: cbind(), rep(), matrix(), as.matrix(), t(), solve()). Do they agree with the estimates that come out of lm()?
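The matrix formulation referred to here is \(\hat{\beta} = (X'X)^{-1}X'Y\), where \(X\) is the design matrix with a leading column of ones. The sketch below shows the pattern on simulated data rather than the LA homes data; the variable names and the true coefficients (1, 2, -0.5) are assumptions chosen for illustration.

```r
# Simulated data with two continuous predictors (illustration only)
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

# Design matrix: a column of ones for the intercept, then the predictors
X <- cbind(rep(1, n), x1, x2)
Y <- as.matrix(y)

# Least squares estimates: (X'X)^{-1} X'Y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% Y

# Compare with the estimates from lm()
fit <- lm(y ~ x1 + x2)
agree <- all.equal(as.vector(beta_hat), unname(coef(fit)))
print(cbind(matrix_form = as.vector(beta_hat), lm = unname(coef(fit))))
print(agree)
```

For a model with a categorical predictor such as city, the design matrix also needs one 0/1 indicator column for each non-reference level; model.matrix() shows the columns that lm() builds.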
See if you can plot your full model as geometric structures on a 3D scatterplot of the data.
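One possible starting point, using only base R graphics, is persp() to draw a fitted plane and trans3d() to project the data points onto the same view. This is a sketch on simulated data with a single plane, not the LA homes model; the variable names and coefficients are invented, and a dedicated 3D plotting package may be more convenient.

```r
# Simulated data with two continuous predictors (illustration only)
set.seed(1)
n     <- 50
food  <- runif(n, 16, 25)
decor <- runif(n, 6, 25)
price <- -24 + 1.6 * food + 1.9 * decor + rnorm(n, sd = 5)
fit   <- lm(price ~ food + decor)

# Evaluate the fitted plane on a 20 x 20 grid of predictor values
gx <- seq(min(food),  max(food),  length.out = 20)
gy <- seq(min(decor), max(decor), length.out = 20)
gz <- outer(gx, gy, function(f, d) {
  predict(fit, newdata = data.frame(food = f, decor = d))
})

# Draw the plane; persp() returns the viewing transformation matrix
pmat <- persp(gx, gy, gz, theta = 30, phi = 20,
              xlab = "Food", ylab = "Decor", zlab = "Price",
              zlim = range(price, gz))

# Project the 3D data points into the 2D plot and overlay them
points(trans3d(food, decor, price, pmat), pch = 16)
```

A model with a categorical predictor and no interactions corresponds to one parallel plane per level, so the same approach can be repeated for each level with a different intercept.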