nyc <- read.csv("http://andrewpbray.github.io/data/nyc.csv")
dim(nyc)
## [1] 168 7
nyc[1:3,]
##   Case          Restaurant Price Food Decor Service East
## 1    1 Daniella Ristorante    43   22    18      20    0
## 2    2  Tello's Ristorante    32   20    19      19    0
## 3    3          Biricchino    34   21    13      18    0
Let's look at the relationship between price, food rating, and decor rating.
\[ Price \sim Food + Decor \]
m1 <- lm(Price ~ Food + Decor, data = nyc)
summary(m1)
##
## Call:
## lm(formula = Price ~ Food + Decor, data = nyc)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -14.945  -3.766  -0.153   3.701  18.757
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -24.5002     4.7230  -5.187 6.19e-07 ***
## Food          1.6461     0.2615   6.294 2.68e-09 ***
## Decor         1.8820     0.1919   9.810  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.788 on 165 degrees of freedom
## Multiple R-squared:  0.6167, Adjusted R-squared:  0.6121
## F-statistic: 132.7 on 2 and 165 DF,  p-value: < 2.2e-16
The mean function is . . .
When you have two continuous predictors \(x_1\), \(x_2\), then your mean function is . . .
a plane
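Written out for m1, the plane can be read directly off the coefficient table in the summary above (estimates rounded to two decimals):

\[ \widehat{Price} = -24.50 + 1.65 \cdot Food + 1.88 \cdot Decor \]

As a quick check, take a restaurant with Food = 20 and Decor = 20 (illustrative values, not a row from the data). Using the unrounded estimates, its fitted price is

\[ -24.5002 + 1.6461(20) + 1.8820(20) \approx 46.06 \]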
Does the price depend on where the restaurant is located in Manhattan?
\[ Price \sim Food + Decor + East \]
m2 <- lm(Price ~ Food + Decor + East, data = nyc)
summary(m2)
##
## Call:
## lm(formula = Price ~ Food + Decor + East, data = nyc)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -14.0451  -3.8809   0.0389   3.3918  17.7557
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -24.0269     4.6727  -5.142 7.67e-07 ***
## Food          1.5363     0.2632   5.838 2.76e-08 ***
## Decor         1.9094     0.1900  10.049  < 2e-16 ***
## East          2.0670     0.9318   2.218   0.0279 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.72 on 164 degrees of freedom
## Multiple R-squared:  0.6279, Adjusted R-squared:  0.6211
## F-statistic: 92.24 on 3 and 164 DF,  p-value: < 2.2e-16
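One way to read m2, assuming East is a 0/1 indicator (1 for east-side restaurants, which the data preview suggests but does not confirm), is as two parallel planes that differ only in their intercepts. Plugging in the estimates above:

\[ East = 0: \quad \widehat{Price} = -24.03 + 1.54 \cdot Food + 1.91 \cdot Decor \]

\[ East = 1: \quad \widehat{Price} = (-24.03 + 2.07) + 1.54 \cdot Food + 1.91 \cdot Decor = -21.96 + 1.54 \cdot Food + 1.91 \cdot Decor \]

The East coefficient, 2.07, is the vertical gap between the two planes at every combination of Food and Decor.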
m3 <- lm(Price ~ Food + Decor + East + Decor:East, data = nyc)
summary(m3)
##
## Call:
## lm(formula = Price ~ Food + Decor + East + Decor:East, data = nyc)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -13.7855  -3.6649   0.3785   3.7292  17.6358
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -29.3971     6.3770  -4.610 8.10e-06 ***
## Food          1.6634     0.2822   5.895 2.09e-08 ***
## Decor         2.0695     0.2298   9.006 5.42e-16 ***
## East          9.6616     6.2184   1.554    0.122
## Decor:East   -0.4346     0.3518  -1.235    0.219
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.711 on 163 degrees of freedom
## Multiple R-squared:  0.6313, Adjusted R-squared:  0.6223
## F-statistic: 69.78 on 4 and 163 DF,  p-value: < 2.2e-16
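With the Decor:East interaction, the two planes no longer need to be parallel: both the intercept and the Decor slope can differ by location. Again assuming East is a 0/1 indicator, the m3 estimates above give:

\[ East = 0: \quad \widehat{Price} = -29.40 + 1.66 \cdot Food + 2.07 \cdot Decor \]

\[ East = 1: \quad \widehat{Price} = (-29.3971 + 9.6616) + 1.66 \cdot Food + (2.0695 - 0.4346) \cdot Decor = -19.74 + 1.66 \cdot Food + 1.63 \cdot Decor \]

The interaction estimate, -0.43, is the difference between the two Decor slopes.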
The East term was significant in model 2, suggesting that there is a significant relationship between location and price.
That term became nonsignificant in model 3, which allows the slope of Decor to vary with location, and that difference in slopes was also nonsignificant.
Load in the LA homes data set and fit the following model:
\[ logprice \sim logsqft + bed + city \]
What appears to be the reference level for city?
In the context of this problem, what is suggested by the sign of the coefficient for bed? Does this make sense to you?
Calculate the vector \(\hat{\beta}\) using the matrix formulation of the least squares estimates (useful functions: cbind(), rep(), matrix(), as.matrix(), t(), solve()). Do they agree with the estimates that come out of lm()?
See if you can plot your full model as geometric structures on a 3D scatterplot of the data.
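For the matrix formulation mentioned above, here is a minimal sketch of the mechanics on a stand-in data set. It uses R's built-in mtcars data rather than the LA homes data, and the choice of mpg, wt, and hp is an illustrative assumption, so it shows the pattern without working the exercise itself. It builds the design matrix by hand and computes \(\hat{\beta} = (X^T X)^{-1} X^T y\).

```r
# Stand-in data: R's built-in mtcars, NOT the LA homes data.
y <- mtcars$mpg

# Design matrix: a column of ones for the intercept, then the predictors.
X <- cbind(rep(1, nrow(mtcars)), mtcars$wt, mtcars$hp)

# Least squares estimates: beta_hat = (X'X)^{-1} X'y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y

# Fit the same model with lm() and put the two sets of estimates side by side.
fit <- lm(mpg ~ wt + hp, data = mtcars)
cbind(matrix = as.numeric(beta_hat), lm = unname(coef(fit)))
```

The two columns should agree up to numerical rounding. A categorical predictor such as city would need one 0/1 column per non-reference level in X.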