Practice: Formulas
We often keep the intercept in the model even if it is not statistically significant, because our main interest is usually in the effects of the other variables, as expressed by their coefficients. However, in some cases we need to remove the intercept to obtain a so-called "regression through the origin". We might also need to model the combined effect of two factors using an interaction term (for example, to model how light and water conditions jointly affect plant growth). These exercises let you practice both cases and compare alternative models. They do not count toward your grade and are just practice!
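As a rough sketch of the notation involved, the snippet below shows how R's formula syntax handles the two cases. It uses base R's model.matrix(), which modelr's model_matrix() wraps, and a small data frame named toy that is invented purely for illustration (it is not one of the sim datasets used in the exercises).

```r
# Toy data invented for illustration; not one of the sim datasets.
toy <- data.frame(
  y  = c(1.2, 2.1, 2.9, 4.2, 5.1, 5.8),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = factor(c("a", "b", "a", "b", "a", "b"))
)

# Default formula: an intercept column is added automatically.
cols_default <- colnames(model.matrix(y ~ x1, data = toy))
print(cols_default)      # "(Intercept)" "x1"

# "- 1" (equivalently "+ 0") drops the intercept: regression through the origin.
cols_origin <- colnames(model.matrix(y ~ x1 - 1, data = toy))
print(cols_origin)       # "x1"

# "x1 * x2" expands to x1 + x2 + x1:x2, i.e. main effects plus their interaction.
cols_interact <- colnames(model.matrix(y ~ x1 * x2, data = toy))
print(cols_interact)     # "(Intercept)" "x1" "x2b" "x1:x2b"
```

Inspecting the column names of the model matrix is a quick way to see which terms a formula actually generates before fitting anything.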
Exercises
- What happens if you repeat the analysis of sim2 using a model without an intercept? What happens to the model equation? What happens to the predictions?
- Use model_matrix() to explore the equations generated for the models I fit to sim3 and sim4. Why is * a good shorthand for interaction?
- Using the basic principles, convert the formulas in the following two models into functions. (Hint: start by converting the categorical variable into 0-1 variables.)
    mod1 <- lm(y ~ x1 + x2, data = sim3)
    mod2 <- lm(y ~ x1 * x2, data = sim3)
- For sim4, which of mod1 and mod2 is better? I think mod2 does a slightly better job at removing patterns, but it's pretty subtle. Can you come up with a plot to support my claim?
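When comparing an additive model with an interaction model, a numeric summary of the residuals can complement the plot that the last exercise asks for. The following is a minimal sketch in base R under stated assumptions: the data frame toy, its coefficients, and the helper rmse() are all made up for illustration and are not part of sim3, sim4, or modelr.

```r
# Simulated data with a true interaction; all values here are hypothetical.
set.seed(1)
toy <- data.frame(x1 = runif(100, -1, 1), x2 = runif(100, -1, 1))
toy$y <- 1 + 2 * toy$x1 - toy$x2 + 1.5 * toy$x1 * toy$x2 +
  rnorm(100, sd = 0.5)

# Additive model versus model with an interaction term.
fit_add <- lm(y ~ x1 + x2, data = toy)
fit_int <- lm(y ~ x1 * x2, data = toy)

# Hypothetical helper: root-mean-squared residual of a fitted model.
rmse <- function(fit) sqrt(mean(residuals(fit)^2))

# The interaction model nests the additive one, so its in-sample
# RMSE can never be larger; the question is by how much it is smaller.
print(c(additive = rmse(fit_add), interaction = rmse(fit_int)))
```

Because the two models are nested, the in-sample error alone always favors the larger model, which is why a residual plot that reveals remaining patterns is the more informative check.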
Source: H. Wickham and G. Grolemund, https://r4ds.had.co.nz/model-basics.html
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.