Practice: Formulas

We often keep the intercept in a model even when it is not statistically significant, because our main interest is usually in the effects of the other variables, expressed by their coefficients. However, there are cases where we need to remove the intercept to obtain a so-called "regression through the origin", which forces the fitted line through (0, 0). We may also need to model the combined effect of two factors with an interaction term (for example, to model how light and water conditions jointly affect plant growth). These exercises let you practice both cases and ask you to compare alternative models. They do not count toward your grade; they are just practice!
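
Both formula features can be seen directly in the design matrix that R builds from a formula. The sketch below uses base R's model.matrix() (the base counterpart of the modelr model_matrix() used in the exercises) on a small made-up data frame; the names toy, fit0, x, g and y are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data: a numeric predictor x, a two-level factor g, a response y
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  g = factor(c("a", "b", "a", "b", "a", "b")),
  y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2)
)

# Default formula: R adds an intercept column of 1s
colnames(model.matrix(y ~ x, data = toy))      # "(Intercept)" "x"

# Regression through the origin: "- 1" (or "0 +") drops the intercept column
colnames(model.matrix(y ~ x - 1, data = toy))  # "x"
colnames(model.matrix(y ~ 0 + x, data = toy))  # "x"
fit0 <- lm(y ~ x - 1, data = toy)
coef(fit0)  # a single slope; the fitted line passes through (0, 0)

# Interaction: x * g is shorthand for x + g + x:g
colnames(model.matrix(y ~ x * g, data = toy))        # "(Intercept)" "x" "gb" "x:gb"
colnames(model.matrix(y ~ x + g + x:g, data = toy))  # the same four columns
```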


  1. What happens if you repeat the analysis of sim2 using a model without an intercept? What happens to the model equation? What happens to the predictions?

  2. Use model_matrix() to explore the equations generated for the models I fit to sim3 and sim4. Why is * a good shorthand for interaction?

  3. Using the basic principles, convert the formulas in the following two models into functions. (Hint: start by converting the categorical variable into 0-1 variables.)

    mod1 <- lm(y ~ x1 + x2, data = sim3)
    mod2 <- lm(y ~ x1 * x2, data = sim3)

  4. For sim4, fit the same two formulas (y ~ x1 + x2 as mod1 and y ~ x1 * x2 as mod2). Which of mod1 and mod2 is better? I think mod2 does a slightly better job of removing patterns, but it's pretty subtle. Can you come up with a plot to support my claim?
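
One possible starting point for exercises 1 to 3 is sketched below, to check your own reasoning against rather than as an official solution. It assumes the modelr package, which provides sim2, sim3, sim4 and model_matrix(), is installed; mod_int, mod_noint, dummies(), predict_mod1() and predict_mod2() are hypothetical names introduced here, not part of modelr.

```r
library(modelr)  # provides sim2, sim3, sim4 and model_matrix()

# Exercise 1: sim2 with and without an intercept
mod_int   <- lm(y ~ x, data = sim2)      # intercept + indicators xb, xc, xd
mod_noint <- lm(y ~ x - 1, data = sim2)  # one coefficient per level: xa, xb, xc, xd
coef(mod_int)    # (Intercept) = mean of level "a"; xb, xc, xd = differences from "a"
coef(mod_noint)  # the four group means
# Both parametrisations span the same columns, so the predictions agree
all.equal(predict(mod_int), predict(mod_noint))

# Exercise 2: inspect the columns each formula generates
model_matrix(sim3, y ~ x1 * x2)  # x2 becomes x2b, x2c, x2d, plus x1:x2b, x1:x2c, x1:x2d
model_matrix(sim4, y ~ x1 * x2)  # both numeric: the interaction is one product column x1:x2

# Exercise 3: hand-written versions of the two sim3 models
mod1 <- lm(y ~ x1 + x2, data = sim3)
mod2 <- lm(y ~ x1 * x2, data = sim3)

# Hypothetical helper: 0-1 indicators for x2, with "a" as the baseline level
# (this matches R's default treatment contrasts)
dummies <- function(x2) {
  cbind(x2b = as.numeric(x2 == "b"),
        x2c = as.numeric(x2 == "c"),
        x2d = as.numeric(x2 == "d"))
}

# mod1: y = b0 + b1*x1 + b2*x2b + b3*x2c + b4*x2d
predict_mod1 <- function(x1, x2, b = coef(mod1)) {
  d <- dummies(x2)
  unname(b["(Intercept)"] + b["x1"] * x1 +
    b["x2b"] * d[, "x2b"] + b["x2c"] * d[, "x2c"] + b["x2d"] * d[, "x2d"])
}

# mod2: the mod1 terms plus one x1-times-indicator product per non-baseline level
predict_mod2 <- function(x1, x2, b = coef(mod2)) {
  d <- dummies(x2)
  predict_mod1(x1, x2, b[c("(Intercept)", "x1", "x2b", "x2c", "x2d")]) +
    unname(b["x1:x2b"] * x1 * d[, "x2b"] +
           b["x1:x2c"] * x1 * d[, "x2c"] +
           b["x1:x2d"] * x1 * d[, "x2d"])
}

# The hand-written functions should reproduce lm()'s own predictions
all.equal(predict_mod1(sim3$x1, sim3$x2), unname(predict(mod1)))
all.equal(predict_mod2(sim3$x1, sim3$x2), unname(predict(mod2)))
```

If both all.equal() calls return TRUE, the 0-1 coding matches what lm() built from the formulas.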

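For exercise 4, one possible plot (an assumption about what would count as support, not the authors' own figure) maps each model's average residual over the (x1, x2) grid of sim4: leftover structure shows up as coherent regions of colour rather than noise. It assumes modelr and ggplot2 are installed; resids, cell_means and p are hypothetical names.

```r
library(modelr)   # sim4, gather_residuals(), rmse()
library(ggplot2)

# The same two formulas, now fit to sim4 (x1 and x2 are both numeric)
mod1 <- lm(y ~ x1 + x2, data = sim4)
mod2 <- lm(y ~ x1 * x2, data = sim4)

# Long format: one row per observation per model, with columns "model" and "resid"
resids <- gather_residuals(sim4, mod1, mod2)

# Average residual in each (x1, x2) cell, separately for each model
cell_means <- aggregate(resid ~ model + x1 + x2, data = resids, FUN = mean)

p <- ggplot(cell_means, aes(x1, x2, fill = resid)) +
  geom_tile() +
  scale_fill_gradient2() +  # diverging scale centred on 0
  facet_wrap(~ model)
print(p)

# A numeric companion to the plot: in-sample RMSE of each model
c(mod1 = rmse(mod1, sim4), mod2 = rmse(mod2, sim4))
```

Because mod2 contains every term of mod1, its in-sample RMSE can never be larger, so the RMSE comparison alone says little; the residual pattern in the two panels is the more relevant evidence for the claim in exercise 4.
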
Source: H. Wickham and G. Grolemund,
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

Last modified: Thursday, December 15, 2022, 4:58 PM