Practice: Model Basics

While R makes the model fitting process extremely easy, several steps or implicit decisions go into it. For example, one may choose to keep or remove extreme observations (outliers) and select the optimization algorithm. This exercise demonstrates the effects of these decisions on the modeling outcomes.
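As a minimal sketch of the first decision, the base-R snippet below fits the same line with and without a single extreme observation. The toy data and the names `d`, `fit_all`, and `fit_trimmed` are hypothetical, chosen only for illustration; they are not part of the exercises.

```r
# Hypothetical toy data: nine points on y = 2x + 1 plus one extreme value
d <- data.frame(x = 1:10, y = 2 * (1:10) + 1)
d$y[10] <- 100  # the outlier (the line itself would give 21 here)

# Keep every observation, including the outlier
fit_all <- lm(y ~ x, data = d)

# Drop the outlier (row 10) and refit
fit_trimmed <- lm(y ~ x, data = d[-10, ])

# Compare intercepts and slopes of the two fits
coef(fit_all)
coef(fit_trimmed)
```

Because least squares penalises squared residuals, the single extreme point pulls the slope of `fit_all` well away from 2, while `fit_trimmed` recovers the underlying line.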

Exercises

  1. One downside of the linear model is that it is sensitive to unusual values because the distance incorporates a squared term. Fit a linear model to the simulated data below, and visualise the results. Rerun a few times to generate different simulated datasets. What do you notice about the model?

    sim1a <- tibble(
      x = rep(1:10, each = 3),
      y = x * 1.5 + 6 + rt(length(x), df = 2)
    )

  2. One way to make linear models more robust is to use a different distance measure. For example, instead of root-mean-squared distance, you could use mean-absolute distance:

    measure_distance <- function(mod, data) {
      diff <- data$y - model1(mod, data)
      mean(abs(diff))
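    }

  3. Use optim() to minimise this distance on the simulated data above, taking model1() to be the two-parameter linear model a[1] + data$x * a[2] from the source chapter, and compare the result to the linear model.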

  4. One challenge with performing numerical optimisation is that it's only guaranteed to find one local optimum. What's the problem with optimising a three-parameter model like this?

    model1 <- function(a, data) {
      a[1] + data$x * a[2] + a[3]
    }
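One possible way to set up the optim() comparison and to probe the three-parameter model is sketched below. It is a starting point under stated assumptions, not a reference solution: the seed is arbitrary, `data.frame` stands in for `tibble` to avoid extra packages, and the names `sim`, `line_model`, `mad_distance`, and `three_param` are hypothetical.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Same recipe as sim1a, built with base R instead of tibble
x <- rep(1:10, each = 3)
sim <- data.frame(x = x, y = x * 1.5 + 6 + rt(length(x), df = 2))

# Two-parameter straight line: intercept a[1], slope a[2]
line_model <- function(a, data) {
  a[1] + data$x * a[2]
}

# Mean-absolute distance between observed and predicted y
mad_distance <- function(a, data) {
  mean(abs(data$y - line_model(a, data)))
}

# Least-squares fit, for comparison
fit_lm <- lm(y ~ x, data = sim)

# Minimise the mean-absolute distance from the starting point (0, 0);
# optim() defaults to the Nelder-Mead method
fit_mad <- optim(c(0, 0), mad_distance, data = sim)

coef(fit_lm)  # least-squares intercept and slope
fit_mad$par   # mean-absolute-distance intercept and slope

# Three-parameter model from the last exercise: a[1] and a[3] enter
# only through their sum, so different parameter vectors can give
# identical predictions
three_param <- function(a, data) {
  a[1] + data$x * a[2] + a[3]
}
same_fit <- all.equal(
  three_param(c(6, 1.5, 0), sim),
  three_param(c(0, 1.5, 6), sim)
)
same_fit
```

Rerunning with different seeds and starting values gives a feel for how far the two fits diverge when the t-distributed noise produces extreme points, and for how the parameters that optim() returns for the three-parameter model depend on where the search starts.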

Source: H. Wickham and G. Grolemund, https://r4ds.had.co.nz/model-basics.html
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

Last modified: Sunday, November 13, 2022, 3:58 PM