Factors

Factors are the way categorical variables are stored in R. For example, treatment levels in ANOVA (analysis of variance) are treated as factors, and months or quarters of the year can be represented as factors for modeling seasonality. You should learn how to create factors and how to rename and reorder factor levels, both for convenience and for correct analysis. For example, the control treatment should usually be the first level of a factor because, by default, linear models compare every other level to the first one.
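The operations mentioned above (creating a factor, reordering its levels, and renaming them) can be sketched in base R as follows. This is only an illustration: the treatment labels and the variable names `treatment` and `trt` are made up for the example and do not come from any real data set.

```r
# Hypothetical treatment labels for six experimental units (illustrative data)
treatment <- c("drugA", "placebo", "drugB", "placebo", "drugA", "drugB")

# Create a factor; by default the levels are sorted alphabetically,
# so the placebo (control) group ends up last
trt <- factor(treatment)
levels(trt)

# Move the control group to the first position so that it becomes
# the reference level under the default treatment contrasts
trt <- relevel(trt, ref = "placebo")
levels(trt)

# Rename the levels; the new names are matched to the current level order
levels(trt) <- c("Placebo", "Drug A", "Drug B")
levels(trt)
```

With the control group first, a model such as lm(y ~ trt) would report the other treatments relative to the placebo, assuming the default contrasts are in effect.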

Introduction

In R, factors are used to work with categorical variables, that is, variables that have a fixed and known set of possible values. They are also useful when you want to display character vectors in a non-alphabetical order.
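As a small sketch of both points, the example below uses month abbreviations. The vectors `x`, `y`, and `z` are illustrative names; `month.abb` is a constant built into base R that holds "Jan" through "Dec".

```r
# A character vector of month abbreviations (illustrative data)
x <- c("Dec", "Apr", "Jan", "Mar")

# Sorting a character vector is alphabetical, which is not calendar order
sort(x)

# Declaring the fixed set of valid levels puts the values in calendar order
y <- factor(x, levels = month.abb)
sort(y)

# Values outside the known set of levels are converted to NA
z <- factor(c("Dec", "Jam"), levels = month.abb)
is.na(z)
```

Base R converts the misspelled value "Jam" to NA without a warning, so it is worth checking for missing values after creating a factor with explicit levels.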

Historically, factors were much easier to work with than characters. As a result, many of the functions in base R automatically convert characters to factors. This means that factors often crop up in places that are not helpful. Fortunately, you don't need to worry about that in the tidyverse, and can focus on situations where factors are genuinely useful.


Prerequisites

To work with factors, we'll use the forcats package, which is part of the core tidyverse. Its name is an anagram of factors, and it provides a wide range of helpers for working with categorical variables.

 library(tidyverse)


Source: H. Wickham and G. Grolemund, https://r4ds.had.co.nz/factors.html
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.