Vectors and Type Coercion

R can change the type of your data. Sometimes a function you apply converts the type internally, while the object you supplied stays unchanged. For example, if x is a character vector, lm(y ~ x) will treat x as a factor, but x itself remains of type character in your R environment. Similarly, to count the number or proportion of TRUE values in a logical vector LV, you can call sum(LV) or mean(LV), because these functions treat the logical values TRUE and FALSE as 1 and 0. It pays to know these coercion rules.
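As a minimal sketch of the two cases just described, the snippet below uses small made-up vectors (LV, x and y are illustrative values, not data from this lesson):

```r
# Hypothetical logical vector: did each of five samples pass a check?
LV <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

n_true <- sum(LV)      # TRUE counts as 1, FALSE as 0
prop_true <- mean(LV)  # proportion of TRUE values
n_true                 # 3
prop_true              # 0.6
typeof(LV)             # "logical": LV itself is unchanged

# Hypothetical character predictor and numeric response
x <- c("a", "b", "a", "b")
y <- c(1, 2, 1.5, 2.5)
fit <- lm(y ~ x)       # lm() treats x as a factor internally
names(coef(fit))       # "(Intercept)" "xb"
typeof(x)              # "character": x in the environment is untouched
```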

To better understand this behavior, let's meet another of the data structures: the vector.

my_vector <- vector(length = 3)
my_vector
Output
[1] FALSE FALSE FALSE

A vector in R is essentially an ordered collection of values, with the special condition that everything in the vector must be the same basic data type. If you don't choose the data type, it defaults to logical; alternatively, you can declare a vector of whatever type you like, filled with that type's default value.

another_vector <- vector(mode='character', length=3)
another_vector
Output
[1] "" "" ""
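Base R also provides one constructor per basic type, which some people find easier to read than vector(mode = ..., length = ...). A short sketch (the variable names are illustrative):

```r
empty_logical   <- logical(3)    # FALSE FALSE FALSE
empty_numeric   <- numeric(3)    # 0 0 0
empty_character <- character(3)  # "" "" ""

# Each shortcut gives the same result as the matching vector() call
identical(empty_character, vector(mode = "character", length = 3))  # TRUE
identical(empty_logical, vector(length = 3))                        # TRUE
```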


You can inspect the structure of an object, and see whether it is a vector, with str():

str(another_vector)
Output
 chr [1:3] "" "" ""

The somewhat cryptic output from this command shows three things: the basic data type found in this vector (in this case chr, character); the range of indices of the vector (in this case [1:3], so three elements); and a few examples of what's actually in the vector (in this case empty character strings). If we similarly do

str(cats$weight)
Output
 num [1:3] 2.1 5 3.2


we see that cats$weight is a vector, too - the columns of data we load into R data.frames are all vectors, and that's the root of why R forces everything in a column to be the same basic data type.
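To check this directly, is.vector() and typeof() can be applied to each column. The sketch below builds a stand-in data frame named cats_demo, which is an assumption meant to resemble the cats data used in this lesson rather than the lesson's actual object:

```r
# Hypothetical stand-in for the lesson's cats data frame
cats_demo <- data.frame(coat = c("calico", "black", "tabby"),
                        weight = c(2.1, 5.0, 3.2),
                        likes_string = c(1, 0, 1),
                        stringsAsFactors = FALSE)

is.vector(cats_demo$weight)   # TRUE
typeof(cats_demo$weight)      # "double"

# One basic type per column
column_types <- sapply(cats_demo, typeof)
column_types
#         coat       weight likes_string
#  "character"     "double"     "double"
```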


Discussion 1

Why is R so opinionated about what we put in our columns of data? How does this help us?

By keeping everything in a column the same, we allow ourselves to make simple assumptions about our data; if you can interpret one entry in the column as a number, then you can interpret all of them as numbers, so we don't have to check every time. This consistency is what people mean when they talk about clean data; in the long run, strict consistency goes a long way to making our lives easier in R.

You can also make vectors with explicit contents using the combine function, c():

combine_vector <- c(2,6,3)
combine_vector
Output
[1] 2 6 3


Given what we've learned so far, what do you think the following will produce?

quiz_vector <- c(2,6,'3')

The result is a character vector: R converts the numbers 2 and 6 to the strings "2" and "6". This is called type coercion, and it is the source of many surprises and the reason why we need to be aware of the basic data types and how R will interpret them. When R encounters a mix of types (here numeric and character) to be combined into a single vector, it will force them all to be the same type. Consider:

coercion_vector <- c('a', TRUE)
coercion_vector
Output
[1] "a"    "TRUE"
another_coercion_vector <- c(0, TRUE)
another_coercion_vector
Output
[1] 0 1
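To see which type R settled on in cases like these, typeof() can be called on the combined vector. A minimal sketch with made-up values:

```r
typeof(c(TRUE, 1L))    # "integer"
typeof(c(1L, 2.5))     # "double"
typeof(c(2.5, 1i))     # "complex"
typeof(c(1i, "a"))     # "character"

# The quiz vector from above ends up as character
quiz_vector <- c(2, 6, '3')
quiz_vector            # "2" "6" "3"
typeof(quiz_vector)    # "character"
```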


The coercion rules go: logical -> integer -> numeric -> complex -> character, where -> can be read as "are transformed into". You can try to force coercion against this flow using the as.* functions, such as as.numeric and as.logical:

character_vector_example <- c('0','2','4')
character_vector_example
Output
[1] "0" "2" "4"
character_coerced_to_numeric <- as.numeric(character_vector_example)
character_coerced_to_numeric
Output
[1] 0 2 4
numeric_coerced_to_logical <- as.logical(character_coerced_to_numeric)
numeric_coerced_to_logical
Output
[1] FALSE  TRUE  TRUE
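Forcing coercion against the flow does not always succeed. The sketch below, using made-up strings, shows that values R cannot interpret become NA:

```r
messy <- c("1", "two", "3")

# as.numeric() warns "NAs introduced by coercion" here;
# suppressWarnings() only hides the message, not the NA
converted <- suppressWarnings(as.numeric(messy))
converted               # 1 NA 3
sum(is.na(converted))   # 1 value could not be converted

# as.logical() on strings only recognizes spellings such as "TRUE" or "false"
from_strings <- as.logical(c("TRUE", "false", "yes", "0"))
from_strings            # TRUE FALSE NA NA
```

Counting the NA values after a conversion is a quick way to spot entries that did not look like what you expected.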


As you can see, some surprising things can happen when R forces one basic data type into another! Nitty-gritty of type coercion aside, the point is: if your data doesn't look like what you thought it was going to look like, type coercion may well be to blame; make sure everything is the same type in your vectors and your columns of data.frames, or you will get nasty surprises!

But coercion can also be very useful! For example, in our cats data likes_string is numeric, but we know that the 1s and 0s actually represent TRUE and FALSE (a common way of representing them). We should use the logical data type here, which has two states, TRUE or FALSE, which is exactly what our data represents. We can 'coerce' this column to be logical by using the as.logical function:

cats$likes_string
Output
[1] 1 0 1
cats$likes_string <- as.logical(cats$likes_string)
cats$likes_string
Output
[1]  TRUE FALSE  TRUE
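One detail worth knowing when converting numbers this way: as.logical() maps 0 to FALSE and every other non-missing number to TRUE. A small sketch, using a standalone vector rather than the cats data frame:

```r
# Hypothetical 0/1 flags like those in the likes_string column
likes_string <- c(1, 0, 1)
likes_logical <- as.logical(likes_string)
likes_logical              # TRUE FALSE TRUE

sum(likes_logical)         # 2 entries are TRUE
as.numeric(likes_logical)  # 1 0 1, converting back again

# Any non-zero number becomes TRUE; NA stays NA
as.logical(c(0, 1, 2, -1, NA))   # FALSE TRUE TRUE TRUE NA
```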


The combine function, c(), will also append things to an existing vector:

ab_vector <- c('a', 'b')
ab_vector
Output
[1] "a" "b"
combine_example <- c(ab_vector, 'SWC')
combine_example
Output
[1] "a"   "b"   "SWC"
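Two things are easy to miss when appending with c(): the original vector is not modified unless you assign the result back, and the coercion rules still apply to whatever you append. A brief sketch:

```r
ab_vector <- c('a', 'b')

# Appending a number to a character vector coerces the number
mixed <- c(ab_vector, 1)
mixed                        # "a" "b" "1"

# c() returned a new vector; ab_vector is unchanged
ab_vector                    # "a" "b"

# Assign the result back to grow the vector itself
ab_vector <- c(ab_vector, 'c')
ab_vector                    # "a" "b" "c"

# The new element can go in front as well
prepended <- c('z', ab_vector)
prepended                    # "z" "a" "b" "c"
```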


You can also make series of numbers:

mySeries <- 1:10
mySeries
Output
 [1]  1  2  3  4  5  6  7  8  9 10
seq(10)
Output
 [1]  1  2  3  4  5  6  7  8  9 10
seq(1,10, by=0.1)
Output
 [1]  1.0  1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8  1.9  2.0  2.1  2.2  2.3  2.4
[16]  2.5  2.6  2.7  2.8  2.9  3.0  3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9
[31]  4.0  4.1  4.2  4.3  4.4  4.5  4.6  4.7  4.8  4.9  5.0  5.1  5.2  5.3  5.4
[46]  5.5  5.6  5.7  5.8  5.9  6.0  6.1  6.2  6.3  6.4  6.5  6.6  6.7  6.8  6.9
[61]  7.0  7.1  7.2  7.3  7.4  7.5  7.6  7.7  7.8  7.9  8.0  8.1  8.2  8.3  8.4
[76]  8.5  8.6  8.7  8.8  8.9  9.0  9.1  9.2  9.3  9.4  9.5  9.6  9.7  9.8  9.9
[91] 10.0
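A few related ways of building series are sketched below. They are standard base R functions, shown here with made-up values:

```r
# Ask for a number of points instead of a step size
quarters <- seq(0, 1, length.out = 5)
quarters                  # 0.00 0.25 0.50 0.75 1.00

seq_len(3)                # 1 2 3
seq_along(c("x", "y", "z"))   # 1 2 3, one index per element

# seq_len() behaves sensibly for a length of zero, while 1:n counts down
n <- 0
1:n                       # 1 0
seq_len(n)                # integer(0)

# Whole-number series are integer; fractional steps give double
typeof(1:10)                   # "integer"
typeof(seq(1, 10, by = 0.1))   # "double"
```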


We can ask a few questions about vectors:

sequence_example <- seq(10)
head(sequence_example, n=2)
Output
[1] 1 2
tail(sequence_example, n=4)
Output
[1]  7  8  9 10
length(sequence_example)
Output
[1] 10
class(sequence_example)
Output
[1] "integer"
typeof(sequence_example)
Output
[1] "integer"
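For an integer vector class() and typeof() agree, as above, but for decimal numbers they report different names for the same thing. A short sketch with illustrative vectors:

```r
whole <- 1:3
class(whole)        # "integer"
typeof(whole)       # "integer"

decimal <- c(1.5, 2, 3)
class(decimal)      # "numeric"
typeof(decimal)     # "double"

# Both count as numeric for most purposes
is.numeric(whole)   # TRUE
is.numeric(decimal) # TRUE
```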


Finally, you can give names to elements in your vector:

my_example <- 5:8
names(my_example) <- c("a", "b", "c", "d")
my_example
Output
a b c d 
5 6 7 8 
names(my_example)
Output
[1] "a" "b" "c" "d"
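Once a vector has names, they can be used to pick out elements. A minimal sketch that rebuilds the example above so it runs on its own:

```r
my_example <- 5:8
names(my_example) <- c("a", "b", "c", "d")

my_example["b"]            # element named b, printed with its name: 6
my_example[c("a", "d")]    # elements a and d: 5 8
my_example[["b"]]          # 6, the bare value without the name

# Names can also be given when the vector is created
named_at_creation <- c(a = 5L, b = 6L)
names(named_at_creation)   # "a" "b"

# unname() returns a copy with the names dropped
unname(my_example)         # 5 6 7 8
```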

Source: The Carpentries, https://swcarpentry.github.io/r-novice-gapminder/04-data-structures-part1/index.html#vectors-and-type-coercion
This work is licensed under a Creative Commons Attribution 4.0 License.

Last modified: Monday, January 9, 2023, 2:54 PM