Practice: Base-R Lists and Data Frames

We'll practice with the data frame, the usual format for storing observations of several variables. We'll practice extensions of this format later. Use the cats data frame below to solve these challenges.

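cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1),
                   # keep coat as a factor, as in the outputs below (the default before R 4.0)
                   stringsAsFactors = TRUE)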

Challenge 1

There are several subtly different ways to extract variables, observations, and elements from data frames:

  • cats[1]
  • cats[[1]]
  • cats$coat
  • cats["coat"]
  • cats[1, 1]
  • cats[, 1]
  • cats[1, ]

Try out these examples and explain what is returned by each one.

Hint: Use the function typeof() to examine what is returned in each case.

Solution

cats[1]
Output
    coat
1 calico
2  black
3  tabby


We can think of a data frame as a list of vectors. The single bracket [1] returns the first slice of the list, as another list. In this case it is the first column of the data frame, returned as a one-column data frame.
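Following the hint, a quick check of what cats[1] returns might look like the sketch below. It rebuilds cats with stringsAsFactors = TRUE, which is an assumption made here so that coat is a factor as in the printed outputs (R 4.0 and later create character columns by default). The name first_col is only for illustration.

```r
cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1),
                   stringsAsFactors = TRUE)  # assumption: match the factor output shown

first_col <- cats[1]   # single bracket: still a data frame

typeof(first_col)      # "list"
class(first_col)       # "data.frame"
dim(first_col)         # 3 1 (three rows, one column)
```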

cats[[1]]
Output
[1] calico black  tabby 
Levels: black calico tabby


The double bracket [[1]] returns the contents of the list item. In this case it is the contents of the first column, a vector of class factor.

cats$coat
Output
[1] calico black  tabby 
Levels: black calico tabby


This example uses the $ operator to address items by name. coat is the first column of the data frame, again a vector of class factor.
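As a small sketch, the double bracket and $ forms can be compared directly with identical(). The data frame is rebuilt with stringsAsFactors = TRUE (an assumption, so that coat is a factor as in the outputs shown), and the variable names by_dollar, by_index, and by_name are only illustrative.

```r
cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1),
                   stringsAsFactors = TRUE)  # assumption: match the factor output shown

by_dollar <- cats$coat        # address the column by name with $
by_index  <- cats[[1]]        # address the column by position
by_name   <- cats[["coat"]]   # address the column by name with [[ ]]

identical(by_dollar, by_index)  # TRUE
identical(by_dollar, by_name)   # TRUE
class(by_dollar)                # "factor"
```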

cats["coat"]
Output
    coat
1 calico
2  black
3  tabby


Here we are using a single bracket ["coat"], replacing the index number with the column name. Like example 1, the returned object is a list.
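To see that indexing by name and by position give the same object, one could compare the two results, as in this sketch. It assumes cats is built with stringsAsFactors = TRUE so that coat is a factor, and by_name and by_position are illustrative names.

```r
cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1),
                   stringsAsFactors = TRUE)  # assumption: match the factor output shown

by_name     <- cats["coat"]   # single bracket with a column name
by_position <- cats[1]        # single bracket with a column index

identical(by_name, by_position)  # TRUE
is.list(by_name)                 # TRUE: a data frame is a list of columns
names(by_name)                   # "coat"
```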

cats[1, 1]
Output
[1] calico
Levels: black calico tabby


This example uses single brackets, but this time we provide row and column coordinates. The returned object is the value in row 1, column 1. The object is stored as an integer, but because it is part of a factor, R displays the label “calico” associated with the integer value.
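The relationship between the stored integer and the displayed label can be inspected as in the following sketch. It assumes coat is a factor (hence stringsAsFactors = TRUE), and value is an illustrative name.

```r
cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1),
                   stringsAsFactors = TRUE)  # assumption: match the factor output shown

value <- cats[1, 1]

class(value)         # "factor"
typeof(value)        # "integer": factors store integer codes
levels(value)        # "black" "calico" "tabby" (sorted alphabetically)
as.integer(value)    # 2, the position of "calico" among the levels
as.character(value)  # "calico"
```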

cats[, 1]
Output
[1] calico black  tabby 
Levels: black calico tabby


As in the previous example, we use single brackets and provide row and column coordinates. The row coordinate is not specified, so R interprets the missing value as all rows and returns every element of this column vector.
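Because a single column is selected, R simplifies the result to a vector rather than a data frame. The sketch below shows this, along with the drop = FALSE argument of the bracket operator, which is not part of the challenge but keeps the result as a one-column data frame. It assumes coat is a factor, and simplified and kept are illustrative names.

```r
cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1),
                   stringsAsFactors = TRUE)  # assumption: match the factor output shown

simplified <- cats[, 1]                # all rows, first column
kept       <- cats[, 1, drop = FALSE]  # same selection, without simplification

is.data.frame(simplified)         # FALSE: simplified to a factor
is.data.frame(kept)               # TRUE: still a one-column data frame
identical(simplified, cats[[1]])  # TRUE
```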

cats[1, ]
Output
    coat weight likes_string
1 calico    2.1            1


Again we use single brackets with row and column coordinates. The column coordinate is not specified, so all columns are returned. The return value is a list (a one-row data frame) containing all the values in the first row.
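A row cannot be simplified to a single vector because its columns have different types, which is why the result stays a list. A minimal check, assuming coat is a factor and using the illustrative name row1:

```r
cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1),
                   stringsAsFactors = TRUE)  # assumption: match the factor output shown

row1 <- cats[1, ]     # first row, all columns

class(row1)           # "data.frame"
typeof(row1)          # "list"
length(row1)          # 3, one element per column
sapply(row1, class)   # coat: "factor", weight: "numeric", likes_string: "numeric"
```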


Challenge 2

Create a list of length two containing a character vector for each of the sections in this part of the workshop:

  • Data types
  • Data structures

Populate each character vector with the names of the data types and data structures we've seen so far.

Solution


dataTypes <- c('double', 'complex', 'integer', 'character', 'logical')
dataStructures <- c('data.frame', 'vector', 'factor', 'list', 'matrix')
answer <- list(dataTypes, dataStructures)
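The solution can be checked, and optionally given element names, as in this sketch. The names data_types and data_structures and the variable named_answer are hypothetical choices for illustration and are not part of the original solution.

```r
dataTypes <- c('double', 'complex', 'integer', 'character', 'logical')
dataStructures <- c('data.frame', 'vector', 'factor', 'list', 'matrix')
answer <- list(dataTypes, dataStructures)

length(answer)       # 2
typeof(answer[[1]])  # "character"

# Optional: name the elements so they can be addressed with $
named_answer <- list(data_types = dataTypes,
                     data_structures = dataStructures)
named_answer$data_structures[1]  # "data.frame"
```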

Note: it helps to write a list of all these types and structures in large letters on the board, or to tape one to the wall. Leave it up for the rest of the workshop to remind people of the importance of these basics.


Source: The Carpentries, https://swcarpentry.github.io/r-novice-gapminder/04-data-structures-part1/index.html#vectors-and-type-coercion
This work is licensed under a Creative Commons Attribution 4.0 License.

Last modified: Wednesday, December 7, 2022, 9:48 PM