Vectors and Simple Manipulations

This section introduces the basic operations on vectors, most of which are done element-wise. Pay particular attention to three things: the recycling of vectors (recycling usually raises no error, and often no warning either, so unintended recycling is easy to miss), missing values (NA), and logical vectors, which are often used for data subsetting.
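As a brief illustration of those three points, the following sketch uses small made-up vectors (the names a, b, m and the values are assumptions chosen for the example, not taken from this section):

```r
# Hypothetical data for illustration only.
a <- c(1, 2, 3, 4)
b <- c(10, 20)

# b is recycled to the length of a; no warning is given here
# because length(a) is a multiple of length(b).
s <- a + b                      # 11 22 13 24

# Comparisons involving NA yield NA, not TRUE or FALSE.
m <- c(1, NA, 3)
gt <- m > 1                     # FALSE NA TRUE

# A logical vector used to subset, with the NA screened out first.
keep <- m[!is.na(m) & m > 1]    # 3
```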

Index vectors; selecting and modifying subsets of a data set

Subsets of the elements of a vector may be selected by appending to the name of the vector an index vector in square brackets. More generally any expression that evaluates to a vector may have subsets of its elements similarly selected by appending an index vector in square brackets immediately after the expression.

Such index vectors can be any of four distinct types.

  1. A logical vector. In this case the index vector is recycled to the same length as the vector from which elements are to be selected. Values corresponding to TRUE in the index vector are selected and those corresponding to FALSE are omitted. For example
    > y <- x[!is.na(x)]
    

    creates (or re-creates) an object y which will contain the non-missing values of x, in the same order. Note that if x has missing values, y will be shorter than x. Also

    > (x+1)[(!is.na(x)) & x>0] -> z
    

    creates an object z and places in it the values of the vector x+1 for which the corresponding value in x was both non-missing and positive.

  2. A vector of positive integral quantities. In this case the values in the index vector should lie in the set {1, 2, …, length(x)}; an index larger than length(x) does not raise an error but yields NA. The corresponding elements of the vector are selected and concatenated, in that order, in the result. The index vector can be of any length and the result is of the same length as the index vector. For example x[6] is the sixth component of x and
    > x[1:10]
    

    selects the first 10 elements of x (assuming length(x) is not less than 10). Also

    > c("x","y")[rep(c(1,2,2,1), times=4)]
    

    (an admittedly unlikely thing to do) produces a character vector of length 16 consisting of "x", "y", "y", "x" repeated four times.

  3. A vector of negative integral quantities. Such an index vector specifies the values to be excluded rather than included. Thus
    > y <- x[-(1:5)]
    

    gives y all but the first five elements of x.

  4. A vector of character strings. This possibility only applies where an object has a names attribute to identify its components. In this case a sub-vector of the names vector may be used in the same way as the positive integral labels in item 2 further above.
    > fruit <- c(5, 10, 1, 20)
    > names(fruit) <- c("orange", "banana", "apple", "peach")
    > lunch <- fruit[c("apple","orange")]
    

    The advantage is that alphanumeric names are often easier to remember than numeric indices. This option is particularly useful in connection with data frames, as we shall see later.
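The four kinds of index vector can be tried side by side. The sketch below uses a small made-up vector x (its values are an assumption for illustration) together with the fruit vector from item 4, built here with named arguments to c() rather than names<-:

```r
# Hypothetical sample data, not from the text above.
x <- c(3, NA, -2, 7, 0, 5)

# 1. Logical index: keep the non-missing values.
r1 <- x[!is.na(x)]                  # 3 -2 7 0 5

# A logical index applied to an expression rather than a name.
r1b <- (x + 1)[!is.na(x) & x > 0]   # 4 8 6

# Without the is.na() guard, an NA in the index gives an NA in the result.
r1c <- x[x > 0]                     # 3 NA 7 5

# 2. Positive integers: select by position; repeats are allowed.
r2 <- x[c(1, 4, 4)]                 # 3 7 7

# 3. Negative integers: exclude positions.
r3 <- x[-(1:2)]                     # -2 7 0 5

# 4. Character strings: select by name.
fruit <- c(orange = 5, banana = 10, apple = 1, peach = 20)
lunch <- fruit[c("apple", "orange")]   # apple = 1, orange = 5
```

Note that the result of a character index keeps the names of the selected elements, in the order requested.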

An indexed expression can also appear on the receiving end of an assignment, in which case the assignment is performed only on the selected elements of the vector. The expression must be of the form vector[index_vector], since an arbitrary expression in place of the vector name makes little sense as the target of an assignment.

For example

> x[is.na(x)] <- 0

replaces any missing values in x by zeros and

> y[y < 0] <- -y[y < 0]

has the same effect as

> y <- abs(y)
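The two assignments above can be checked on small made-up vectors. In this sketch the data values and the copy y2 are assumptions introduced for the comparison:

```r
# Hypothetical data for illustration only.
x <- c(4, NA, -1, NA)
x[is.na(x)] <- 0             # only the missing elements are replaced
# x is now 4 0 -1 0

y <- c(-3, 2, -5, 0)
y2 <- y                      # work on a copy so y stays available
y2[y2 < 0] <- -y2[y2 < 0]    # negate only the negative elements
same <- identical(y2, abs(y))   # TRUE
```

The comparison assumes y has no missing values; with an NA in y the logical index would itself contain NA, and the two forms should not be expected to behave alike.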