Scatterplots in Base R

Here we introduce scatterplots in base R. The code is simple, but it is worth remembering the options that make plots more informative, such as colors, legends, and error bars.

Continuous Data

Multiple Data Sets on One Plot

One common task is to plot multiple data sets on the same plot. In many situations, the way to do this is to create the initial plot and then add information to it. For example, to plot bivariate data, the plot command creates the plot, and the points command then adds further data sets to it.

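First, define a set of normally distributed random numbers and plot them, then add a second, uniformly distributed data set with the points command. (This same data set is used throughout the examples below.) Because the numbers are random, the correlation you get will differ from the value shown here.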

> x <- rnorm(10,sd=5,mean=20)
> y <- 2.5*x - 1.0 + rnorm(10,sd=9,mean=0)
> cor(x,y)
[1] 0.7400576
> plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff")
> x1 <- runif(8,15,25)
> y1 <- 2.5*x1 - 1.0 + runif(8,-6,6)
> points(x1,y1,col=2)

Note that in the previous example, the color of the second data set was set using the col option. You can try different numbers to see what colors are available. The default palette provides eight colors, numbered 1 to 8, and larger numbers cycle through them again. Also, note that the points are plotted as circles in the example above. The symbol that is used can be changed using the pch option.

> x2 <- runif(8,15,25)
> y2 <- 2.5*x2 - 1.0 + runif(8,-6,6)
> points(x2,y2,col=3,pch=2)
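Rather than trying col and pch values one at a time, it can help to draw them all at once. The following is a minimal sketch, not part of the original examples, that assumes a fresh R session with the default palette. It prints the names of the eight palette colors and draws plotting symbols 0 through 25, each labeled with its pch number. The variable names cols and symbols are chosen here for illustration only.

```r
# Names of the colors that col = 1..8 refer to (default palette assumed)
cols <- palette()
print(cols)

# pch codes for the built-in plotting symbols
symbols <- 0:25

# Draw every symbol in a row, cycling through the eight palette colors
plot(symbols, rep(1, length(symbols)), pch = symbols,
     col = rep_len(1:8, length(symbols)),
     ylim = c(0.5, 1.5), yaxt = "n",
     xlab = "pch value", ylab = "", main = "Symbols and Colors")

# Label each symbol with its pch number
text(symbols, 1.2, labels = symbols, cex = 0.7)
```

Symbols 21 through 25 can also be filled with a second color through the bg option, which is not used in this sketch.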

Again, try different numbers to see the various options. Another helpful option is to add a legend. This can be done with the legend command. In order, its first arguments are the x and y coordinates on the plot at which to place the legend, followed by a vector of labels to use. There are many other options, so use help(legend) to see them. For example, a vector of colors can be given with the col option, and a vector of symbols can be given with the pch option.

> plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff")
> points(x1,y1,col=2,pch=3)
> points(x2,y2,col=4,pch=5)
> legend(14,70,c("Original","one","two"),col=c(1,2,4),pch=c(1,3,5))


Figure 1. The three data sets are displayed on the same graph.
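Choosing x and y coordinates for the legend by hand depends on the range of the data, which changes with every set of random numbers. As an alternative, the legend command also accepts a position keyword such as "topleft" in place of the coordinates. The sketch below illustrates this under some assumptions: it regenerates two of the data sets with an arbitrary seed (set.seed(1) is not part of the original example) so that it can run on its own.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch repeatable

# Same recipe as the data sets above
x  <- rnorm(10, sd = 5, mean = 20)
y  <- 2.5 * x - 1.0 + rnorm(10, sd = 9, mean = 0)
x1 <- runif(8, 15, 25)
y1 <- 2.5 * x1 - 1.0 + runif(8, -6, 6)

plot(x, y, xlab = "Independent", ylab = "Dependent", main = "Random Stuff")
points(x1, y1, col = 2, pch = 3)

# A keyword replaces the x and y coordinates of the legend
legend("topleft", legend = c("Original", "one"),
       col = c(1, 2), pch = c(1, 3))
```

Other keywords include "topright", "bottomleft", "bottomright", and "center"; see help(legend) for the full list.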

Another common task is to change the limits of the axes, which changes the region of the data shown in the plot. This is achieved using the xlim and ylim options in the plot command. Both options take a vector of length two that holds the minimum and maximum values.

> plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff",xlim=c(0,30),ylim=c(0,100))
> points(x1,y1,col=2,pch=3)
> points(x2,y2,col=4,pch=5)
> legend(14,70,c("Original","one","two"),col=c(1,2,4),pch=c(1,3,5))
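Fixed limits such as c(0,30) work when the spread of the data is known in advance. When it is not, one option is to compute the limits from the data so that no point from any data set falls outside the plot. The sketch below shows this with the range command. It is an illustration rather than part of the original example: the seed and the names xlim_all and ylim_all are assumptions made here.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch repeatable

# Same recipe as the three data sets above
x  <- rnorm(10, sd = 5, mean = 20)
y  <- 2.5 * x - 1.0 + rnorm(10, sd = 9, mean = 0)
x1 <- runif(8, 15, 25)
y1 <- 2.5 * x1 - 1.0 + runif(8, -6, 6)
x2 <- runif(8, 15, 25)
y2 <- 2.5 * x2 - 1.0 + runif(8, -6, 6)

# range() returns c(min, max) over all of its arguments
xlim_all <- range(x, x1, x2)
ylim_all <- range(y, y1, y2)

plot(x, y, xlab = "Independent", ylab = "Dependent", main = "Random Stuff",
     xlim = xlim_all, ylim = ylim_all)
points(x1, y1, col = 2, pch = 3)
points(x2, y2, col = 4, pch = 5)
legend("topleft", legend = c("Original", "one", "two"),
       col = c(1, 2, 4), pch = c(1, 3, 5))
```

Without xlim and ylim, the plot command sizes the axes from x and y alone, so points added later with the points command can be cut off.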