Histograms and Density Plots in base R

Here you will see more examples of how to build histograms in base R. Note that when the total counts for two or more samples are different, we can convert the vertical axis to density so the distributions can be easily compared on the same plot.

Problem

You want to make a histogram or density plot.

Solution

Some sample data: these two vectors contain 200 data points each:

set.seed(1234)
rating  <- rnorm(200)
head(rating)
#> [1] -1.2070657  0.2774292  1.0844412 -2.3456977  0.4291247  0.5060559

rating2 <- rnorm(200, mean=.8)
head(rating2)
#> [1] 1.2852268 1.4967688 0.9855139 1.5007335 1.1116810 1.5604624

When plotting multiple groups of data, some graphing routines require a data frame with one column for the grouping variable and one for the measure variable.

# Make a column to indicate which group each value is in
cond <- factor( rep(c("A","B"), each=200) )

data <- data.frame(cond, rating = c(rating,rating2))
head(data)
#>   cond     rating
#> 1    A -1.2070657
#> 2    A  0.2774292
#> 3    A  1.0844412
#> 4    A -2.3456977
#> 5    A  0.4291247
#> 6    A  0.5060559
# Histogram
hist(rating)

# Use 8 bins (this is only approximate - it places boundaries on nice round numbers)
# Make it light blue #CCCCFF
# Instead of showing count, make area sum to 1, (freq=FALSE)
hist(rating, breaks=8, col="#CCCCFF", freq=FALSE)

# Put breaks at every 0.6
boundaries <- seq(-3, 3.6, by=.6)
boundaries
#>  [1] -3.0 -2.4 -1.8 -1.2 -0.6  0.0  0.6  1.2  1.8  2.4  3.0  3.6

hist(rating, breaks=boundaries)


# Kernel density plot
plot(density(rating))


Multiple groups with kernel density plots.

plot.multi.dens <- function(s)
{
    junk.x = NULL
    junk.y = NULL
    for(i in 1:length(s)) {
        junk.x = c(junk.x, density(s[[i]])$x)
        junk.y = c(junk.y, density(s[[i]])$y)
    }
    xr <- range(junk.x)
    yr <- range(junk.y)
    plot(density(s[[1]]), xlim = xr, ylim = yr, main = "")
    for(i in 1:length(s)) {
        lines(density(s[[i]]), xlim = xr, ylim = yr, col = i)
    }
}

# the input of the following function MUST be a numeric list
plot.multi.dens( list(rating, rating2))


The sm package also includes a way of doing multiple density plots. The data must be in a data frame.

library(sm)
sm.density.compare(data$rating, data$cond)
# Add a legend (the color numbers start from 2 and go up)
legend("topright", levels(data$cond), fill=2+(0:nlevels(data$cond)))



Source: W. Chung, http://www.cookbook-r.com/Graphs/Histogram_and_density_plot/
Creative Commons License This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.

Last modified: Monday, January 9, 2023, 3:55 PM