Histograms and Density Plots in ggplot2
Site: Saylor Academy
Course: PRDV420: Introduction to R Programming
Description
Now you will learn the ggplot2 syntax for building and customizing histograms.
Introduction
This R tutorial describes how to create a histogram using R and the ggplot2 package.
The function geom_histogram() draws the histogram. You can also add a line for the mean using the function geom_vline().
Prepare the data
The data below will be used:
set.seed(1234)
df <- data.frame(
  sex=factor(rep(c("F", "M"), each=200)),
  weight=round(c(rnorm(200, mean=55, sd=5), rnorm(200, mean=65, sd=5)))
)
head(df)
##   sex weight
## 1   F     49
## 2   F     56
## 3   F     60
## 4   F     43
## 5   F     57
## 6   F     58
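As a quick sanity check on the simulated data, the following sketch (an illustration, not part of the original tutorial) uses only base R to confirm the group sizes and the type of each column. The name group_sizes is made up for this example.

```r
# Recreate the tutorial's data frame so this snippet runs on its own
set.seed(1234)
df <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5),
                   rnorm(200, mean = 65, sd = 5)))
)

str(df)                       # column types: one factor, one numeric vector
group_sizes <- table(df$sex)  # number of rows per level of sex
print(group_sizes)
summary(df$weight)            # range and quartiles of the rounded weights
```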
Source: STHDA, http://www.sthda.com/english/wiki/ggplot2-histogram-plot-quick-start-guide-r-software-and-data-visualization
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
Basic Histogram Plots
library(ggplot2)
# Basic histogram
ggplot(df, aes(x=weight)) + geom_histogram()
# Change the width of bins
ggplot(df, aes(x=weight)) +
  geom_histogram(binwidth=1)
# Change colors
p <- ggplot(df, aes(x=weight)) +
  geom_histogram(color="black", fill="white")
p
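When binwidth is not given, geom_histogram() falls back to 30 bins and prints a message suggesting that you pick a better value. The sketch below, an illustration rather than part of the original tutorial, sets the bin width explicitly and uses layer_data() to inspect the counts that ggplot2 computed. The names p_bins and bin_data are made up for this example.

```r
library(ggplot2)

# Recreate the tutorial's data frame so this snippet runs on its own
set.seed(1234)
df <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5),
                   rnorm(200, mean = 65, sd = 5)))
)

# One-unit bins: the weights are whole numbers, so each value gets its own bar
p_bins <- ggplot(df, aes(x = weight)) +
  geom_histogram(binwidth = 1, color = "black", fill = "white")

# layer_data() returns the data computed for a layer (here, the binned counts)
bin_data <- layer_data(p_bins, 1)
head(bin_data[, c("xmin", "xmax", "count")])

# Every observation falls into exactly one bin
sum(bin_data$count) == nrow(df)
```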
Add mean line and density plot on the histogram
- The histogram is plotted with density instead of count on the y-axis.
- A transparent density plot is overlaid. The value of alpha controls the level of transparency.
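# Add mean line
# (linewidth replaces the deprecated size argument for lines; requires ggplot2 >= 3.4.0)
p + geom_vline(aes(xintercept=mean(weight)),
               color="blue", linetype="dashed", linewidth=1)
# Histogram with density plot
# (after_stat(density) replaces the deprecated ..density.. notation; requires ggplot2 >= 3.3.0)
ggplot(df, aes(x=weight)) +
  geom_histogram(aes(y=after_stat(density)), colour="black", fill="white") +
  geom_density(alpha=.2, fill="#FF6666")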
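To see what plotting density instead of count means in practice, the sketch below checks that the bar areas of a density histogram sum to one, which is what puts the histogram on the same scale as the overlaid density curve. It is an illustration only, assuming ggplot2 3.3.0 or later for after_stat(); the names p_dens, bars and bar_area are made up for this example.

```r
library(ggplot2)

# Recreate the tutorial's data frame so this snippet runs on its own
set.seed(1234)
df <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5),
                   rnorm(200, mean = 65, sd = 5)))
)

p_dens <- ggplot(df, aes(x = weight)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 1,
                 colour = "black", fill = "white") +
  geom_density(alpha = 0.2, fill = "#FF6666")

# Layer 1 is the histogram; each row describes one bar
bars <- layer_data(p_dens, 1)

# Area of each bar = height (density) times width; the areas add up to 1
bar_area <- sum(bars$density * (bars$xmax - bars$xmin))
bar_area
```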
Change histogram plot line types and colors
# Change line color and fill color
ggplot(df, aes(x=weight)) +
  geom_histogram(color="darkblue", fill="lightblue")
# Change line type
ggplot(df, aes(x=weight)) +
  geom_histogram(color="black", fill="lightblue",
                 linetype="dashed")
Change histogram plot colors by groups
The following sections describe:
- Calculating the mean of each group
- Changing line color
- Changing fill color
Calculate the mean of each group
The package plyr is used to calculate the average weight of each group:
library(plyr)
mu <- ddply(df, "sex", summarise, grp.mean=mean(weight))
head(mu)
##   sex grp.mean
## 1   F    54.70
## 2   M    65.36
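If plyr is not installed, the same summary can be computed with base R. The following is a minimal sketch using aggregate(); the name mu_base is hypothetical, and the result column is renamed so that it matches the grp.mean column used in the rest of the tutorial.

```r
# Recreate the tutorial's data frame so this snippet runs on its own
set.seed(1234)
df <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5),
                   rnorm(200, mean = 65, sd = 5)))
)

# Mean weight per level of sex; the result has columns sex and weight
mu_base <- aggregate(weight ~ sex, data = df, FUN = mean)

# Rename the summary column to match the name used with ddply()
names(mu_base)[names(mu_base) == "weight"] <- "grp.mean"
mu_base
```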
Change line colors
Histogram plot line colors can be automatically controlled by the levels of the variable sex.
Note that you can change the position adjustment used for overlapping bars on the layer. Possible values for the argument position are "identity", "stack", and "dodge". The default value is "stack".
# Change histogram plot line colors by groups
ggplot(df, aes(x=weight, color=sex)) +
  geom_histogram(fill="white")
# Overlaid histograms
ggplot(df, aes(x=weight, color=sex)) +
  geom_histogram(fill="white", alpha=0.5, position="identity")
# Interleaved histograms
ggplot(df, aes(x=weight, color=sex)) +
  geom_histogram(fill="white", position="dodge") +
  theme(legend.position="top")
# Add mean lines
p <- ggplot(df, aes(x=weight, color=sex)) +
  geom_histogram(fill="white", position="dodge") +
  geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
             linetype="dashed") +
  theme(legend.position="top")
p
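The effect of the position argument can also be checked numerically. The sketch below is an illustration, not part of the original tutorial: it builds the same grouped histogram with "stack" and with "identity" and compares the height of the tallest bar. With "stack", the bars for F and M in the same bin sit on top of each other, so the top can only be as high as or higher than with "identity", where each group starts from zero. The helper name max_height is made up for this example.

```r
library(ggplot2)

# Recreate the tutorial's data frame so this snippet runs on its own
set.seed(1234)
df <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5),
                   rnorm(200, mean = 65, sd = 5)))
)

# Hypothetical helper: top of the tallest bar for a given position adjustment
max_height <- function(position) {
  p <- ggplot(df, aes(x = weight, color = sex)) +
    geom_histogram(fill = "white", binwidth = 1, position = position)
  max(layer_data(p, 1)$ymax)
}

h_stack <- max_height("stack")        # groups piled on top of each other
h_identity <- max_height("identity")  # groups drawn independently from zero
c(stack = h_stack, identity = h_identity)
```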
You can also set histogram line colors manually with the following functions:
- scale_color_manual(): to use custom colors
- scale_color_brewer(): to use color palettes from the RColorBrewer package
- scale_color_grey(): to use grey color palettes
# Use custom color palettes
p + scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Use brewer color palettes
p + scale_color_brewer(palette="Dark2")
# Use grey scale
p + scale_color_grey() + theme_classic() +
  theme(legend.position="top")
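Three colors are supplied to scale_color_manual() although sex has only two levels; the extra value is simply unused. A named vector makes the mapping between levels and colors explicit. The sketch below is an illustration of that idea, and p_named and used_colours are hypothetical names.

```r
library(ggplot2)

# Recreate the tutorial's data frame so this snippet runs on its own
set.seed(1234)
df <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5),
                   rnorm(200, mean = 65, sd = 5)))
)

# Names of the vector are factor levels; values are the colors to use
p_named <- ggplot(df, aes(x = weight, color = sex)) +
  geom_histogram(fill = "white", binwidth = 1, position = "dodge") +
  scale_color_manual(values = c("F" = "#999999", "M" = "#E69F00"))

# The outline colors actually assigned to the bars
used_colours <- unique(layer_data(p_named, 1)$colour)
used_colours
```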
Change fill colors
Histogram plot fill colors can be automatically controlled by the levels of sex:
# Change histogram plot fill colors by groups
ggplot(df, aes(x=weight, fill=sex, color=sex)) +
  geom_histogram(position="identity")
# Use semi-transparent fill
p <- ggplot(df, aes(x=weight, fill=sex, color=sex)) +
  geom_histogram(position="identity", alpha=0.5)
p
# Add mean lines
p + geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
               linetype="dashed")
You can also set histogram fill colors manually with the following functions:
- scale_fill_manual(): to use custom colors
- scale_fill_brewer(): to use color palettes from the RColorBrewer package
- scale_fill_grey(): to use grey color palettes
# Use custom color palettes
p + scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9")) +
  scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Use brewer color palettes
p + scale_color_brewer(palette="Dark2") +
  scale_fill_brewer(palette="Dark2")
# Use grey scale
p + scale_color_grey() + scale_fill_grey() +
  theme_classic()
Change the legend position
p + theme(legend.position="top")
p + theme(legend.position="bottom")
# Remove legend
p + theme(legend.position="none")
The allowed values for the argument legend.position are "left", "top", "right", and "bottom", plus "none" to remove the legend.
Use facets
Split the plot into multiple panels:
p <- ggplot(df, aes(x=weight)) +
  geom_histogram(color="black", fill="white") +
  facet_grid(sex ~ .)
p
# Add mean lines
# (color is set outside aes() so the lines are drawn in red with no extra legend)
p + geom_vline(data=mu, aes(xintercept=grp.mean),
               color="red", linetype="dashed")
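The mean lines land in the correct panels because the data passed to geom_vline() contains the faceting variable sex, so ggplot2 can assign each row to one panel. A minimal sketch of this follows. It uses base R's aggregate() in place of plyr, an assumption made so that the snippet needs only ggplot2, and mu_facet, p_facet and line_data are hypothetical names.

```r
library(ggplot2)

# Recreate the tutorial's data frame so this snippet runs on its own
set.seed(1234)
df <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5),
                   rnorm(200, mean = 65, sd = 5)))
)

# Group means, keeping the sex column that the facets are built from
mu_facet <- aggregate(weight ~ sex, data = df, FUN = mean)
names(mu_facet)[names(mu_facet) == "weight"] <- "grp.mean"

p_facet <- ggplot(df, aes(x = weight)) +
  geom_histogram(color = "black", fill = "white", binwidth = 1) +
  facet_grid(sex ~ .) +
  geom_vline(data = mu_facet, aes(xintercept = grp.mean),
             color = "red", linetype = "dashed")

# Layer 2 is the vline layer: one row per panel, each with its own intercept
line_data <- layer_data(p_facet, 2)
line_data[, c("PANEL", "xintercept")]
```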
Customized histogram plots
# Basic histogram
ggplot(df, aes(x=weight, fill=sex)) +
  geom_histogram(fill="white", color="black") +
  geom_vline(aes(xintercept=mean(weight)), color="blue",
             linetype="dashed") +
  labs(title="Weight histogram plot", x="Weight(kg)", y="Count") +
  theme_classic()
# Change line colors by groups
ggplot(df, aes(x=weight, color=sex, fill=sex)) +
  geom_histogram(position="identity", alpha=0.5) +
  geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
             linetype="dashed") +
  scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9")) +
  scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9")) +
  labs(title="Weight histogram plot", x="Weight(kg)", y="Count") +
  theme_classic()
Combine histogram and density plots:
# Change line colors by groups
# (after_stat(density) replaces the deprecated ..density.. notation; requires ggplot2 >= 3.3.0)
ggplot(df, aes(x=weight, color=sex, fill=sex)) +
  geom_histogram(aes(y=after_stat(density)), position="identity", alpha=0.5) +
  geom_density(alpha=0.6) +
  geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
             linetype="dashed") +
  scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9")) +
  scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9")) +
  labs(title="Weight histogram plot", x="Weight(kg)", y="Density") +
  theme_classic()
Change line colors manually:
p <- ggplot(df, aes(x=weight, color=sex)) +
  geom_histogram(fill="white", position="dodge") +
  geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
             linetype="dashed")
# Brewer palette "Paired" (qualitative)
p + scale_color_brewer(palette="Paired") +
  theme_classic() + theme(legend.position="top")
# Brewer palette "Dark2" (qualitative)
# (theme_classic() is a complete theme, so a preceding theme_minimal() had no effect and is dropped)
p + scale_color_brewer(palette="Dark2") +
  theme_classic() + theme(legend.position="top")
# Brewer palette "Accent" (qualitative)
p + scale_color_brewer(palette="Accent") +
  theme_minimal() + theme(legend.position="top")
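The three palettes used here, "Paired", "Dark2" and "Accent", are all qualitative ColorBrewer palettes, which suit an unordered factor such as sex. The sketch below looks this up in the RColorBrewer package rather than relying on memory. It assumes RColorBrewer is installed, which is usually the case when ggplot2 is; the names palettes and palette_info are made up for this example.

```r
library(RColorBrewer)

palettes <- c("Paired", "Dark2", "Accent")

# brewer.pal.info lists, for each palette, its maximum size and its
# category: "qual" (qualitative), "seq" (sequential) or "div" (diverging)
palette_info <- brewer.pal.info[palettes, ]
palette_info

# Hex codes of a palette; 3 is the smallest n that brewer.pal() accepts
# without a warning
brewer.pal(3, "Dark2")
```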