Introduction to ggplot

This section introduces ggplot2 graphics. You will see how different its syntax is from that of base-R graphics. You can think of ggplot2 as creating graphs by combining layers with the "+" sign. The default gray background of a ggplot is less suitable for printed publications. You can replace it by adding a theme layer, for example + theme_minimal().
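The sketch below shows the layering idea with a theme. It assumes ggplot2 is installed and uses a small hypothetical data frame, toy, in place of real data. A plot is an ordinary R object, so you can store it and add a theme layer later without redrawing anything else.

```r
library(ggplot2)

# Hypothetical toy data; any data frame with numeric columns works here.
toy <- data.frame(x = 1:10, y = (1:10)^2)

# Default look: theme_gray() with a gray panel background
base <- ggplot(data = toy, mapping = aes(x = x, y = y)) +
  geom_point()

# Same plot with one extra layer: a white background suited to print
printable <- base + theme_minimal()

printable  # printing the object draws the plot
```

The theme layer only changes non-data elements such as the background and grid lines. The points themselves are unchanged.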

Key Points

  • Use ggplot2 to create plots.

  • Think about graphics in layers: aesthetics, geometry, statistics, scale transformation, and grouping.
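As a rough illustration of the five kinds of layers in the key points, the sketch below annotates each part of a single plot. It assumes ggplot2 is installed. The data frame toy is hypothetical and only mimics the gapminder column names.

```r
library(ggplot2)

# Hypothetical toy data standing in for gapminder (same column names)
set.seed(42)
toy <- data.frame(
  continent = rep(c("Africa", "Europe"), each = 25),
  gdpPercap = 10^runif(50, 2.5, 4.5)
)
toy$lifeExp <- 30 + 8 * log10(toy$gdpPercap) + rnorm(50, sd = 2)

p <- ggplot(data = toy,
            mapping = aes(x = gdpPercap, y = lifeExp,  # aesthetics: columns -> visual properties
                          colour = continent)) +       # grouping: one colour and group per continent
  geom_point() +                                       # geometry: draw each row as a point
  geom_smooth(method = "lm", formula = y ~ x) +        # statistics: fit and draw a linear model
  scale_x_log10()                                      # scale transformation: log10 x axis

p  # printing the object draws the plot
```

Because colour is mapped to continent, the smoothing layer fits a separate line for each group.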

Transformations and statistics

ggplot2 also makes it easy to overlay statistical models over the data. To demonstrate we'll go back to our first example:

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()


Currently it's hard to see the relationship between the points because of some strong outliers in GDP per capita. We can change the scale of the x axis using the scale functions, which control the mapping between data values and the visual values of an aesthetic. We can also make the points semi-transparent by setting the alpha aesthetic. This is especially helpful when you have a large amount of densely clustered data.

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) + scale_x_log10()


The scale_x_log10 function applies a log10 transformation to the x scale, so that each multiple of 10 is evenly spaced from left to right. For example, a GDP per capita of 1,000 is the same horizontal distance from 10,000 as 10,000 is from 100,000. This makes the spread of the data along the x axis easier to see, because it is no longer dominated by a few very large values. Because the transformation is applied to the scale, any statistics added later, such as a fitted line, are computed on the log-transformed values.
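The arithmetic behind the even spacing can be checked in base R. A log10 axis places each value at its base-10 logarithm, so values that differ by a factor of 10 end up exactly one unit apart.

```r
# Positions on a log10 axis are the log10 of the data values
gdp <- c(1000, 10000, 100000)
pos <- log10(gdp)

pos        # 3 4 5
diff(pos)  # 1 1: equal horizontal steps on the log axis
diff(gdp)  # 9000 90000: very unequal steps on a linear axis
```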

Tip Reminder: Setting an aesthetic to a value instead of a mapping

Notice that we used geom_point(alpha = 0.5). As the previous tip mentioned, setting a value outside the aes() function applies that value to every point, which is what we want in this case. But like any other aesthetic, alpha can also be mapped to a variable in the data. For example, we can give each continent a different transparency with geom_point(mapping = aes(alpha = continent)).
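To see the difference between setting and mapping, you can inspect the data that ggplot2 computes for a layer with layer_data(). The sketch below assumes ggplot2 is installed and uses a hypothetical toy data frame with three continents. When alpha is mapped to a character column, ggplot2 warns that alpha is not advised for a discrete variable, but it still assigns one transparency level per continent.

```r
library(ggplot2)

# Hypothetical toy data with gapminder-like column names
set.seed(1)
toy <- data.frame(
  continent = rep(c("Africa", "Americas", "Europe"), each = 20),
  gdpPercap = 10^runif(60, 2.5, 4.5),
  lifeExp   = runif(60, 40, 80)
)

# Setting: a constant outside aes(), applied to every point
p_set <- ggplot(data = toy, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) + scale_x_log10()

# Mapping: inside aes(), so alpha varies with the continent column
p_map <- ggplot(data = toy, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(mapping = aes(alpha = continent)) + scale_x_log10()

unique(layer_data(p_set)$alpha)          # 0.5
length(unique(layer_data(p_map)$alpha))  # 3, one level per continent
```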

We can fit a simple relationship to the data by adding another layer, geom_smooth:

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) + scale_x_log10() + geom_smooth(method = "lm")

Output
`geom_smooth()` using formula 'y ~ x'
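Because scale_x_log10() transforms the data before the statistics are computed, geom_smooth(method = "lm") fits lifeExp against log10(gdpPercap), not against raw GDP. The sketch below checks this claim on hypothetical toy data by refitting the same model with lm() and comparing its predictions with the line that ggplot2 computed, which layer_data() returns for the second layer. It assumes ggplot2 is installed. Passing formula = y ~ x explicitly also silences the "using formula" message shown above.

```r
library(ggplot2)

# Hypothetical toy data standing in for gapminder
set.seed(1)
toy <- data.frame(gdpPercap = 10^runif(200, 2.5, 5))
toy$lifeExp <- 20 + 10 * log10(toy$gdpPercap) + rnorm(200, sd = 3)

p <- ggplot(data = toy, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) + scale_x_log10() +
  geom_smooth(method = "lm", formula = y ~ x)

# Computed data for the smoothing layer; its x values are already log10-scaled
smooth <- layer_data(p, 2)

# The same linear model fitted by hand on the log10 scale
fit <- lm(lifeExp ~ log10(gdpPercap), data = toy)
manual <- predict(fit, newdata = data.frame(gdpPercap = 10^smooth$x))

all.equal(unname(manual), smooth$y)  # TRUE
```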


We can make the line thicker by setting the linewidth aesthetic in the geom_smooth layer. In ggplot2 versions before 3.4.0 this aesthetic was called size, and using size for lines now produces a deprecation warning.

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) + scale_x_log10() + geom_smooth(method = "lm", linewidth = 1.5)

Output
`geom_smooth()` using formula 'y ~ x'


There are two ways an aesthetic can be specified. Here we set the linewidth aesthetic to a constant by passing it as an argument to geom_smooth, outside aes(). Previously in the lesson we've used the aes function to define a mapping between data variables and their visual representation.
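The sketch below puts the two approaches side by side for the smoothing layer. It assumes ggplot2 3.4.0 or later (for linewidth) and uses hypothetical toy data with two continents. Setting linewidth outside aes() gives one line with a fixed width. Mapping colour inside aes() ties the colour to a column, which also splits the data into groups, so geom_smooth fits one line per continent.

```r
library(ggplot2)

# Hypothetical toy data with gapminder-like column names
set.seed(7)
toy <- data.frame(
  continent = rep(c("Africa", "Europe"), each = 30),
  gdpPercap = 10^runif(60, 2.5, 4.5)
)
toy$lifeExp <- 35 + 8 * log10(toy$gdpPercap) + rnorm(60, sd = 2)

# Set: a constant linewidth for the whole layer
p_set <- ggplot(data = toy, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) + scale_x_log10() +
  geom_smooth(method = "lm", formula = y ~ x, linewidth = 1.5)

# Map: colour comes from the continent column, creating one group per continent
p_map <- ggplot(data = toy, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) + scale_x_log10() +
  geom_smooth(mapping = aes(colour = continent), method = "lm", formula = y ~ x)

unique(layer_data(p_set, 2)$linewidth)      # 1.5
length(unique(layer_data(p_map, 2)$group))  # 2, one fitted line per continent
```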