Line Plots

Up until now, we have been making line plots where there was only one y value for each x value. For example, in 2030, there was 1 bunny left. Often, however, the line in a line plot actually summarizes many data points. For example, let's say we were plotting weekly Broadway revenue. A line plot would then show a single point for each week, representing the average revenue across many Broadway productions. In a sense, Seaborn is acting a lot like the pandas groupby() method: it automatically aggregates over groups of data to produce the plot. If we hand Seaborn data with many y values for each x value and ask for a line plot, it will assume we want the mean, and it will handle the grouping, the averaging, and the plotting for us.
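
To make the groupby comparison concrete, here is a minimal sketch of the aggregation step done by hand. The DataFrame toy and its week and revenue columns are made up for illustration; they are not part of the dataset used in this section.

```python
import pandas as pd

# Hypothetical toy data: several productions report revenue in each week.
toy = pd.DataFrame({
    "week": [1, 1, 1, 2, 2, 2],
    "revenue": [100.0, 200.0, 300.0, 150.0, 250.0, 350.0],
})

# Group the rows by x value, then average the y values within each group.
weekly_mean = toy.groupby("week")["revenue"].mean()
print(weekly_mean)
```

Each entry in weekly_mean is the single y value a line plot would draw for that week.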

In addition, Seaborn will add a band of color around the line. Let's see what that looks like:

sns.lineplot(x = "Decade",
             y = "Rates.Violent.All",
             data = df)
plt.show() # still need this to show the plot



In this dataset, we have data from different states for each decade. Seaborn represents the mean value with the darker blue line. The lighter blue band around it represents a statistic called the 95% confidence interval. The width of the confidence interval depends on both the sample size and the variance of the data itself. We will talk more about this later, if you are interested.
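
If you are curious now, here is a rough sketch of one common way to estimate such an interval, called bootstrapping. To my knowledge this is close to what Seaborn does by default for line plots, but treat it as an illustration and not as Seaborn's exact code. The rates array is made up, and the seed and the 1000 resamples are choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical rates for a single decade, one value per state.
rates = np.array([310.0, 450.0, 280.0, 520.0, 390.0, 610.0, 330.0, 470.0])

# Resample the data with replacement many times and record each mean.
boot_means = np.array([
    rng.choice(rates, size=rates.size, replace=True).mean()
    for _ in range(1000)
])

# The middle 95% of the resampled means gives the interval.
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {rates.mean():.1f}, 95% CI = ({low:.1f}, {high:.1f})")
```

With fewer states or more spread-out values, the resampled means vary more and the band gets wider, which is why the interval depends on both sample size and variance.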

We can add a second line to the plot by calling sns.lineplot() a second time before plt.show().

Notice that the data in the plot above is the same as the data in the lower line in the plot below. If I showed you the one above and asked you to come to a conclusion about violent crime rates, you would likely come to a different conclusion than if I showed you the one below. The only difference between the two is the scaling of the y-axis. Property crime is much more common, so Seaborn changed the y-axis to accommodate the second set of data.

Also, notice that the confidence intervals are still there but may be difficult to see due to the scaling and the size of the figure display.

sns.lineplot(x = "Decade", y = "Rates.Violent.All", data = df)
sns.lineplot(x = "Decade", y = "Rates.Property.All", data = df)
plt.show()