Here is an example that combines much of what has been introduced within the course using a very practical application. You should view this step as a culminating project for the first six units of this course. You should master the material in this project before moving on to the units on data mining.
Basic Figure Creation with Seaborn
To create a figure or graph, we are typically going to:
* call some specific function within Seaborn, such as sns.lineplot(), to create a plotting object
* include arguments in the call that specify the data to plot and any options that we need
* write additional code to make tweaks to the object, such as adding a title, legend, or axis labels
* call plt.show() to display the object
Displaying a plot will typically involve a few lines of code. The initial call will create an object that we can't see yet, but that knows what type of plot it is and what data it holds. We then add to the object and tweak things. Finally, when we want to see what we've built, we can tell the object to show itself.
This is a slightly different way of interacting with objects than you might be familiar with. It takes some getting used to. The thing to remember is that when you call the initial function, you are creating an object. You can then do things with that object, such as putting it in a list, assigning it to a variable, or displaying it.
Note: Usually, plot objects don't display themselves until you ask, but Colab (and other forms of Jupyter notebooks) try to be helpful and may display your objects even when you haven't asked. You might be tempted to rely on that convenience, but I'd like you not to get into that habit, as it can cause quite a bit of confusion later down the road.
Seaborn Patterns
Let's look at some code to get a sense of the overall pattern for making plots with Seaborn. Note: Remember that sns is the alias for the Seaborn library.
    # imports (skip if already run earlier in your notebook)
    import matplotlib.pyplot as plt
    import seaborn as sns

    # set figure size
    sns.set(rc={'figure.figsize':(10,8)})

    # data for our plot
    yearlist = [1970, 1980, 1990, 2000, 2010, 2020, 2030]
    zombiecount = [0, 0, 0, 0, 10, 20, 50] # slowest zombie spread of all time

    # plot starts here
    sns.lineplot(x = yearlist, y = zombiecount) # called a line plot, lists as x and y
    plt.show() # displays the plot
Let's add a few more components to our figure.
    # data for our plot
    yearlist = [1970, 1980, 1990, 2000, 2010, 2020, 2030]
    zombiecount = [0, 0, 0, 0, 10, 20, 50] # slowest zombie spread of all time

    # plot starts here
    sns.lineplot(x = yearlist, y = zombiecount) # called a line plot, lists as x and y
    plt.title('Slow Zombie Apocalypse Chart') # added a title
    plt.xlabel('Year') # added an x-axis label
    plt.ylabel('Zombies') # added a y-axis label
    plt.show() # displays the plot
Here we will plot two sets of data on the same figure. Notice we are calling sns.lineplot() twice, but we are not getting back two plots. Instead, the second call adds to the existing object.
    # data for our plot
    yearlist = [1970, 1980, 1990, 2000, 2010, 2020, 2030]
    zombiecount = [0, 0, 0, 0, 10, 20, 50] # slowest zombie spread of all time
    bunnycount = [45, 35, 25, 15, 10, 5, 1] # bunnies have a rough time

    # plot starts here
    sns.lineplot(x = yearlist, y = zombiecount, label = 'zombies') # called a line plot, lists as x and y
    sns.lineplot(x = yearlist, y = bunnycount, label = 'bunnies') # add a line to the existing plot
    plt.show() # displays the plot
Again, here is the general pattern:
- Have the data in some form
- Call a Seaborn function to plot the data and supply necessary arguments
- Tweak or add to the plot with calls to plt (which is the alias for matplotlib.pyplot, part of another library called matplotlib)
- Display the result
Make sure you can recognize how the code above accomplishes each step.
Seaborn Patterns with Pandas
If you have your data in a dataframe, you can hand specific series from your dataframe to Seaborn for plotting.
For example:
    import pandas as pd # skip if already imported earlier in your notebook

    # create a zombie bunny dataframe
    # don't worry about how this works
    dfzb = pd.DataFrame(zip(yearlist, zombiecount, bunnycount))
    dfzb.columns = ['year', 'zombies', 'bunnies']

    # plot the data
    sns.lineplot(x = 'year', y = 'zombies', data = dfzb) # specifies series and dataframe
    plt.show()
Notice that Seaborn has taken the series names and used them as axis labels. We can override this with a subsequent call to plt.
    sns.lineplot(x = 'year', y = 'zombies', data = dfzb) # specify series and dataframe
    plt.ylabel('Zombies!!!!')
    plt.show()
We have two options for supplying our data to Seaborn. We can feed it the data directly, for example by passing lists or series as x and y, or we can use data = to specify a dataframe, in which case Seaborn looks for x and y among that dataframe's column names. Students commonly blend the two approaches and have difficulty figuring out why it fails. Always check that you've specified a data source if you are trying to plot a series from a dataframe by its column name.