Data Visualization in Python

At this point in the course, it is time to begin connecting the dots and applying visualization to your knowledge of statistics. Work through these programming examples to round out your knowledge of seaborn as it is applied to univariate and bivariate plots.

Plotting in Python

Let's take a very quick tour before we get into the weeds. We'll use the mtcars dataset as an example, loading it with pandas:

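import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Scale fonts and line widths for print-sized figures
sns.set_context('paper')
# White background, no grid; dark text so titles and legends stay readable
# (matplotlib warns and falls back to a default font if Futura isn't installed)
sns.set_style('white', {'font.family': 'Futura', 'text.color': 'black'})
# Load the Motor Trend cars data from a local data/ folder
mtcars = pd.read_csv('data/mtcars.csv')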


Static plots

We will demonstrate plotting in what I'll call the matplotlib ecosystem. matplotlib is the venerable and powerful visualization package originally designed to emulate the MATLAB plotting paradigm. It has since evolved and become somewhat more user-friendly. It is still quite granular, which lets you build highly customized plots once you are familiar with it. As a starting point, however, I think it's a bit much. We'll see some of what it can offer later.
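For comparison, here is a minimal sketch of the horsepower-versus-mpg scatter plot written directly against matplotlib's object-oriented interface. To keep it self-contained, it types in six rows of mtcars (mpg, cyl, hp) instead of reading data/mtcars.csv. Notice that the figure, the axes, every label, and even the frame lines are set by hand; that is the granularity described above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Six rows of mtcars, typed in so the example runs without data/mtcars.csv
cars = pd.DataFrame({
    'mpg': [21.0, 22.8, 21.4, 18.7, 18.1, 14.3],
    'cyl': [6, 4, 6, 8, 6, 8],
    'hp': [110, 93, 110, 175, 105, 245],
})

# Create the figure and a single Axes explicitly
fig, ax = plt.subplots(figsize=(5, 4))

# Draw the points, then set every annotation ourselves
ax.scatter(cars['hp'], cars['mpg'], color='steelblue', edgecolor='black')
ax.set_xlabel('Horsepower')
ax.set_ylabel('Miles per gallon')
ax.set_title('Fuel economy vs. horsepower')

# Hide the top and right frame lines
for side in ('top', 'right'):
    ax.spines[side].set_visible(False)

plt.show()
```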

We will consider two other options that are built on top of matplotlib but are much more accessible: pandas and seaborn. The two packages take somewhat different approaches, but both wrap matplotlib in higher-level code with sensible defaults, so we don't need to get into the matplotlib trenches quite so much. We'll still call matplotlib in our code, since both packages rely on it for fine-tuning. Both are also closely aligned with the pandas DataFrame, which makes plotting a much more seamless experience.

# pandas: plot straight from the DataFrame's columns
mtcars.plot.scatter(x='hp', y='mpg')
plt.show()
# Equivalent: mtcars.plot(x='hp', y='mpg', kind='scatter')
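`DataFrame.plot.scatter` returns the matplotlib `Axes` it drew on, so fine-tuning is a matter of calling matplotlib methods on that return value. A minimal sketch, again on six typed-in rows of mtcars; the axis limits, grid, and the annotation pointing at the Duster 360 are illustrative choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Six rows of mtcars, typed in so the example runs without data/mtcars.csv
cars = pd.DataFrame({
    'mpg': [21.0, 22.8, 21.4, 18.7, 18.1, 14.3],
    'cyl': [6, 4, 6, 8, 6, 8],
    'hp': [110, 93, 110, 175, 105, 245],
})

# pandas draws the scatter plot and returns the matplotlib Axes it used
ax = cars.plot.scatter(x='hp', y='mpg', figsize=(5, 4))

# Everything from here on is plain matplotlib
ax.set(xlabel='Horsepower', ylabel='Miles per gallon', xlim=(0, 300))
ax.grid(alpha=0.3)
ax.annotate('Duster 360', xy=(245, 14.3), xytext=(180, 16),
            arrowprops={'arrowstyle': '->'})
plt.show()
```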


# seaborn: pass the DataFrame, then name the column for each axis
sns.scatterplot(data=mtcars, x='hp', y='mpg')
plt.show()
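seaborn's axes-level functions such as `sns.scatterplot` also return a matplotlib `Axes`, and they take a `hue` argument that maps a third column to color and builds the legend for you. A minimal sketch on six typed-in rows of mtcars, coloring points by cylinder count and then relabeling with matplotlib:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Six rows of mtcars, typed in so the example runs without data/mtcars.csv
cars = pd.DataFrame({
    'mpg': [21.0, 22.8, 21.4, 18.7, 18.1, 14.3],
    'cyl': [6, 4, 6, 8, 6, 8],
    'hp': [110, 93, 110, 175, 105, 245],
})

# hue maps the cyl column to color and adds a legend automatically
ax = sns.scatterplot(data=cars, x='hp', y='mpg', hue='cyl')

# seaborn handed back a matplotlib Axes, so matplotlib does the fine-tuning
ax.set(xlabel='Horsepower', ylabel='Miles per gallon')
ax.get_legend().set_title('Cylinders')
plt.show()
```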


There are, of course, other choices depending on your background and preferences. For static plots, there are two emulators of the popular R package ggplot2: plotnine and ggplot. plotnine is the more mature and actively maintained of the two, and it uses ggplot2's semantics of aesthetics and layers with nearly identical syntax.

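You can install plotnine using conda (or with pip install plotnine):

conda install -c conda-forge plotnine

Then build the plot by adding an aesthetic mapping and a point layer to the data:

# Import only the plotnine pieces used below
from plotnine import ggplot, aes, geom_point

# In a notebook, the plot renders when it is the last expression in a cell
(ggplot(mtcars)
 + aes(x='hp', y='mpg')
 + geom_point())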



Dynamic or interactive plots

Several Python packages wrap JavaScript plotting libraries that are popular in web-based graphics, such as D3 and Vega. Three that deserve mention are plotly, bokeh, and altair.

If you actually want to experience the interactivity of the plots, please use the "Live notebooks" link in Canvas to run these notebooks. Otherwise, you can download the notebooks from the GitHub site and run them on your own computer.

plotly is a Python package developed by the company Plotly to interface with its interactive JavaScript library, either locally or via its web service. The company also develops an R package for its products. plotly offers an intuitive, easy-to-use syntax and is probably the most popular package for interactive graphics in both R and Python.

import plotly.express as px

# Plotly Express builds a complete interactive figure in one call
fig = px.scatter(mtcars, x='hp', y='mpg')
fig.show()

bokeh is an interactive visualization package originally developed at Anaconda. It is quite powerful, but its code can be rather verbose and granular:

from bokeh.plotting import figure
from bokeh.io import output_notebook, show

output_notebook()  # render plots inline in the notebook
p = figure()
p.xaxis.axis_label = 'Horsepower'
p.yaxis.axis_label = 'Miles per gallon'
# scatter() draws circle markers by default
p.scatter(mtcars['hp'], mtcars['mpg'], size=10)

show(p)
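To see how granular bokeh gets, here is a minimal sketch that adds hover tooltips to the same plot. It wraps the data in a `ColumnDataSource` so the glyph and the tooltips can refer to columns by name (`@hp`, `@mpg`). The six typed-in rows and the `model` column name are assumptions made to keep it self-contained.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Six rows of mtcars; the 'model' column name is a hypothetical choice
cars = pd.DataFrame({
    'model': ['Mazda RX4', 'Datsun 710', 'Hornet 4 Drive',
              'Hornet Sportabout', 'Valiant', 'Duster 360'],
    'mpg': [21.0, 22.8, 21.4, 18.7, 18.1, 14.3],
    'hp': [110, 93, 110, 175, 105, 245],
})
source = ColumnDataSource(cars)  # DataFrame columns become named data fields

p = figure(x_axis_label='Horsepower', y_axis_label='Miles per gallon')
p.scatter(x='hp', y='mpg', source=source, size=10)

# Tooltip fields refer to source columns with the @ prefix
p.add_tools(HoverTool(tooltips=[('Car', '@model'), ('hp', '@hp'), ('mpg', '@mpg')]))

# Render with show(p) after output_notebook(), as in the example above
```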

altair is built on the Vega-Lite JavaScript library and has a distinctive, declarative syntax that may appeal to some:

import altair as alt

# Map columns to encoding channels; .interactive() adds pan and zoom
alt.Chart(mtcars).mark_point().encode(
    x='hp',
    y='mpg'
).interactive()

We won't focus on these dynamic packages in this workshop in the interest of time, but many online resources cover them.

Package resources:
- plotly: Fundamentals
- bokeh: Tutorial
- altair: Overview