Data Visualization in Python
Univariate Plots
We will introduce plotting code from three modules: matplotlib, seaborn and pandas. As we go along, you may ask which one you should learn. Chris Moffitt has the following advice.
A pathway to learning (Chris Moffitt)
- Learn the basic matplotlib terminology, specifically what a Figure and an Axes are.
- Always use the object-oriented interface. Get in the habit of using it from the start of your analysis. (We won't go into this in detail, but in short: don't use the MATLAB-style form I'll show at the end if you don't have to.)
- Start your visualizations with basic pandas plotting.
- Use seaborn for the more complex statistical visualizations.
- Use matplotlib to customize the pandas or seaborn visualization.
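The advice above can be sketched in a few lines. This is a minimal illustration, not Moffitt's own code: it assumes a tiny hand-typed stand-in for mtcars named cars (the values are only illustrative), creates a Figure and an Axes explicitly, lets pandas draw onto that Axes, and then customizes the result through matplotlib methods.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for mtcars: a few hand-typed rows, for illustration only
cars = pd.DataFrame({
    "mpg": [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2],
    "cyl": [6, 6, 4, 6, 8, 6, 8, 4, 4, 6],
})

# Object-oriented interface: create the Figure and its Axes explicitly
fig, ax = plt.subplots(figsize=(6, 4))

# Start with basic pandas plotting, drawn onto the Axes we created
cars["mpg"].plot.hist(ax=ax, bins=5)

# Customize with matplotlib by calling methods on the Axes and the Figure
ax.set_title("Distribution of mpg")
ax.set_xlabel("Miles per gallon")
ax.set_ylabel("Number of cars")
fig.tight_layout()
```

With an interactive backend, plt.show() would then display the figure; fig.savefig() would write it to a file instead. Holding on to fig and ax is what makes the later customization steps possible.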
pandas
Histogram
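# Assumes pandas (pd), matplotlib.pyplot (plt) and seaborn (sns) are imported and mtcars is a loaded DataFrame
mtcars.plot.hist(y = 'mpg'); plt.show()
# Alternative forms:
# mtcars.plot(y = 'mpg', kind = 'hist')
# mtcars['mpg'].plot(kind = 'hist')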
Bar plot
mtcars['cyl'].value_counts().plot.bar(); plt.show()
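One thing to keep in mind with this pattern: value_counts() sorts by frequency, so the bars appear from most to least common rather than in cylinder order. The sketch below uses a small hand-typed stand-in for the cyl column (illustrative values only) to show the difference and how sort_index() restores category order.

```python
import pandas as pd

# Hypothetical stand-in for mtcars['cyl'], for illustration only
cyl = pd.Series([6, 6, 4, 6, 8, 6, 8, 4, 4, 6], name="cyl")

# Default behaviour: most common value first
by_frequency = cyl.value_counts()

# Sorting the index orders the counts by cylinder number instead
by_category = cyl.value_counts().sort_index()

print(by_frequency.index.tolist())
print(by_category.index.tolist())
```

Chaining .plot.bar() onto by_category would draw the bars in category order rather than frequency order.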
Density plot
mtcars['mpg'].plot( kind = 'density'); plt.show()
seaborn
Histogram
# sns.distplot(..., kde=False) is deprecated as of seaborn 0.11; histplot is its replacement
ax = sns.histplot(mtcars['mpg']); plt.show()
Bar plot
sns.countplot(data = mtcars, x = 'cyl');
plt.show()
diamonds = pd.read_csv('data/diamonds.csv.gz')
ordered_colors = ['E','F','G','H','I','J'] # not used below; pass order = ordered_colors to catplot to control the bar order
sns.catplot(data = diamonds, x = 'color', kind = 'count', color = 'blue');
plt.show()
Density plot
# sns.distplot(..., hist=False) is deprecated as of seaborn 0.11; kdeplot is its replacement
sns.kdeplot(mtcars['mpg']);
plt.show()