Here is an example that combines much of what has been introduced in this course in a practical application. Treat this step as a culminating project for the first six units of the course, and master its material before moving on to the units on data mining.
Categorical Plots
Bar Charts
We can use sns.barplot() to create bar charts.
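By default, Seaborn sets the height of each bar to the mean of the y variable within that category. It also draws a thin dark grey line (an error bar) through the top of the bar. The ends of that line mark a 95% confidence interval for the mean, which Seaborn estimates by bootstrapping. The interval therefore shows uncertainty in the estimated mean, not the full spread of the data. In Seaborn 0.12 and later this is controlled by the errorbar parameter; older versions use ci.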
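To make that default concrete, here is a rough sketch that computes what a bar and its error bar summarize, using pandas and NumPy directly. The dataset `toy` is made up for illustration. The helper `bootstrap_ci` is hypothetical: it implements a percentile bootstrap with 1,000 resamples. That is similar in spirit to Seaborn's default, but it is not Seaborn's internal code, so the interval will not match a Seaborn plot exactly.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two states with four yearly violent-crime rates each
toy = pd.DataFrame({
    "State": ["Idaho"] * 4 + ["Iowa"] * 4,
    "Rates.Violent.All": [240.0, 250.0, 260.0, 250.0, 270.0, 290.0, 280.0, 280.0],
})


def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    # Draw n_boot resamples (with replacement) and take the mean of each one
    boot_means = rng.choice(values, size=(n_boot, len(values)), replace=True).mean(axis=1)
    # Keep the central `level` percent of the bootstrap means
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return low, high


rates = toy.groupby("State")["Rates.Violent.All"]
heights = rates.mean()  # bar heights: one mean per category
intervals = {state: bootstrap_ci(group) for state, group in rates}  # error-bar endpoints

for state, height in heights.items():
    low, high = intervals[state]
    print(f"{state}: bar height {height:.1f}, error bar {low:.1f} to {high:.1f}")
```

Each bar's height is just the group mean. The error bar narrows when a category has more observations or less variable ones.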
import matplotlib.pyplot as plt
import seaborn as sns

# create a subset (a copy) of the dataframe restricted to states whose names start with 'I'
dfi = df.loc[df.loc[:, 'State'].str.startswith('I')]
sns.barplot(x="State", y="Rates.Violent.All", data=dfi)
plt.show()
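The `dfi` line works by building a boolean mask. `.str.startswith('I')` returns True for each row whose State begins with "I", and `.loc` keeps only those rows. The sketch below applies the same filter to a small DataFrame. Its rows and values are hypothetical, not the course data.

```python
import pandas as pd

# Hypothetical toy data standing in for the course's crime dataframe
df = pd.DataFrame({
    "State": ["Idaho", "Ohio", "Illinois", "Texas", "Iowa"],
    "Rates.Violent.All": [250.0, 300.0, 410.0, 450.0, 280.0],
})

# Boolean Series: True where the state name starts with "I"
mask = df.loc[:, "State"].str.startswith("I")
print(mask.tolist())  # [True, False, True, False, True]

# Keep only the rows where the mask is True
dfi = df.loc[mask]
print(dfi["State"].tolist())  # ['Idaho', 'Illinois', 'Iowa']
```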
To get a horizontal bar chart, we swap the variables assigned to the x and y axes.
sns.barplot(y="State", x="Rates.Violent.All", data=dfi)
plt.show()
Grouped Bar Charts
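By adding the hue argument, we can create grouped bar charts. Each x category is split into side-by-side bars, one per level of the hue variable, each drawn in its own color.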
dfi = df.loc[df.loc[:, 'State'].str.startswith('I')]
sns.barplot(x="State", y="Rates.Violent.All", hue="Decade", data=dfi)
plt.show()
Swapping the x and hue variables groups the bars by decade instead, with one colored bar per state.

dfi = df.loc[df.loc[:, 'State'].str.startswith('I')]
sns.barplot(x="Decade", y="Rates.Violent.All", hue="State", data=dfi)
plt.show()
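In a grouped bar chart, each bar is the mean for one (x category, hue level) pair. A pandas `groupby` on both columns followed by `unstack` produces the same heights as a table. Swapping x and hue just transposes that table. The data below, including the integer `Decade` values, are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: two states, two decades, two yearly rates per pair
dfi = pd.DataFrame({
    "State": ["Idaho", "Idaho", "Idaho", "Idaho", "Iowa", "Iowa", "Iowa", "Iowa"],
    "Decade": [1990, 1990, 2000, 2000, 1990, 1990, 2000, 2000],
    "Rates.Violent.All": [300.0, 320.0, 250.0, 260.0, 280.0, 300.0, 270.0, 290.0],
})

# Rows = x categories (State), columns = hue levels (Decade), cells = bar heights
by_state = dfi.groupby(["State", "Decade"])["Rates.Violent.All"].mean().unstack("Decade")
print(by_state)

# Swapping x and hue: rows = Decade, columns = State, same heights transposed
by_decade = dfi.groupby(["Decade", "State"])["Rates.Violent.All"].mean().unstack("State")
print(by_decade)
```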
Count Plots
We can make plots of the frequency of categorical data using sns.countplot().
Note that we supply only the categorical column we want counted; Seaborn performs the counting itself.
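The counting step is equivalent to pandas' `value_counts()`: each bar's height is the number of rows in that category. The small `df_stateinfo` below is a hypothetical stand-in for the course's table.

```python
import pandas as pd

# Hypothetical stand-in for df_stateinfo: one row per state with its region
df_stateinfo = pd.DataFrame({
    "State": ["Idaho", "Illinois", "Indiana", "Iowa", "Ohio", "Texas"],
    "Region": ["West", "Midwest", "Midwest", "Midwest", "Midwest", "South"],
})

# countplot's bar heights: the number of rows in each category
counts = df_stateinfo["Region"].value_counts()
print(counts)
```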
# countplot
sns.countplot(x="Region", data=df_stateinfo)
plt.show()
We can make a horizontal version of the frequency count by specifying the categorical data using the argument y rather than x.
# horizontal countplot
sns.countplot(y="Division", data=df_stateinfo)
plt.show()