Now that you've mastered basic statistical, array, and spreadsheet data processing techniques, it is natural to want to plot, render, and visualize that data. In this unit, we will discuss visualization techniques beyond those introduced using matplotlib and pandas. When you finish this unit, you will be able to implement and interpret the kinds of data plots commonly used in data science.
The matplotlib module is convenient for visualizing data stored in numpy arrays, and pandas offers similar plotting methods for pandas dataframes. The seaborn module, which is built on top of matplotlib, is also designed to work with pandas data. It is extremely powerful for rendering data in ways that professional data scientists find useful. While matplotlib provides general-purpose plotting capabilities, seaborn offers high-level functions for statistical graphics such as bar charts, violin plots, heat maps, and more. These more advanced forms of visualization can help you quickly draw inferences from patterns discerned within the plots.
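As a rough illustration of the statistical plots mentioned above, the sketch below draws a violin plot and a correlation heat map with seaborn. The data, column names (group, x, y, z), and styling choices are made up for illustration; it assumes a recent seaborn (0.11 or later) and pandas installed.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: a categorical column plus three numeric columns,
# where y is built to correlate strongly with x and z is independent noise.
rng = np.random.default_rng(seed=0)
n = 200
df = pd.DataFrame({
    "group": rng.choice(["control", "treatment"], size=n),
    "x": rng.normal(size=n),
})
df["y"] = 2 * df["x"] + rng.normal(scale=0.5, size=n)
df["z"] = rng.normal(size=n)

fig, (ax_violin, ax_heat) = plt.subplots(1, 2, figsize=(10, 4))

# Violin plot: the distribution of y within each group.
sns.violinplot(data=df, x="group", y="y", ax=ax_violin)

# Heat map: pairwise correlations of the numeric columns, annotated with values.
corr = df[["x", "y", "z"]].corr()
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax_heat)

fig.tight_layout()
```

Fixing vmin and vmax at -1 and 1 keeps the heat map's color scale consistent with the full range of possible correlation values.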
Completing this unit should take you approximately 4 hours.
seaborn is an advanced visualization module designed to work with pandas dataframes. Follow along with the programming examples for an introduction to seaborn's capabilities. Pay close attention to how it is applied in tandem with pandas. For instance, notice how the fillna method is used for data cleaning by replacing missing values before plotting. Also observe how powerful seaborn can be: a single command creates scatter plots for all numeric variables within a dataframe.
Many Python visualization tutorials move between matplotlib, pandas, and seaborn in an almost "stream of consciousness" fashion. Jumping from one module to another in this way can be confusing (although, as you gain expertise, you will most likely end up doing the same). In this course, we have made an extra effort to decouple these modules so you can understand how each works on its own. At this advanced stage, however, be prepared for some overlap between modules when it comes to visualization techniques. Watch this tutorial to practice examples of how matplotlib and seaborn are applied for visualization.
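The overlap between the two modules is easiest to see when both draw into the same matplotlib figure. In this sketch (made-up data), the left panel uses plain matplotlib, while the right panel passes a matplotlib Axes to seaborn's regplot through its ax parameter; matplotlib methods such as set_title then work on either panel.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical (x, y) pairs with a roughly linear trend.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2.1, 3.9, 6.2, 8.1, 9.8]})

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Plain matplotlib: a line plot with point markers.
ax_left.plot(df["x"], df["y"], marker="o")
ax_left.set_title("matplotlib")

# seaborn draws onto the existing matplotlib Axes: scatter points,
# a fitted regression line, and a shaded confidence band.
sns.regplot(data=df, x="x", y="y", ax=ax_right)
ax_right.set_title("seaborn regplot")

fig.tight_layout()
```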
Watch this tutorial to practice more examples of how pandas and seaborn are applied for visualization.
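As a small sketch of pandas and seaborn working side by side (hypothetical sales data), the code below aggregates with a pandas groupby and plots the result with the dataframe's own plot.bar method, then lets seaborn's barplot compute the same per-category means directly from the raw rows.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical raw rows: two revenue figures per region.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "West", "West"],
    "revenue": [120.0, 80.0, 150.0, 130.0, 90.0, 110.0],
})

fig, (ax_pd, ax_sns) = plt.subplots(1, 2, figsize=(8, 3))

# pandas: aggregate first, then use the Series' built-in plotting method.
means = sales.groupby("region")["revenue"].mean()
means.plot.bar(ax=ax_pd, title="pandas plot.bar")

# seaborn: pass the raw rows; barplot computes the mean per category
# itself and, by default, adds error bars for the estimate.
sns.barplot(data=sales, x="region", y="revenue", ax=ax_sns)
ax_sns.set_title("seaborn barplot")

fig.tight_layout()
```

The bar heights in both panels match: 100 for North, 140 for South, and 100 for West.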
At this point in the course, it is time to begin connecting the dots and applying visualization to your knowledge of statistics. Work through these programming examples to round out your knowledge of seaborn as it is applied to univariate and bivariate plots.
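A tool very often used for plotting the results of statistical experiments is the box plot. It provides a quick visual summary of a distribution: the median, the first and third quartiles (the edges of the box), and the spread of the remaining data (the whiskers). By default, matplotlib and seaborn extend the whiskers to the most extreme data points within 1.5 times the interquartile range of the box and draw any points beyond that as individual outliers, so the whiskers do not always reach the minimum and maximum. Practice these programming examples to apply various quantities previously introduced in the statistics unit.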
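The sketch below (made-up scores for two hypothetical methods, A and B) connects the box plot to the underlying numbers: pandas computes the quartiles per group, the 1.5 × IQR rule flags outliers, and seaborn's boxplot draws the same summary.

```python
import pandas as pd
import seaborn as sns

# Hypothetical experiment results: eight scores for each of two methods.
scores = pd.DataFrame({
    "method": ["A"] * 8 + ["B"] * 8,
    "score": [62, 65, 68, 70, 71, 73, 75, 98,
              55, 58, 60, 61, 63, 64, 66, 69],
})

# Minimum, quartiles, median, and maximum per method
# (pandas uses linear interpolation between data points for percentiles).
summary = scores.groupby("method")["score"].describe()[["min", "25%", "50%", "75%", "max"]]
print(summary)

# The 1.5 * IQR rule that the box plot whiskers use by default.
iqr = summary["75%"] - summary["25%"]
lower = summary["25%"] - 1.5 * iqr
upper = summary["75%"] + 1.5 * iqr
is_outlier = (scores["score"] < scores["method"].map(lower)) | (
    scores["score"] > scores["method"].map(upper)
)
outliers = scores[is_outlier]

# The box plot: boxes span Q1 to Q3, the line inside marks the median,
# and the score of 98 appears as a separate outlier point for method A.
ax = sns.boxplot(data=scores, x="method", y="score")
```

For method A, the quartiles are 67.25, 70.5, and 73.5, so the upper fence is 82.875 and only the score of 98 lies beyond it.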
There is no substitute for plenty of programming practice when connecting statistics and visualization. Follow along with this tutorial to refine your programming skills and review scatter plots, bar plots, pairwise plots, histograms, and box plots.
Now that you have some knowledge of Python visualization, this video offers food for thought. As you watch the examples, gauge how confident you are in developing and implementing code to analyze data science problems.
Here is more Python practice with a specific application that uses a suite of programming techniques and commands. At this point in the course, your goal is to assimilate what you have learned and begin making higher-level connections between the course units.
Here is a very practical application that combines much of what has been introduced in the course. You should view this step as a culminating project for the first six units of this course. You should master the material in this project before moving on to the units on data mining.
Take this assessment to see how well you understood this unit.