10.1: The statsmodels Module
Python modules with statistical capabilities are not completely disjoint. You have probably noticed some overlap among scipy.stats, numpy, pandas, and scikit-learn (for example, scipy.stats can perform linear regression using the linregress function). This overlap simplifies imports when you make basic statistical calculations on arrays and dataframes. At some point, however, major differences become obvious. This motivating example compares the linregress function from scipy.stats with the ols function from statsmodels. Follow this tutorial to see how statsmodels improves upon a module such as scipy.stats when building statistical models.
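As a rough sketch of the kind of comparison described above, the snippet below fits the same straight line with both libraries. The synthetic data, the column names x and y, and the random seed are assumptions made for illustration; they do not come from the tutorial.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

# Synthetic data (illustrative assumption): y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": np.linspace(0, 10, 50)})
df["y"] = 2.0 * df["x"] + 1.0 + rng.normal(scale=1.0, size=len(df))

# scipy.stats.linregress returns a small result object:
# slope, intercept, rvalue, pvalue, stderr
lr = stats.linregress(df["x"], df["y"])

# statsmodels ols takes a formula and a dataframe and returns a fitted
# results object with many more attributes and diagnostics
fit = smf.ols("y ~ x", data=df).fit()

print("linregress slope, intercept:", lr.slope, lr.intercept)
print("ols params:")
print(fit.params)        # indexed by 'Intercept' and 'x'
print("ols R-squared:", fit.rsquared)
print(fit.conf_int())    # confidence intervals for each coefficient
```

Both calls should agree on the slope, intercept, and R-squared. The difference is in what else the fitted statsmodels object exposes, such as confidence intervals and a full summary report.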
This example is similar to the previous one, but it constructs a simple data set so that the report generated by statsmodels is easier to digest.
This tutorial is designed to help you move from the scikit-learn module to statsmodels. Work through the code examples to thoroughly grasp the differences. The housing dataset USA_Housing.csv in this tutorial is available here or on the Kaggle website, as mentioned in the video. You can download this file to your local drive. If you are using Google Colab, follow the instructions in subunit 5.1 of this course to load a local file.