Time Series Forecasting with ARIMA
This tutorial demonstrates how to implement the models and forecasting discussed in this unit. Since we are using Google Colab, you can jump to Step 2 to begin this programming example. Upon completing this tutorial, you should be able to construct models, make forecasts and validate forecasts given a time series data set.
Step 5: Fitting an ARIMA Time Series Model
Using grid search, we have identified the set of parameters that produces the best fitting model to our time series data. We can proceed to analyze this particular model in more depth.
We'll start by plugging the optimal parameter values into a new SARIMAX model:
mod = sm.tsa.statespace.SARIMAX(y,
                                order=(1, 1, 1),
                                seasonal_order=(1, 1, 1, 12),
                                enforce_stationarity=False,
                                enforce_invertibility=False)

results = mod.fit()

print(results.summary().tables[1])
Output
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.3182      0.092      3.443      0.001       0.137       0.499
ma.L1         -0.6255      0.077     -8.165      0.000      -0.776      -0.475
ar.S.L12       0.0010      0.001      1.732      0.083       0.000       0.002
ma.S.L12      -0.8769      0.026    -33.811      0.000      -0.928      -0.826
sigma2         0.0972      0.004     22.634      0.000       0.089       0.106
==============================================================================
The summary method of the fitted SARIMAX results returns a significant amount of information, but we'll focus our attention on the table of coefficients. The coef column shows the weight (i.e. importance) of each feature and how each one impacts the time series. The P>|z| column informs us of the significance of each feature weight. Here, each weight has a p-value lower than or close to 0.05, so it is reasonable to retain all of them in our model.
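To make the link between the z and P>|z| columns concrete, the short sketch below recomputes a two-sided p-value from a z statistic using only the Python standard library. It assumes the large-sample normal approximation that the column label implies. The helper name two_sided_p_value is ours for illustration and is not part of statsmodels. The two z values are copied from the coefficient table.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    # For Z ~ N(0, 1): P(|Z| > |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

# z statistics copied from the coefficient table
z_stats = {"ar.L1": 3.443, "ar.S.L12": 1.732}

for name, z in z_stats.items():
    print(f"{name}: z = {z:.3f}, p = {two_sided_p_value(z):.3f}")
```

Rounded to three decimals, the results agree with the 0.001 and 0.083 reported for these two terms, which is why ar.S.L12 is described as close to, rather than below, the 0.05 threshold.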
When fitting seasonal ARIMA models (and any other models for that matter), it is important to run model diagnostics to ensure that none of the assumptions made by the model have been violated. The plot_diagnostics method allows us to quickly generate model diagnostics and investigate any unusual behavior.
results.plot_diagnostics(figsize=(15, 12))
plt.show()
Our primary concern is to ensure that the residuals of our model are uncorrelated and normally distributed with zero mean. If the seasonal ARIMA model does not satisfy these properties, it is a good indication that it can be further improved.
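The plots are judged by eye, so a numerical check can be a useful complement. The sketch below is one possible approach, not part of the original tutorial: it computes the mean, skewness, kurtosis, and the Jarque-Bera statistic of a list of residuals using only the standard library. The function name jarque_bera and the toy data are assumptions made for illustration. With a fitted model you could pass list(results.resid) instead.

```python
import math

def jarque_bera(residuals):
    """Return (mean, skewness, kurtosis, JB statistic, p-value)."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments (population form, dividing by n)
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # JB is asymptotically chi-square with 2 degrees of freedom,
    # whose survival function is exp(-x / 2)
    p_value = math.exp(-jb / 2.0)
    return mean, skew, kurt, jb, p_value

# Hypothetical residuals, deliberately non-normal (only two distinct values)
toy_residuals = [-1.0, 1.0] * 50
mean, skew, kurt, jb, p = jarque_bera(toy_residuals)
print(f"mean={mean:.3f} skew={skew:.3f} kurtosis={kurt:.3f} JB={jb:.2f} p={p:.5f}")
```

A small p-value, as the toy data produces, is evidence against normality. Residuals from a well-specified model should give a mean near zero, skewness near 0, and kurtosis near 3.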
In this case, our model diagnostics suggest that the model residuals are approximately normally distributed and uncorrelated, based on the following:

In the top right plot, we see that the red KDE line closely follows the N(0,1) line (where N(0,1) is the standard notation for a normal distribution with mean 0 and standard deviation of 1). This is a good indication that the residuals are normally distributed.
The Q-Q plot on the bottom left shows that the ordered distribution of residuals (blue dots) follows the linear trend of the samples taken from a standard normal distribution, N(0, 1). Again, this is a strong indication that the residuals are normally distributed.
The residuals over time (top left plot) don't display any obvious seasonality and appear to be white noise. This is confirmed by the autocorrelation (i.e. correlogram) plot on the bottom right, which shows that the time series residuals have a low correlation with lagged versions of themselves.
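The correlogram reading above can be approximated in a few lines. The sketch below is an illustration under our own assumptions, not the code that plot_diagnostics runs: it computes sample autocorrelations and flags lags outside the usual approximate 95% band of 1.96 / sqrt(n). The function names and the toy series are hypothetical. With a fitted model you could pass list(results.resid).

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

def significant_lags(series, max_lag):
    """Lags whose autocorrelation falls outside the approximate 95% band."""
    bound = 1.96 / math.sqrt(len(series))
    return [k for k in range(1, max_lag + 1)
            if abs(autocorrelation(series, k)) > bound]

# Hypothetical residuals with an obvious pattern: they alternate in sign
toy_residuals = [1.0, -1.0] * 50
print(autocorrelation(toy_residuals, 1))   # strongly negative
print(significant_lags(toy_residuals, 4))  # every lag is flagged
```

For residuals that behave like white noise, few or no lags should be flagged. A pattern like the toy series, where every lag is flagged, would indicate structure the model has not captured.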
These observations lead us to conclude that our model produces a satisfactory fit that could help us understand our time series data and forecast future values.
Although we have a satisfactory fit, some parameters of our seasonal ARIMA model could be changed to improve our model fit. For example, our grid search only considered a restricted set of parameter combinations, so we may find better models if we widen the grid search.
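As a rough sketch of what widening the search might look like, the code below builds a larger grid of candidate parameters and defines a helper that would fit each one. Several things here are our assumptions rather than details from this section: the upper bound of 2 for p, d, and q is arbitrary, AIC is used as the selection criterion, and the names build_param_grid and grid_search are hypothetical.

```python
import itertools

def build_param_grid(max_order=2, seasonal_period=12):
    """All (order, seasonal_order) pairs with p, d, q in 0..max_order."""
    values = range(0, max_order + 1)
    pdq = list(itertools.product(values, repeat=3))
    seasonal_pdq = [(p, d, q, seasonal_period) for p, d, q in pdq]
    return list(itertools.product(pdq, seasonal_pdq))

def grid_search(y, param_grid):
    """Fit each candidate; return (aic, order, seasonal_order) with the lowest AIC."""
    # Imported here so the grid helper above works without statsmodels installed
    import statsmodels.api as sm

    best = None
    for order, seasonal_order in param_grid:
        try:
            results = sm.tsa.statespace.SARIMAX(
                y,
                order=order,
                seasonal_order=seasonal_order,
                enforce_stationarity=False,
                enforce_invertibility=False,
            ).fit(disp=False)
        except Exception:
            continue  # some combinations fail to fit; skip them
        if best is None or results.aic < best[0]:
            best = (results.aic, order, seasonal_order)
    return best

grid = build_param_grid(max_order=2)
print(len(grid))  # 27 non-seasonal x 27 seasonal = 729 candidates
```

The number of candidates grows quickly with the bound, so a wider search trades run time for coverage. Calling grid_search(y, grid) on the tutorial's series would return the lowest-AIC combination found.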