## Time Series Forecasting with ARIMA

This tutorial demonstrates how to implement the models and forecasting discussed in this unit. Since we are using Google Colab, you can jump to Step 2 to begin this programming example. Upon completing this tutorial, you should be able to construct models, make forecasts and validate forecasts given a time series data set.

### Step 4 - Parameter Selection for the ARIMA Time Series Model

When looking to fit time series data with a seasonal ARIMA model, our first goal is to find the values of `ARIMA(p,d,q)(P,D,Q)s` that optimize a metric of interest. There are many guidelines and best practices to achieve this goal, yet the correct parametrization of ARIMA models can be a painstaking manual process that requires domain expertise and time. Other statistical programming languages such as `R` provide automated ways to solve this issue, but those have yet to be ported over to Python. In this section, we will resolve this issue by writing Python code to programmatically select the optimal parameter values for our `ARIMA(p,d,q)(P,D,Q)s` time series model.

We will use a "grid search" to iteratively explore different combinations of parameters. For each combination of parameters, we fit a new seasonal ARIMA model with the `SARIMAX()` function from the `statsmodels` module and assess its overall quality. Once we have explored the entire landscape of parameters, our optimal set of parameters will be the one that yields the best performance for our criterion of interest. Let's begin by generating the various combinations of parameters that we wish to assess:

```python
import itertools

# Define the p, d and q parameters to take the values 0 or 1
p = d = q = range(0, 2)

# Generate all different combinations of p, d and q triplets
pdq = list(itertools.product(p, d, q))

# Generate all different combinations of seasonal P, D and Q triplets (period 12)
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in itertools.product(p, d, q)]

print('Examples of parameter combinations for Seasonal ARIMA...')
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[2]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[3]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[4]))
```

Output:

```
Examples of parameter combinations for Seasonal ARIMA...
SARIMAX: (0, 0, 1) x (0, 0, 1, 12)
SARIMAX: (0, 0, 1) x (0, 1, 0, 12)
SARIMAX: (0, 1, 0) x (0, 1, 1, 12)
SARIMAX: (0, 1, 0) x (1, 0, 0, 12)
```
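To get a sense of how large the search space is, the short sketch below counts the candidate models. It assumes the same `range(0, 2)` bounds and seasonal period of 12 used above; the name `n_models` is only illustrative.

```python
import itertools

# Same bounds as above: each of p, d and q takes the values 0 and 1
p = d = q = range(0, 2)

pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in itertools.product(p, d, q)]

# Every non-seasonal triplet is paired with every seasonal quadruple
n_models = len(pdq) * len(seasonal_pdq)
print(len(pdq), len(seasonal_pdq), n_models)  # 8 8 64
```

Widening the range to `range(0, 3)` would raise the count to 27 x 27 = 729 models, which is why grid searches are usually kept to small bounds.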

We can now use the triplets of parameters defined above to automate the process of training and evaluating ARIMA models on different combinations. In statistics and machine learning, this process is known as grid search (or hyperparameter optimization) for model selection.

When evaluating and comparing statistical models fitted with different parameters, each can be ranked against the others based on how well it fits the data or how accurately it predicts future data points. We will use the `AIC` (Akaike Information Criterion) value, which is conveniently returned with ARIMA models fitted using `statsmodels`. The `AIC` measures how well a model fits the data while taking into account the overall complexity of the model. A model that fits the data very well while using many parameters will be assigned a larger AIC score than a model that uses fewer parameters to achieve the same goodness of fit. Therefore, we are interested in finding the model that yields the lowest `AIC` value.
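As a rough illustration of this trade-off, the sketch below uses the standard definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and ln(L) is the maximized log-likelihood. The helper `aic` and the numbers passed to it are hypothetical and are not taken from the models in this tutorial.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical values: both models reach the same log-likelihood,
# but the second one needs more parameters to get there
simple_model = aic(log_likelihood=-130.0, n_params=3)
complex_model = aic(log_likelihood=-130.0, n_params=7)

print(simple_model)   # 266.0
print(complex_model)  # 274.0
```

With an identical fit, the model with fewer parameters receives the lower score and would be preferred.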

The code chunk below iterates through combinations of parameters and uses the `SARIMAX` function from `statsmodels` to fit the corresponding Seasonal ARIMA model. Here, the `order` argument specifies the `(p, d, q)` parameters, while the `seasonal_order` argument specifies the `(P, D, Q, S)` seasonal component of the Seasonal ARIMA model. After fitting each `SARIMAX()` model, the code prints out its respective `AIC` score.

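```python
import warnings

import statsmodels.api as sm

warnings.filterwarnings("ignore")  # specify to ignore warning messages

# y is assumed to be the time series prepared earlier in the tutorial
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(y,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)
            results = mod.fit()
            print('SARIMAX{}x{} - AIC:{}'.format(param, param_seasonal, results.aic))
        except Exception:
            # Skip parameter combinations that cannot be fitted
            continue
```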

Because some parameter combinations may lead to numerical misspecifications, we explicitly disable warning messages to avoid flooding the output. These misspecifications can also raise exceptions, so we catch them and skip the parameter combinations that cause them.
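The pattern of silencing warnings and skipping failed fits can be tried in isolation. The sketch below is an assumption-laden variant rather than the tutorial's own code: `fit_candidate` is a hypothetical stand-in for a model fit, and warnings are suppressed only inside a `warnings.catch_warnings()` block instead of globally.

```python
import warnings

def fit_candidate(param):
    # Hypothetical stand-in for a model fit: warns for one input, fails for another
    if param == 1:
        warnings.warn("convergence problem (illustrative)")
    if param == 2:
        raise ValueError("misspecified model (illustrative)")
    return param * 10.0

scores = {}
for param in range(4):
    try:
        # Suppress warnings only while fitting, then restore the previous filters
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            scores[param] = fit_candidate(param)
    except ValueError:
        # Skip combinations that cannot be fitted
        continue

print(scores)  # {0: 0.0, 1: 10.0, 3: 30.0}
```

Scoping the filter this way keeps warnings from unrelated code visible, at the cost of slightly more nesting.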

The code above should yield the following results (this may take some time):

Output:

```
SARIMAX(0, 0, 0)x(0, 0, 1, 12) - AIC:6787.3436240402125
SARIMAX(0, 0, 0)x(0, 1, 1, 12) - AIC:1596.711172764114
SARIMAX(0, 0, 0)x(1, 0, 0, 12) - AIC:1058.9388921320026
SARIMAX(0, 0, 0)x(1, 0, 1, 12) - AIC:1056.2878315690562
SARIMAX(0, 0, 0)x(1, 1, 0, 12) - AIC:1361.6578978064144
SARIMAX(0, 0, 0)x(1, 1, 1, 12) - AIC:1044.7647912940095
...
...
...
SARIMAX(1, 1, 1)x(1, 0, 0, 12) - AIC:576.8647112294245
SARIMAX(1, 1, 1)x(1, 0, 1, 12) - AIC:327.9049123596742
SARIMAX(1, 1, 1)x(1, 1, 0, 12) - AIC:444.12436865161305
SARIMAX(1, 1, 1)x(1, 1, 1, 12) - AIC:277.7801413828764
```

The output of our code suggests that `SARIMAX(1, 1, 1)x(1, 1, 1, 12)` yields the lowest `AIC` value of 277.78. We should therefore consider this to be the optimal option out of all the models we have considered.
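Rather than reading the lowest value off the printed list, the selection can also be done in code. The sketch below is one possible approach, assuming the scores are collected in a dictionary (here named `aic_scores`, a hypothetical name) keyed by the `(order, seasonal_order)` pair; only the last four values from the output above are included.

```python
# AIC values copied from the last four lines of the output above
aic_scores = {
    ((1, 1, 1), (1, 0, 0, 12)): 576.8647112294245,
    ((1, 1, 1), (1, 0, 1, 12)): 327.9049123596742,
    ((1, 1, 1), (1, 1, 0, 12)): 444.12436865161305,
    ((1, 1, 1), (1, 1, 1, 12)): 277.7801413828764,
}

# Pick the (order, seasonal_order) pair with the smallest AIC
best_params = min(aic_scores, key=aic_scores.get)
best_order, best_seasonal_order = best_params

print(best_order, best_seasonal_order, round(aic_scores[best_params], 2))
# (1, 1, 1) (1, 1, 1, 12) 277.78
```

In the grid search loop, the same dictionary could be filled by storing `results.aic` for each fitted combination instead of only printing it.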