10.4: Autoregressive Integrated Moving Average (ARIMA) Models
The autoregressive integrated moving average (ARIMA) model is an approach for modeling nonstationary time series. It combines AR and MA modeling to account for how a series depends on its own past values and on past random shocks. Additionally, it is often possible to convert a nonstationary time series to a stationary series by taking successive differences. The "I" in ARIMA stands for "integrated", and the parameter d is the number of differences needed to eliminate nonstationary behavior. Read this article to get an overview of the mathematical form of the ARIMA(p,d,q) approach to model building. Take note of how you can use various choices of the p, d, and q parameters to form AR, MA, ARMA, or ARIMA models; for example, setting d = 0 reduces ARIMA(p,d,q) to ARMA(p,q).
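To make the role of differencing concrete, here is a minimal sketch in plain Python. The helper `difference` and the quadratic-trend series are illustrative assumptions, not taken from the article. The sketch shows that a series with a quadratic trend becomes constant after two rounds of differencing, which corresponds to d = 2.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing: y[t] = x[t] - x[t-1]."""
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


# Hypothetical series with a quadratic trend: x[t] = t**2
quadratic = [t ** 2 for t in range(8)]

once = difference(quadratic, d=1)   # a linear trend remains
twice = difference(quadratic, d=2)  # the trend is gone; values are constant

print(once)   # [1, 3, 5, 7, 9, 11, 13]
print(twice)  # [2, 2, 2, 2, 2, 2]
```

Each round of differencing shortens the series by one observation, so a model with d differences is fitted to d fewer points than the original data contains.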
Use this tutorial to implement an ARIMA model and make forecasts. The tutorial refers to a data set only in general terms, so you must obtain your own CSV file with actual data. A great source for data scientists is Kaggle. With your current expertise, you should be able to search for and download a .csv file with stock price data that is not too large (under 50 MB). Additionally, as illustrated in the tutorial, you can apply pandas to extract a column of data.
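Extracting one column with pandas might look like the following sketch. The column names (`Date`, `Open`, `Close`, `Volume`) and the values are made up for illustration, and the CSV is read from an in-memory string so the example runs without a downloaded file. With a real download you would pass the file path to `pd.read_csv` and use whatever column names your file actually contains.

```python
import io

import pandas as pd

# Made-up stand-in for a downloaded stock price file (values are not real data).
csv_text = """Date,Open,Close,Volume
2024-01-02,100.0,101.5,12000
2024-01-03,101.5,100.8,9000
2024-01-04,100.8,102.3,15000
2024-01-05,102.3,103.1,11000
"""

# Parse the date column and use it as the index so the result is a time series.
prices = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# Extract a single column as a pandas Series.
close = prices["Close"]
print(close)
```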
This tutorial delves a bit deeper into statistical models. Study it to better understand the ARIMA and seasonal ARIMA models. Consider closely the discussion of how to apply the ACF and PACF to estimate the order parameters for a given model. This is an important practical question because these parameters are usually unknown at the outset.
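As a rough illustration of how the ACF hints at model order, the sketch below computes a sample autocorrelation function in plain Python for a simulated MA(1) series. The function name `sample_acf`, the coefficient 0.8, and the seed are assumptions chosen for the example. For an MA(q) process the ACF is expected to cut off after lag q, so here only lag 1 should stand clearly outside the approximate 95% band of ±1.96/√n.

```python
import random


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(v * v for v in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Simulate a hypothetical MA(1) process: x[t] = e[t] + 0.8 * e[t-1]
random.seed(0)
noise = [random.gauss(0, 1) for _ in range(2001)]
ma1 = [noise[t] + 0.8 * noise[t - 1] for t in range(1, len(noise))]

acf = sample_acf(ma1, max_lag=5)
bound = 1.96 / len(ma1) ** 0.5  # approximate 95% band under white noise

for lag, r in enumerate(acf):
    note = "outside band" if lag > 0 and abs(r) > bound else ""
    print(f"lag {lag}: {r:+.3f} {note}")
```

The theoretical lag-1 autocorrelation for this process is 0.8 / (1 + 0.8²) ≈ 0.49, and the sample value should land near it. The PACF plays the mirror-image role for AR(p) processes, where it is expected to cut off after lag p.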
Here is a practical application of the ARIMA model. Although this tutorial makes brief references to the R language, you should use it to tie together the concepts (AR, MA, ACF, and PACF) presented in this unit.
This tutorial demonstrates how to implement the models and forecasting discussed in this unit. Since we are using Google Colab, you can jump to Step 2 to begin this programming example. Upon completing this tutorial, you should be able to construct models, make forecasts, and validate those forecasts for a given time series data set.
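The construct, forecast, and validate workflow can be sketched without any library. The example below is not the tutorial's code: it simulates a hypothetical ARIMA(1,1,0) series, estimates the single AR coefficient by least squares on the differenced training data (a simplification; libraries such as statsmodels typically use maximum likelihood), forecasts the held-out points, and scores them with RMSE. All function names and parameter values are assumptions for illustration.

```python
import random


def fit_ar1(diffs):
    """Least-squares estimate of phi in w[t] = phi * w[t-1] + e[t] (no intercept)."""
    num = sum(curr * prev for prev, curr in zip(diffs[:-1], diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den


def forecast_arima110(history, phi, steps):
    """Forecast future levels by predicting differences and summing them back up."""
    level = history[-1]
    last_diff = history[-1] - history[-2]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff  # predicted next difference
        level += last_diff           # undo the differencing
        forecasts.append(level)
    return forecasts


# Simulate a hypothetical ARIMA(1,1,0) series with phi = 0.6
random.seed(1)
diffs = [0.0]
for _ in range(500):
    diffs.append(0.6 * diffs[-1] + random.gauss(0, 1))
series = [100.0]
for w in diffs[1:]:
    series.append(series[-1] + w)

# Hold out the last 10 observations for validation.
train, test = series[:-10], series[-10:]
train_diffs = [b - a for a, b in zip(train, train[1:])]

phi_hat = fit_ar1(train_diffs)
preds = forecast_arima110(train, phi_hat, steps=len(test))
rmse = (sum((p - a) ** 2 for p, a in zip(preds, test)) / len(test)) ** 0.5

print(f"estimated phi: {phi_hat:.3f}")
print(f"holdout RMSE:  {rmse:.3f}")
```

Keeping the test points at the end of the series, rather than sampling them at random, matters here: a forecast is only a fair test if the model has seen nothing after the split point.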