ARIMA in Python

Use this tutorial to implement an ARIMA model and make forecasts. The tutorial refers to a sample data set, but you must obtain your own CSV file to work with real data. Kaggle is a good source for data scientists: you should be able to search for and download a reasonably small (<50 MB) .csv file of stock price data. As the tutorial illustrates, you can then use pandas to extract a column of data.

The ARIMA model allows us to forecast a time series using the series' past values.

A time series is a collection of data points collected at constant time intervals. Time series are used to forecast future values based on previous values.
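As a small illustration of this definition, the sketch below builds a daily time series in pandas. The five values are the first five prices from the sample data.csv; the dates are made up for illustration, since the sample file has no date column.

```python
import pandas as pd

# Hypothetical dates: five consecutive days, one observation per day.
dates = pd.date_range(start="2022-01-03", periods=5, freq="D")
series = pd.Series([88, 84, 85, 85, 84], index=dates, name="prices")

# The gap between consecutive timestamps is the same everywhere (one day),
# which is what "constant time intervals" means.
gaps = series.index.to_series().diff().dropna().unique()

print(series)
print(gaps)
```

If the gaps were not all equal, the series would need to be resampled or cleaned before fitting a model such as ARIMA.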

A stationary time series is one whose statistical properties (mean, variance, autocorrelation, etc.) are all constant over time. A non-stationary series is one whose statistical properties change over time.


ARIMA model

An ARIMA model is characterized by three terms: p, d, and q.

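  • p is the order of the autoregressive (AR) term: the number of lagged values of the series used as predictors.
  • q is the order of the moving average (MA) term: the number of lagged forecast errors used in the model.
  • d is the number of times the series must be differenced to make it stationary.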
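To make the differencing term d concrete, here is a minimal sketch using NumPy. The series is a made-up example with a steady upward trend, not data from the tutorial.

```python
import numpy as np

# Hypothetical series with a steady upward trend, so its mean is not constant.
series = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])

# First difference (d = 1): each value minus the previous one.
first_diff = np.diff(series, n=1)

# Second difference (d = 2): the difference of the first difference.
second_diff = np.diff(series, n=2)

print(first_diff)   # [2. 2. 2. 2. 2.]
print(second_diff)  # [0. 0. 0. 0.]
```

Here one round of differencing already removes the trend, so d = 1 would be enough for this series. Each round of differencing also shortens the series by one value.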

ARIMA model in Python

In this example, we will predict the next 10 days of stock prices from 100 days of data.


Step 1

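Import the relevant libraries to perform time series forecasting:
import numpy as np, pandas as pd
import statsmodels.tsa.stattools as ts
# The older statsmodels.tsa.arima_model module was removed in statsmodels 0.13;
# ARIMA now lives in statsmodels.tsa.arima.model.
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt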

Step 2
Load the dataset using the pandas.read_csv() method:

df = pd.read_csv("data.csv")
# prices is a field in the .csv file containing all stock prices.
stock_price = df['prices']

You can view this data in stock_price using the plt.plot() method:
plt.plot(stock_price) 
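Before modelling, it can help to confirm that the column was read as expected. The sketch below reads a few rows in the same layout as the sample data.csv from an in-memory string (a stand-in for a file on disk, used here only so the example is self-contained).

```python
import io
import pandas as pd

# A few rows in the same layout as data.csv: a header line, then one price per line.
csv_text = '"prices"\n88\n84\n85\n85\n84\n'

df = pd.read_csv(io.StringIO(csv_text))
stock_price = df["prices"]  # select one column as a pandas Series

print(df.columns.tolist())       # ['prices']
print(len(stock_price))          # 5
print(stock_price.isna().sum())  # number of missing values in the column
```

With your own CSV file, replace io.StringIO(csv_text) with the file path and 'prices' with the name of the column you want to forecast.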

The complete code at the end of this tutorial plots the variation in stock price over the 100 days of data. It is accompanied by a data.csv file with sample stock prices.


Step 3
Initialize the ARIMA model and set the values of p, d, and q to 1, 1, and 2:
model = ARIMA(stock_price, order=(1,1,2))
# fit() in the current ARIMA class (statsmodels.tsa.arima.model) takes no disp argument.
model_fit = model.fit()
# summary provides a detailed summary of the time series model
print(model_fit.summary())


Step 4
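Let's predict the next 10 values and plot them on a graph:
# Indices 100-109 are the next 10 values after the last observation (index 99).
# The current ARIMA class (statsmodels.tsa.arima.model) predicts on the original
# price scale, so typ='levels' is not needed.
pred = model_fit.predict(start=100, end=109)
# newarr combines the observed and the predicted prices in one list.
newarr = list(stock_price) + list(pred)

plt.plot(newarr)
plt.show()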




Complete code

main.py
import numpy as np, pandas as pd
import matplotlib.pyplot as plt
import statsmodels.tsa.stattools as ts
# The older statsmodels.tsa.arima_model module was removed in statsmodels 0.13.
from statsmodels.tsa.arima.model import ARIMA

df = pd.read_csv("data.csv")
# prices is a field in the .csv file containing all stock prices.
stock_price = df['prices']
plt.plot(stock_price)

# ARIMA model with p=1, d=1, q=2
model = ARIMA(stock_price, order=(1,1,2))
model_fit = model.fit()
# summary provides a detailed summary of the time series model
print(model_fit.summary())

# Predicting values
# Indices 100-109 are the next 10 values after the last observation (index 99).
pred = model_fit.predict(start=100, end=109)

# newarr combines the observed and the predicted stock prices in one list
newarr = list(stock_price) + list(pred)

plt.plot(newarr)
plt.show()

data.csv

"prices"
88
84
85
85
84
85
83
85
88
89
91
99
104
112
126
138
146
151
150
148
147
149
143
132
131
139
147
150
148
145
140
134
131
131
129
126
126
132
137
140
142
150
159
167
170
171
172
172
174
175
172
172
174
174
169
165
156
142
131
121
112
104
102
99
99
95
88
84
84
87
89
88
85
86
89
91
91
94
101
110
121
135
145
149
156
165
171
175
177
182
193
204
208
210
215
222
228
226
222
220

Source: Sarvech Qadir, https://www.educative.io/answers/what-is-arima-in-python
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.

Last modified: Wednesday, September 28, 2022, 3:34 PM