# Time Series Forecasting with ARIMA

## Step 2 - Importing Packages and Loading Data

To begin working with our data, we will start up Jupyter Notebook:

```shell
jupyter notebook
```


To create a new notebook file, select New > Python 3 from the top-right drop-down menu.

This will open a notebook.

As is best practice, start by importing the libraries you will need at the top of your notebook:

```python
import warnings
import itertools
import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
```


We have also set the `fivethirtyeight` matplotlib style for our plots.

We'll be working with a dataset called "Atmospheric CO2 from Continuous Air Samples at Mauna Loa Observatory, Hawaii, U.S.A.," which contains weekly CO2 samples collected from March 1958 to December 2001. We can bring in this data as follows:

```python
data = sm.datasets.co2.load_pandas()
y = data.data
```


Let's preprocess our data a little before moving forward. Weekly data can be tricky to work with because each observation covers only a short span of time, so let's use monthly averages instead. We'll make the conversion with the `resample` function. For simplicity, we can also use the `fillna()` function to ensure that we have no missing values in our time series.

```python
# The 'MS' string groups the data in buckets by start of the month
y = y['co2'].resample('MS').mean()

# The term bfill (backfill) means that each missing value is filled
# with the next valid observation that follows it
y = y.fillna(y.bfill())

print(y)
```


Output

```
co2
1958-03-01  316.100000
1958-04-01  317.200000
1958-05-01  317.433333
...
2001-11-01  369.375000
2001-12-01  371.020000
```
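To see what the resampling and backfilling steps do in isolation, here is a minimal sketch on a small synthetic weekly series. The dates, the values, and the names `weekly`, `monthly`, and `filled` are hypothetical and chosen for illustration; they are not taken from the Mauna Loa data. Every April observation is dropped so that one monthly bucket comes out empty and `bfill()` has a gap to fill.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly series: every Saturday from March to June 1958,
# with synthetic values 0, 1, 2, ... (not real CO2 readings)
index = pd.date_range("1958-03-01", "1958-06-28", freq="W-SAT")
weekly = pd.Series(np.arange(len(index), dtype=float), index=index, name="co2")

# Drop every April observation to simulate a month with no samples
weekly = weekly[weekly.index.month != 4]

# 'MS' labels each monthly bucket with the first day of the month;
# the empty April bucket still appears, with a mean of NaN
monthly = weekly.resample("MS").mean()

# bfill() copies the next valid value (May's mean) backward into April
filled = monthly.bfill()

print(monthly)
print(filled)
```

Here April's `NaN` is replaced by May's mean. Calling `bfill()` on its own gives the same result as the `y.fillna(y.bfill())` pattern used above, since `fillna` only touches the positions that are missing.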


Let's explore this time series as a data visualization:

```python
y.plot(figsize=(15, 6))
plt.show()
```


Some distinct patterns appear when we plot the data: the time series has an obvious seasonal pattern, as well as an overall increasing trend.
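One way to back up this visual impression is to separate the trend from the seasonal cycle numerically. The sketch below assumes a synthetic monthly series (a linear trend plus a 12-month sine wave plus a little noise) standing in for the CO2 data so that it runs on its own; the names `series`, `trend`, and `seasonal_profile` are hypothetical. A centered 12-month rolling mean averages over one full cycle and so approximates the trend, and averaging the detrended values by calendar month exposes the seasonal profile.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the monthly CO2 series: trend + yearly cycle + noise
rng = np.random.default_rng(0)
index = pd.date_range("1958-03-01", "2001-12-01", freq="MS")
t = np.arange(len(index))
values = 315 + 0.12 * t + 3 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, len(t))
series = pd.Series(values, index=index, name="co2")

# A 12-month window spans exactly one seasonal cycle, so its mean cancels
# the sine wave and leaves an estimate of the trend (NaN at both ends)
trend = series.rolling(window=12, center=True).mean()

# Remove the trend, then average what remains by calendar month (1-12)
detrended = series - trend
seasonal_profile = detrended.groupby(detrended.index.month).mean()

print(trend.dropna().iloc[[0, -1]])
print(seasonal_profile.round(2))
```

A rising `trend` and a `seasonal_profile` whose values swing well away from zero together correspond to the two patterns seen in the plot. The month where the profile peaks is a consequence of the sine phase chosen here, not a claim about the Mauna Loa record. statsmodels also provides `sm.tsa.seasonal_decompose` for a fuller version of this kind of decomposition.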

Now that we've converted and explored our data, let's move on to time series forecasting with ARIMA.