# Time Series Forecasting with ARIMA

## Step 2 - Importing Packages and Loading Data

To begin working with our data, we will start up Jupyter Notebook:

    jupyter notebook

To create a new notebook file, select **New** > **Python 3** from the top right pull-down menu. This will open a notebook.

As is best practice, start by importing the libraries you will need at the top of your notebook:

    import warnings
    import itertools
    import pandas as pd
    import numpy as np
    import statsmodels.api as sm
    import matplotlib.pyplot as plt
    plt.style.use('fivethirtyeight')

We have also set the `matplotlib` style to `fivethirtyeight` for our plots.

We'll be working with a dataset called "Atmospheric CO2 from Continuous Air Samples at Mauna Loa Observatory, Hawaii, U.S.A.," which collected CO2 samples from March 1958 to December 2001. We can bring in this data as follows:

    data = sm.datasets.co2.load_pandas()
    y = data.data

Let's preprocess our data a little before moving forward. Weekly data can be tricky to work with because each observation covers such a short span of time, so let's use monthly averages instead. We'll make the conversion with the `resample` method. For simplicity, we'll also use the `fillna()` method to ensure that our time series has no missing values.

    # The 'MS' string groups the data in buckets by start of the month
    y = y['co2'].resample('MS').mean()
    # bfill (backward fill) replaces each missing value with the next valid observation
    y = y.fillna(y.bfill())
    print(y)

Output

    co2
    1958-03-01    316.100000
    1958-04-01    317.200000
    1958-05-01    317.433333
    ...
    2001-11-01    369.375000
    2001-12-01    371.020000
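To see what the resampling and filling steps do in isolation, here is a minimal sketch on a small made-up series. The values and the names `weekly`, `monthly`, and `filled` are illustrative only and are not part of the CO2 dataset. The readings are spaced seven days apart, and February is deliberately left empty so that its monthly mean is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical readings taken every 7 days from 2000-01-01 (13 observations)
idx = pd.date_range("2000-01-01", periods=13, freq="7D")
values = [10.0, 12.0, np.nan, 14.0, 16.0,   # January (one gap)
          np.nan, np.nan, np.nan, np.nan,   # February (no readings at all)
          20.0, 22.0, 24.0, 26.0]           # March
weekly = pd.Series(values, index=idx)

# Group by calendar month, labelled by the first day of the month ('MS').
# mean() skips NaN, so January averages its four valid readings;
# February has none, so its mean is NaN.
monthly = weekly.resample("MS").mean()
print(monthly)

# Backward fill: a missing month takes the value of the next valid month
filled = monthly.fillna(monthly.bfill())
print(filled)
```

Here January averages to 13.0, February comes out as `NaN` after resampling, and the backward fill copies March's mean of 23.0 into February. Note that a backward fill cannot repair a gap at the very end of a series, since there is no later value to copy.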

Let's explore this time series as a data visualization:

    y.plot(figsize=(15, 6))
    plt.show()

Some distinguishable patterns appear when we plot the data. The time series has an obvious seasonality pattern, as well as an overall increasing trend.
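One rough way to check these two patterns numerically is to smooth the series with a 12-month rolling mean. Each window then spans one full yearly cycle, so the seasonal swings average out and mostly the trend remains. The sketch below uses a synthetic monthly series (a made-up linear trend plus a sine-shaped yearly cycle, not the CO2 data) so that the effect is easy to verify. The names `series` and `trend_est` are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data: 4 years of a slow upward trend plus a yearly cycle
idx = pd.date_range("1990-01-01", periods=48, freq="MS")
t = np.arange(48)
trend_true = 300 + 0.1 * t                      # made-up linear trend
seasonal_true = 3 * np.sin(2 * np.pi * t / 12)  # made-up 12-month cycle
series = pd.Series(trend_true + seasonal_true, index=idx)

# Trailing 12-month rolling mean; the first 11 values are NaN
# because a full window is not yet available
trend_est = series.rolling(window=12).mean()

# The raw series moves up and down with the seasons...
print(series.is_monotonic_increasing)
# ...while the smoothed series only rises
print(trend_est.dropna().is_monotonic_increasing)
```

The same idea should carry over to our monthly data: plotting `y.rolling(window=12).mean()` alongside `y` would be one way to view the upward trend without the yearly oscillation. Because the window is trailing, the smoothed curve lags the underlying trend by about half a year.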

Now that we've converted and explored our data, let's move on to time series forecasting with ARIMA.