ARIMA and Seasonal ARIMA Models
Description
This tutorial delves a bit deeper into statistical models. Study it to better understand the ARIMA and seasonal ARIMA models. Consider closely the discussion of how to apply the ACF and PACF to estimate the order parameters for a given model. In practical circumstances, this is an important question as it is often the case that such parameters would initially be unknown.
Identifying and Estimating ARIMA Models; Using ARIMA Models to Forecast Future Values
This week, we'll learn some techniques for identifying and estimating non-seasonal ARIMA models. We'll also look at the basics of using an ARIMA model to make forecasts. We'll look at seasonal ARIMA models next week. Lesson 3.1 gives the basic ideas for determining a model and analyzing residuals after a model has been estimated. Lesson 3.2 gives a test for residual autocorrelations. Lesson 3.3 gives some basics for forecasting using ARIMA models. We'll look at other forecasting models later in the course. This all relates to Chapter 3 in the book, although the authors give quite a theoretical treatment of the topic(s).
Source: The Pennsylvania State University, https://online.stat.psu.edu/stat510 This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 License.
Non-seasonal ARIMA Models
ARIMA models, also called Box-Jenkins models, are models that may possibly include autoregressive terms, moving average terms, and differencing operations. Various abbreviations are used:
- When a model only involves autoregressive terms, it may be referred to as an AR model. When a model only involves moving average terms, it may be referred to as an MA model.
- When no differencing is involved, the abbreviation ARMA may be used.
This week we're only considering non-seasonal models. We'll expand our toolkit to include seasonal models next week.
Specifying the Elements of the Model
In most software programs, the elements in the model are specified in the order (AR order, differencing, MA order). As examples,
A model with (only) two AR terms would be specified as an ARIMA of order (2,0,0).
A MA(2) model would be specified as an ARIMA of order (0,0,2).
A model with one AR term, a first difference, and one MA term would have order (1,1,1).
For the last model, ARIMA(1,1,1), a model with one AR term and one MA term is being applied to the first-differenced variable $z_t = x_t - x_{t-1}$. A first difference might be used to account for a linear trend in the data.
The differencing order refers to successive first differences. For example, for a difference order = 2, the variable analyzed is $z_t = (x_t - x_{t-1}) - (x_{t-1} - x_{t-2})$, the first difference of first differences. This type of difference might account for a quadratic trend in the data.
Identifying a Possible Model
Three items should be considered to determine the first guess at an ARIMA model: a time series plot of the data, the ACF, and the PACF.
- Time series plot of the observed series
In Lesson 1.1, we discussed what to look for: possible trends, seasonality, outliers, constant variance, or nonconstant variance.
Note!
Nonconstant variance in a series with no trend may have to be addressed with something like an ARCH model, which includes a model for changing variation over time. We'll cover ARCH models later in the course.
- You won't be able to spot any particular model by looking at this plot, but you will be able to see the need for various possible actions.
- If there's an obvious upward or downward linear trend, a first difference may be needed. A quadratic trend might need a second-order difference (as described above). We rarely want to go beyond two differences; in those cases, we might want to think about things like smoothing, which we will cover later in the course. Over-differencing can introduce unnecessary levels of dependency (difference white noise and you obtain an MA(1); difference again and you obtain an MA(2), and so on).
- For data with a curved upward trend accompanied by increasing
variance, you should consider transforming the series with either a
logarithm or a square root.
- ACF and PACF
The ACF and PACF should be considered together. It can sometimes be tricky going, but a few combined patterns do stand out. Note that each pattern includes a discussion of both plots and so you should always describe how both plots suggest a model. (These are listed in Table 3.1 of the book in Section 3.3).
- AR models have theoretical PACFs with non-zero values at the AR
terms in the model and zero values elsewhere. The ACF will taper to
zero in some fashion.
- An AR(2) has two spikes in the PACF and a sinusoidal ACF that converges to 0.
- MA models have theoretical ACFs with non-zero values at the MA
terms in the model and zero values elsewhere. The PACF will taper to
zero in some fashion.
- ARMA models (including both AR and MA terms) have ACFs and PACFs
that both tail off to 0. These are the trickiest because the order will
not be particularly obvious. Basically, you just have to guess that one
or two terms of each type may be needed and then see what happens when
you estimate the model.
- If the ACF and PACF do not tail off but instead have values that stay close to 1 over many lags, the series is non-stationary, and differencing will be needed. Try a first difference and then look at the ACF and PACF of the differenced data.
- If the series autocorrelations are non-significant, then the series is random (white noise): the data are independent and identically distributed, and the time ordering carries no useful information. You're done at that point.
- If first differences were necessary and all the differenced autocorrelations are non-significant, then the original series is called a random walk, and you are done. (A possible model for a random walk is $x_t = \delta + x_{t-1} + w_t$. The data are dependent and are not identically distributed; in fact, both the mean and variance are increasing through time.)
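To see these theoretical patterns in sample data, here is a small hedged sketch in R (the AR and MA coefficients 0.7 and 0.6 are arbitrary illustrative choices, and acf2 comes from the astsa package used later in this lesson):
library(astsa) # for acf2, which plots the ACF and PACF together
set.seed(123) # arbitrary seed so the simulation can be reproduced
ar1sim = arima.sim(list(ar = 0.7), n = 200) # AR(1): expect a single PACF spike and a tapering ACF
ma1sim = arima.sim(list(ma = 0.6), n = 200) # MA(1): expect a single ACF spike and a tapering PACF
acf2(ar1sim)
acf2(ma1sim)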
Estimating and Diagnosing a Possible Model
After you've made a guess (or two) at a possible model, use software such as R, Minitab, or SAS to estimate the coefficients. Most software will use maximum likelihood estimation methods to make the estimates. Once the model has been estimated, do the following.
- Look at the significance of the coefficients. In R, sarima provides p-values, and so you may simply compare the p-value to the standard 0.05 cut-off. The arima command does not provide p-values, and so you can calculate a t-statistic: t = estimated coeff. / std. error of coeff. If $|t| > t_{0.025, df}$ (the Student t-value with area 0.025 to its right on df degrees of freedom), then the estimated coefficient is significantly different from 0. When n is large, you may simply compare estimated coeff. / std. error of coeff. to 1.96. (A short R sketch of this check follows this subsection.)
- Look at the ACF of the residuals. For a good model, all autocorrelations for the residual series should be non-significant. If this isn't the case, you need to try a different model.
- Look at Box-Pierce (Ljung) tests for possible residual autocorrelation at various lags (see Lesson 3.2 for a description of this test).
- If non-constant variance is a concern, look at a plot of residuals versus fits and/or a time series plot of the residuals.
If something looks wrong, you'll have to revise your guess at what
the model might be. This might involve adding parameters or
re-interpreting the original ACF and PACF to possibly move in a
different direction.
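Here is the hedged R sketch of the coefficient check mentioned above (fit is a placeholder name, the AR(1) order is only an example, and xerie is the Lake Erie series used in Example 3-1 below; var.coef is the estimated covariance matrix of the coefficients that arima returns):
fit = arima(xerie, order = c(1, 0, 0)) # a candidate model for the series
tstats = fit$coef / sqrt(diag(fit$var.coef)) # estimated coeff. / std. error of coeff.
tstats # compare each |t| to roughly 1.96 when n is large
acf(residuals(fit)) # residual ACF: look for no significant autocorrelations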
What if More Than One Model Looks Okay?
Sometimes more than one model can seem to work for the same dataset. When that's the case, some things you can do to decide between the models are:
- Possibly choose the model with the fewest parameters.
- Examine standard errors of forecast values. Pick the model with the generally lowest standard errors for predictions of the future.
- Compare models with regard to statistics such as the MSE (the estimate of the variance of the $w_t$), AIC, AICc, and SIC (also called BIC). Lower values of these statistics are desirable.
AIC, AICc, and SIC (or BIC) are defined and discussed in Section 2.1 of our text. The statistics combine the estimate of the variance with values of the sample size and number of parameters in the model.
One reason that two models may seem to give about the same results is that, with certain coefficient values, two different models can sometimes be nearly equivalent when they are each converted to an infinite order MA model. [Every ARIMA model can be converted to an infinite order MA; this is useful for some theoretical work, including the determination of standard errors for forecast errors.]
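As a hedged sketch of such a comparison (the model orders are only illustrative, xerie is the Lake Erie series analyzed next, and AIC and BIC are base R functions that work on arima fits):
fit1 = arima(xerie, order = c(1, 0, 0)) # candidate AR(1)
fit2 = arima(xerie, order = c(0, 0, 1)) # candidate MA(1)
AIC(fit1); AIC(fit2) # lower AIC is preferred
BIC(fit1); BIC(fit2) # lower BIC (SIC) is preferred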
Example 3-1: Lake Erie
The Lake Erie data (eriedata.dat, from the Week 1 assignment) are a series of n = 40 consecutive annual measurements of the level of Lake Erie in October.
Identifying the model.
A time series plot of the data is the following:
There's a possibility of some overall trend, but it might look that way just because there seemed to be a big dip around the 15th time or so. We'll go ahead without worrying about trend.
The ACF and the PACF of the series are the following. (They start at lag 1).
The PACF shows a single spike at the first lag, and the ACF shows a tapering pattern. An AR(1) model is indicated.
Estimating the Model
We used an R script written by one of the authors of our book (Stoffer) to estimate the AR(1) model. Here's part of the output:
Coefficients:
 | ar1 | xmean
---|---|---
Estimate | 0.6909 | 14.6309
Std. Error | 0.1094 | 0.5840

sigma^2 estimated as 1.447: log likelihood = -64.47
$AIC
[1] 3.373462
$AICc
[1] 3.38157
$BIC
[1] 3.500128
Where the coefficients are listed, notice the heading "xmean". This is the estimated mean of the series based on this model, not the intercept. The model used in the software is of the form
$(x_t - \mu) = \phi_1(x_{t-1} - \mu) + w_t$
The estimated model can therefore be written as $(x_t - 14.6309) = 0.6909(x_{t-1} - 14.6309) + w_t$.
This is equivalent to $x_t = 14.6309 - (14.6309)(0.6909) + 0.6909x_{t-1} + w_t = 4.522 + 0.6909x_{t-1} + w_t$.
The AR coefficient is statistically significant (z = 0.6909/0.1094 = 6.315). It's not necessary to test the mean coefficient. We know that it's not 0.
The author's routine also gives residual diagnostics in the form of several graphs. Here's that part of the output:
Interpretations of the Diagnostics
The time series plot of the standardized residuals mostly indicates that there's no trend in the residuals, no outliers, and in general, no changing variance across time.
The ACF of the residuals shows no significant autocorrelations – a good result.
The Q-Q plot is a normal probability plot. It doesn't look too bad, so the assumption of normally distributed residuals looks okay.
The bottom plot gives p-values for the Ljung-Box-Pierce statistics for each lag up to 20. These statistics consider the accumulated residual autocorrelation from lag 1 up to and including the lag on the horizontal axis. The dashed blue line is at .05. All p-values are above it. That's a good result. We want non-significant values for this statistic when looking at residuals. Read Lesson 3.2 of this week for more about the Ljung–Box-Pierce statistic.
All in all, the fit looks good. There's not much need to continue, but just to show you how things look when incorrect models are used, we will present another model.
Output for a Wrong Model
Suppose that we had misinterpreted the ACF and PACF of the data and had tried an MA(1) model rather than the AR(1) model.
Here's the output:
Coefficients:
 | ma1 | xmean
---|---|---
Estimate | 0.5570 | 14.5881
Std. Error | 0.1251 | 0.3337

sigma^2 estimated as 1.870: log likelihood = -69.46
$AIC
[1] 3.622905
$AICc
[1] 3.631013
$BIC
[1] 3.74957
The MA(1) coefficient is significant (you can check it), but mostly this looks worse than the statistics for the right model. The estimate of the variance is 1.87, compared to 1.447 for the AR(1) model. The AIC and BIC statistics are higher for the MA(1) than for the AR(1). That's not good.
The diagnostic graphs aren't good for the MA(1). The ACF has a significant spike at lag 2, and several of the Ljung-Box-Pierce p-values are below .05. We don't want them there. So, the MA(1) isn't a good model.
A Model with One Too Many Coefficients:
Suppose we try a model (still the Lake Erie Data) with one AR term and one MA term. Here's some of the output:
 | ar1 | ma1 | xmean
---|---|---|---
Estimate | 0.7362 | -0.0909 | 14.6307
Std. Error | 0.1362 | 0.1969 | 0.6142

sigma^2 estimated as 1.439: log likelihood = -64.36
$AIC
[1] 3.418224
$AICc
[1] 3.434891
$BIC
[1] 3.587112
The MA(1) coefficient is not significant (z = -0.0909/0.1969 = -0.4617, which is less than 1.96 in absolute value). The MA(1) term could be dropped, which takes us back to the AR(1). Also, the estimate of the variance is barely better than the estimate for the AR(1) model, and the AIC and BIC statistics are higher for the ARMA(1,1) than for the AR(1).
Example 3-2: Parameter Redundancy
Suppose that the model for your data is white noise: $x_t = w_t$. If this is true for every $t$, then it is true for $t-1$ as well; in other words,
$x_t = w_t$
$x_{t-1} = w_{t-1}$
Let's multiply both sides of the second equation by 0.5:
$0.5x_{t-1} = 0.5w_{t-1}$
Next, we will move both terms over to one side:
$0.5x_{t-1} - 0.5w_{t-1} = 0$
Because the data are white noise, $x_t = w_t$, so we can add $x_t$ to the left side and $w_t$ to the right side:
$x_t + 0.5x_{t-1} - 0.5w_{t-1} = w_t$, that is, $x_t + 0.5x_{t-1} = w_t + 0.5w_{t-1}$
This is an ARMA(1, 1)! The problem is that we know it is white noise because of the original equations. If we looked at the ACF, what would we see? You would see the ACF corresponding to white noise: a spike at lag zero and then nothing else. This also means that if we take the white noise process and try to fit an ARMA(1, 1), R will do it and will come up with coefficients that look something like the ones above. This is one of the reasons why we need to look at the ACF and PACF plots and other diagnostics: we prefer a model with the fewest parameters. This example also shows that, for certain parameter values, ARMA models can appear very similar to one another.
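Here is a quick hedged illustration of this in R (the seed and sample size are arbitrary): simulating pure white noise and then asking arima for an ARMA(1,1) produces ar1 and ma1 estimates that roughly cancel each other, even though the true model has neither term.
set.seed(1) # arbitrary seed for reproducibility
wn = rnorm(500) # white noise: no AR or MA structure at all
arima(wn, order = c(1, 0, 1)) # still returns ar1 and ma1 estimates that nearly offset one another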
R Code for Example 3-1
Here's how we accomplished the work for the example in this lesson.
We first loaded the astsa library discussed in Lesson 1. It's a set of scripts written by Stoffer, one of the textbook's authors. If you installed the astsa package during Lesson 1, then you only need the library command.
Use the command library("astsa"). This makes the downloaded routines accessible.
The session for creating Example 1 then proceeds as follows:
xerie = scan("eriedata.dat") #reads the data
xerie = ts (xerie) # makes sure xerie is a time series object
plot (xerie, type = "b") # plots xerie
acf2 (xerie) # author's routine for graphing both the ACF and the PACF
sarima (xerie, 1, 0, 0) # this is the AR(1) model estimated with the author's routine
sarima (xerie, 0, 0, 1) # this is the incorrect MA(1) model
sarima (xerie, 1, 0, 1) # this is the over-parameterized ARMA(1,1) model
We'll look more closely at forecasting with ARIMA models in Lesson 3.3. In the meantime, here's how you would forecast for the next 4 times past the end of the series using the author's source code and the AR(1) model for the Lake Erie data.
sarima.for(xerie, 4, 1, 0, 0) # four forecasts from an AR(1) model for the erie data
You'll get forecasts for the next four times, the standard errors for these forecasts, and a graph of the time series along with the forecasts.
Diagnostics
Analyzing possible statistical significance of autocorrelation values
The Ljung-Box statistic, also called the modified Box-Pierce statistic, is a function of the accumulated sample autocorrelations, $r_j$, up to any specified time lag $m$. As a function of $m$, it is determined as:
$Q(m) = n(n+2)\sum_{j=1}^{m}\frac{r_j^2}{n-j}$
where n = the number of usable data points after any differencing operations. (Please visit forvo for the proper pronunciation of Ljung.)
As an example,
$Q(3) = n(n+2)\left(\frac{r_1^2}{n-1} + \frac{r_2^2}{n-2} + \frac{r_3^2}{n-3}\right)$
Use of the Statistic
This statistic can be used to examine residuals from a time series model in order to see if all underlying population autocorrelations for the errors may be 0 (up to a specified point).
For nearly all models that we consider in this course, the residuals are assumed to be "white noise", meaning that they are identically, independently distributed (from each other). Thus, as we saw last week, the ideal ACF for residuals is that all autocorrelations are 0. This means that $Q(m)$ should be 0 for any lag $m$. A significant $Q(m)$ for residuals indicates a possible problem with the model. (Remember that $Q(m)$ measures accumulated autocorrelation up to lag $m$.)
Distribution of $Q(m)$
There are two cases:
- When the $r_j$ are sample autocorrelations for residuals from a time series model, the null hypothesis distribution of $Q(m)$ is approximately a $\chi^2$ distribution with df = $m - k$, where $k$ is the number of coefficients in the model, including a constant.
- When no model has been used, so that the ACF is for raw data, the null distribution of $Q(m)$ is approximately a $\chi^2$ distribution with df = $m$.
p-Value Determination
In both cases, a p-value is calculated as the probability past $Q(m)$ in the relevant distribution. A small p-value (for instance, p-value < .05) indicates the possibility of non-zero autocorrelation within the first $m$ lags.
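In R, the base Box.test function computes this statistic. A hedged sketch (resids is a placeholder for the residuals from a fitted model, and fitdf should equal the number of estimated coefficients, here assumed to be 2):
# Ljung-Box test of the first 12 residual autocorrelations;
# fitdf subtracts the number of estimated coefficients from the degrees of freedom
Box.test(resids, lag = 12, type = "Ljung-Box", fitdf = 2)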
Example 3-3
Below is Minitab output for the Lake Erie level data that was used for homework 1 and in Lesson 3.1. A useful model is an AR(1) with a constant, so $x_t = \delta + \phi_1 x_{t-1} + w_t$.
Final Estimates of Parameters
Type | Coef | SE Coef | T | P |
---|---|---|---|---|
AR 1 | 0.7078 | 0.1161 | 6.10 | 0.000 |
Constant | 4.2761 | 0.1953 | 21.89 | 0.000 |
Mean | 14.6349 | 0.6684 |
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag | 12 | 24 | 36 | 48 |
---|---|---|---|---|
Chi-Square | 9.4 | 23.2 | 30.0 | * |
DF | 10 | 22 | 34 | * |
P-Value | 0.493 | 0.390 | 0.662 | * |
Minitab gives p-values for accumulated lags that are multiples of 12. The R sarima command will give a graph that shows p-values of the Ljung-Box-Pierce tests for each lag (in steps of 1) up to a certain lag, usually up to lag 20 for nonseasonal models.
Interpretation of the Box-Pierce Results
Notice that the p-values for the modified Box-Pierce statistic are all well above .05, indicating "non-significance". This is a desirable result. Remember that there are only 40 data values, so there's not much data contributing to correlations at high lags. Thus, the results for m = 24 and m = 36 may not be meaningful.
Graphs of ACF values
When you request a graph of the ACF values, "significance" limits are shown by R and by Minitab. In general, the limits for the autocorrelation are placed at $\pm 2$ standard errors of $r_k$. The formula used for the standard error depends upon the situation.
- Within the ACF of residuals, as part of the ARIMA routine, the standard errors are determined assuming the residuals are white noise. The approximate formula for any lag is that the s.e. of $r_k$ is $1/\sqrt{n}$.
- For the ACF of raw data (the ACF command), the standard error at a lag k is found as if the right model was an MA(k-1). This allows the possible interpretation that if all autocorrelations past a certain lag are within limits, the model might be an MA of order defined by the last significant autocorrelation.
Appendix: Standardized Residuals
What are standardized residuals in a time series framework? One of the things that we need to look at when we look at the diagnostics from a regression fit is a graph of the standardized residuals. Let's review what this is for regular regression, where the error standard deviation is $\sigma$. The standardized residual at observation i is
$\frac{e_i}{\hat{\sigma}} = \frac{y_i - \hat{y}_i}{\hat{\sigma}}$
which should be approximately N(0, 1). We hope to see normality when we look at the diagnostic plots.
Now, with time series, things are very similar:
$\frac{x_t - \hat{x}_t^{t-1}}{\sqrt{\hat{P}_t^{t-1}}}$
where $\hat{x}_t^{t-1}$ is the one-step-ahead prediction of $x_t$ based on the data through time t - 1, and $\hat{P}_t^{t-1}$ is the estimated variance of that prediction error. This is where the standardized residuals come from. This is also essentially how a time series is fit using R: we want to minimize the sum of these squared values,
$\sum_t \left(\frac{x_t - \hat{x}_t^{t-1}}{\sqrt{\hat{P}_t^{t-1}}}\right)^2$
(In reality, it is slightly more complicated. The log-likelihood function is minimized, and this is one term of that function.)
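As a rough hedged sketch in R (this divides each residual by the overall estimate of sigma rather than by the exact prediction standard deviations described above, a common large-sample approximation; xerie and the AR(1) order come from Example 3-1):
fit = arima(xerie, order = c(1, 0, 0)) # AR(1) fit to the Lake Erie series
stdres = residuals(fit) / sqrt(fit$sigma2) # approximately standardized residuals
plot(stdres, type = "h"); abline(h = 0) # should look like N(0, 1) noise with no pattern
qqnorm(stdres); qqline(stdres) # normal Q-Q plot of the standardized residuals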
Forecasting with ARIMA Models
Section 3.4 in the textbook gives a theoretical look at forecasting with ARIMA models. That presentation is a bit tough, but in practice, it's easy to understand how forecasts are created.
In an ARIMA model, we express $x_t$ as a function of past value(s) of x and/or past errors (as well as a present-time error, $w_t$). When we forecast a value past the end of the series, we might need values from the observed series on the right side of the equation, or we might, in theory, need values that aren't yet observed.
Example 3-4
Consider the AR(2) model $x_t = \delta + \phi_1 x_{t-1} + \phi_2 x_{t-2} + w_t$. In this model, $x_t$ is a linear function of the values of x at the previous two times. Suppose that we have observed n data values and wish to use the observed data and the estimated AR(2) model to forecast the values of $x_{n+1}$ and $x_{n+2}$, the values of the series at the next two times past the end of the series. The equations for these two values are
$x_{n+1} = \delta + \phi_1 x_{n} + \phi_2 x_{n-1} + w_{n+1}$
$x_{n+2} = \delta + \phi_1 x_{n+1} + \phi_2 x_{n} + w_{n+2}$
To use the first of these equations, we simply use the observed values of $x_n$ and $x_{n-1}$ and replace $w_{n+1}$ by its expected value of 0 (the assumed mean for the errors).
The second equation, for forecasting the value at time n + 2, presents a problem. It requires the unobserved value of $x_{n+1}$ (one time past the end of the series). The solution is to use the forecasted value of $x_{n+1}$ (the result of the first equation).
In general, the forecasting procedure, assuming a sample size of n, is as follows:
- For any $w_j$ with $1 \le j \le n$, use the sample residual for time point j.
- For any $w_j$ with $j > n$, use 0 as the value of $w_j$.
- For any $x_j$ with $1 \le j \le n$, use the observed value of $x_j$.
- For any $x_j$ with $j > n$, use the forecasted value of $x_j$.
Notation
Our authors use the notation $x^{n}_{n+m}$ to represent a forecast m times past the end of the observed series. The "superscript" n is to be read as "given data up to time n". Other authors use the notation $x_n(m)$ to denote a forecast m times past time n.
To understand the formula for the standard error of the forecast error, we first need to define the concept of psi-weights.
Psi-weight representation of an ARIMA model
Any ARIMA model can be converted to an infinite order MA model:
\begin{align} x_t - \mu & = w_t + \Psi_1w_{t-1} + \Psi_2w_{t-2} + \dots + \Psi_kw_{t-k} + \dots \\ & = \sum_{j=0}^{\infty}\Psi_jw_{t-j}, \text{ where } \Psi_0 \equiv 1 \end{align}
An important constraint, so that the model doesn't "explode" as we go back in time, is $\sum_{j=1}^{\infty}\Psi_j^2 < \infty$.
The process of finding the "psi-weight" representation can involve a few algebraic tricks. Fortunately, R has a routine, ARMAtoMA, that will do it for us. To illustrate how psi-weights may be determined algebraically, we'll consider a simple example.
Example 3-5
Suppose that an AR(1) model is $x_t = 40 + 0.6x_{t-1} + w_t$.
For an AR(1) model, the mean $\mu = \delta/(1 - \phi_1)$, so in this case $\mu = 40/(1 - 0.6) = 100$. We'll define $z_t = x_t - 100$ and rewrite the model as $z_t = 0.6z_{t-1} + w_t$. (You can do the algebra to check that things match between the two expressions of the model.)
To find the psi-weight expression, we'll continually substitute for the z on the right side in order to make the expression become one that only involves w values.
First, $z_{t-1} = 0.6z_{t-2} + w_{t-1}$. Substitute the right side of this expression for $z_{t-1}$ in the first expression. This gives
$z_t = 0.6(0.6z_{t-2} + w_{t-1}) + w_t = 0.36z_{t-2} + 0.6w_{t-1} + w_t$
Next, $z_{t-2} = 0.6z_{t-3} + w_{t-2}$. Substituting this into the equation gives
$z_t = 0.216z_{t-3} + 0.36w_{t-2} + 0.6w_{t-1} + w_t$
If you keep going, you'll soon see that the pattern leads to
$z_t = x_t - 100 = w_t + 0.6w_{t-1} + 0.6^2w_{t-2} + 0.6^3w_{t-3} + \dots$
Thus, the psi-weights for this model are given by $\Psi_j = 0.6^j$ for $j = 0, 1, 2, \dots$.
In R, the command ARMAtoMA(ar = .6, ma = 0, 12) gives the first 12 psi-weights, $\Psi_1$ through $\Psi_{12}$.
For the AR(1) with AR coefficient = 0.6, they are:
[1] 0.600000000 0.360000000 0.216000000 0.129600000 0.077760000 0.046656000
[7] 0.027993600 0.016796160 0.010077696 0.006046618 0.003627971 0.002176782
Remember that $\Psi_0 = 1$. R doesn't give this value. Its listing starts with $\Psi_1$, which equals 0.6 in this case.
MA Models: The psi-weights are easy for an MA model
because the model is already written in terms of errors. The
psi-weights = 0 for lags past the order of the MA model and equal the
coefficient values for lags of the errors that are in the model.
Remember that we always have $\Psi_0 = 1$.
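As a hedged check of that statement (the MA(2) coefficients 0.7 and 0.3 are arbitrary), ARMAtoMA applied to a pure MA model simply returns the MA coefficients followed by zeros:
ARMAtoMA(ar = 0, ma = c(0.7, 0.3), 6) # psi-weights for an MA(2): 0.7, 0.3, then zeros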
Standard error of the forecast error for a forecast using an ARIMA model
Without proof, we'll state a result:
The variance of the difference between the forecasted value at time n + m and the (unobserved) value at time n + m is
$\text{Var}(x_{n+m} - x^{n}_{n+m}) = \sigma^2_w \sum_{j=0}^{m-1}\Psi_j^2$
Thus, the estimated standard deviation of the forecast error at time n + m is
$\hat{\sigma}_w\sqrt{\sum_{j=0}^{m-1}\hat{\Psi}_j^2}$
The summation of squared psi-weights begins with $\Psi_0^2 = 1$.
When forecasting m = 1 time past the end of the series, the standard error of the forecast error is $\hat{\sigma}_w\sqrt{1} = \hat{\sigma}_w$.
When forecasting the value m = 2 times past the end of the series, the standard error of the forecast error is $\hat{\sigma}_w\sqrt{1 + \Psi_1^2}$.
Notice that the variance will not be too big when m = 1. But, as you predict out farther in the future, the variance will increase. When m is very large, we will get the total variance. In other words, if you are trying to predict very far out, we will get the variance of the entire time series, as if you haven't even looked at what was going on previously.
With the assumption of normally distributed errors, a 95% prediction interval for $x_{n+m}$, the future value of the series at time n + m, is
$x^{n}_{n+m} \pm 1.96\,\hat{\sigma}_w\sqrt{\sum_{j=0}^{m-1}\hat{\Psi}_j^2}$
Example 3-6
Suppose that an AR(1) model is estimated to be $x_t = 40 + 0.6x_{t-1} + w_t$. This is the same model used earlier in this handout, so the psi-weights we got there apply.
Suppose that we have n = 100 observations, $\hat{\sigma}^2_w = 4$, and $x_{100} = 80$. We wish to forecast the values at times 101 and 102 and create prediction intervals for both forecasts.
First, we forecast time 101: $x^{100}_{101} = 40 + 0.6(80) = 88$.
The standard error of the forecast error at time 101 is $\sqrt{4(1)} = 2$.
The 95% prediction interval for the value at time 101 is 88 ± 2(1.96), which is 84.08 to 91.96. We are, therefore, 95% confident that the observation at time 101 will be between 84.08 and 91.96. If we repeated this exact process many times, then 95% of the computed prediction intervals would contain the true value of x at time 101.
The forecast for time 102 is $x^{100}_{102} = 40 + 0.6(88) = 92.8$.
We used the forecasted value for time 101 in the AR(1) equation.
The relevant standard error is $\hat{\sigma}_w\sqrt{1 + \Psi_1^2} = \sqrt{4(1 + 0.6^2)} = 2.332$.
A 95% prediction interval for the value at time 102 is 92.8 ± (1.96)(2.332).
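The arithmetic in this example can be reproduced with a few lines of R (a hedged sketch; the values 40, 0.6, 4, and 80 are the ones assumed above):
delta = 40; phi = 0.6; sigma2w = 4; x100 = 80 # model and data values from the example
fcast101 = delta + phi * x100 # 88
fcast102 = delta + phi * fcast101 # 92.8
psi1 = ARMAtoMA(ar = phi, ma = 0, 1) # psi_1 = 0.6
se101 = sqrt(sigma2w) # 2
se102 = sqrt(sigma2w * (1 + psi1^2)) # 2.332
fcast101 + c(-1, 1) * 1.96 * se101 # 95% prediction interval for time 101: 84.08 to 91.96
fcast102 + c(-1, 1) * 1.96 * se102 # 95% prediction interval for time 102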
To forecast using an ARIMA model in R, we recommend our textbook author's script called sarima.for. (It is part of the astsa library recommended previously).
Example 3-7
In the homework for Lesson 2, problem 5 asked you to suggest a model for a time series of stride lengths measured every 30 seconds for a runner on a treadmill.
From R, the estimated coefficients for an AR(2) model and the estimated variance are as follows for a similar data set with n = 90 observations:
Coefficients:
 | ar1 | ar2 | xmean
---|---|---|---
Estimate | 1.1480 | -0.3359 | 48.7476
s.e. | 0.1009 | 0.1087 | 1.8855

sigma^2 estimated as 11.47
The command
sarima.for(stridelength, 6, 2, 0, 0) # 6 forecasts with an AR(2) model for stridelength
will give forecasts and standard errors of prediction errors for the next six times past the end of the series. Here's the output (slightly edited to fit here):
$pred
Time Series:
Start = 91
End = 96
[1] 69.78674 64.75441 60.05661 56.35385 53.68102 51.85633
$se
Time Series:
Start = 91
End = 96
[1] 3.386615 5.155988 6.135493 6.629810 6.861170 6.962654
The forecasts are given in the first batch of values under $pred, and the standard errors of the forecast errors are given in the last line in the batch of results under $se.
The procedure also gave this graph, which shows the series followed by the forecasts as a red line and the upper and lower prediction limits as blue dashed lines:
Psi-Weights for the Estimated AR(2) for the Stride Length Data
If we wanted to verify the standard error calculations for the six forecasts past the end of the series, we would need to know the psi-weights. To get them, we need to supply the estimated AR coefficients for the AR(2) model to the ARMAtoMA command.
The R command in this case is ARMAtoMA(ar = c(1.148, -0.3359), ma = 0, 5)
This will give the psi-weights $\Psi_1$ through $\Psi_5$ in scientific notation. The answer provided by R is:
[1] 1.148000e+00 9.820040e-01 7.417274e-01 5.216479e-01 3.497056e-01
The output for estimating the AR(2) included this estimate of the error variance:
sigma^2 estimated as 11.47
As an example, the standard error of the forecast error for 3 times past the end of the series is
$\sqrt{11.47(1 + 1.148^2 + 0.982004^2)} = 6.1357$
which, except for round-off error, matches the value of 6.135493 given as the third standard error in the sarima.for output above.
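A hedged one-line check in R (the psi-weights and sigma^2 are taken from the output above):
psi = ARMAtoMA(ar = c(1.148, -0.3359), ma = 0, 5) # psi-weights for the estimated AR(2)
sqrt(11.47 * (1 + sum(psi[1:2]^2))) # s.e. of the forecast error for m = 3; about 6.136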
Where will the Forecasts End Up?
For a stationary series and model, the forecasts of future values will eventually converge to the mean and then stay there. Note below what happened with the stride length forecasts when we asked for 30 forecasts past the end of the series. [Command was sarima.for (stridelength, 30, 2, 0, 0)]. The forecast got to 48.74753 and then stayed there.
$pred
Time Series:
Start = 91
End = 120
[1] 69.78674 64.75441 60.05661 56.35385 53.68102 51.85633 50.65935 49.89811
[9] 49.42626 49.14026 48.97043 48.87153 48.81503 48.78339 48.76604 48.75676
[17] 48.75192 48.74949 48.74833 48.74780 48.74760 48.74753 48.74753 48.74755
[25] 48.74757 48.74759 48.74760 48.74761 48.74762 48.74762
The graph showing the series, the forecasts, and the prediction intervals is the following

Seasonal ARIMA models
Seasonality in a time series is a regular pattern of changes that repeats over S time periods, where S defines the number of time periods until the pattern repeats again.
For example, there is seasonality in monthly data for which high values tend always to occur in some particular months, and low values tend always to occur in other particular months. In this case, S = 12 (months per year) is the span of the periodic seasonal behavior. For quarterly data, S = 4 time periods per year.
In a seasonal ARIMA model, seasonal AR and MA terms predict $x_t$ using data values and errors at times with lags that are multiples of S (the span of the seasonality).
- With monthly data (and S = 12), a seasonal first-order autoregressive model would use $x_{t-12}$ to predict $x_t$. For instance, if we were selling cooling fans, we might predict this August's sales using last August's sales. (This relationship of predicting using last year's data would hold for any month of the year.)
- A seasonal second-order autoregressive model would use $x_{t-12}$ and $x_{t-24}$ to predict $x_t$. Here we would predict this August's values from the past two Augusts.
- A seasonal first-order MA(1) model (with S = 12) would use $w_{t-12}$ as a predictor. A seasonal second-order MA(2) model would use $w_{t-12}$ and $w_{t-24}$.
Differencing
Almost by definition, it may be necessary to examine differenced data when we have seasonality. Seasonality usually causes the series to be nonstationary because the average values at some particular times within the seasonal span (months, for example) may be different than the average values at other times. For instance, our sales of cooling fans will always be higher in the summer months.
Seasonal differencing
- Seasonal differencing is defined as a difference between a value and a value with a lag that is a multiple of S. With S = 12, for example, the seasonal difference is $x_t - x_{t-12}$.
The differences (from the previous year) may be about the same for each month of the year, giving us a stationary series.
Seasonal differencing removes seasonal trends and can also get rid of a seasonal random walk type of nonstationarity.
Non-seasonal differencing
- If a trend is present in the data, we may also need non-seasonal differencing. Often (not always), a first difference (non-seasonal) will "detrend" the data. That is, we use $x_t - x_{t-1}$ in the presence of trend.
Differencing for Trend and Seasonality
- When both trend and seasonality are present, we may need to apply both a non-seasonal first difference and a seasonal difference. That is, we may need to examine the ACF and PACF of $(x_t - x_{t-12}) - (x_{t-1} - x_{t-13})$.
Removing a trend doesn't mean that we have removed the dependency. We may have removed the mean, $\mu_t$, part of which may include a periodic component. In some ways, we are breaking the dependency down into recent things that have happened and long-range things that have happened.
Non-seasonal behavior will still matter...
With seasonal data, it is likely that short-run non-seasonal components will still contribute to the model. In the monthly sales of cooling fans mentioned above, for instance, sales in the previous month or two, along with the sales from the same month a year ago, may help predict this month's sales.
We'll have to look at the ACF and PACF behavior over the first few lags (less than S) to assess what non-seasonal terms might work in the model.
Seasonal ARIMA Model
The seasonal ARIMA model incorporates both non-seasonal and seasonal factors in a multiplicative model. One shorthand notation for the model is ARIMA$(p, d, q) \times (P, D, Q)S$,
with p = non-seasonal AR order, d = non-seasonal differencing, q = non-seasonal MA order, P = seasonal AR order, D = seasonal differencing, Q = seasonal MA order, and S = time span of repeating seasonal pattern.
Without differencing operations, the model could be written more formally as
$\Phi(B^S)\,\phi(B)\,(x_t - \mu) = \Theta(B^S)\,\theta(B)\,w_t \quad (1)$
On the left side of equation (1), the seasonal and non-seasonal AR components multiply each other, and on the right side of equation (1), the seasonal and non-seasonal MA components multiply each other.
Example 4-1: ARIMA(0,0,1)×(0,0,1)12
The model includes a non-seasonal MA(1) term, a seasonal MA(1) term, no differencing, no AR terms, and the seasonal period is S = 12.
The non-seasonal MA(1) polynomial is $\theta(B) = 1 + \theta_1B$.
The seasonal MA(1) polynomial is $\Theta(B^{12}) = 1 + \Theta_1B^{12}$.
When we multiply the two polynomials on the right side, we get
$(1 + \theta_1B)(1 + \Theta_1B^{12})w_t = w_t + \theta_1w_{t-1} + \Theta_1w_{t-12} + \theta_1\Theta_1w_{t-13}$
Thus the model has MA terms at lags 1, 12, and 13. This leads many to think that the identifying ACF for the model will have non-zero autocorrelations only at lags 1, 12, and 13. There's a slight surprise here. There will also be a non-zero autocorrelation at lag 11. We supply a proof in the Appendix.
We simulated n = 1000 values from an ARIMA(0,0,1)×(0,0,1)12 model with a non-seasonal MA(1) coefficient $\theta_1$ and a seasonal MA(1) coefficient $\Theta_1$.
The sample ACF for the simulated series was as follows:
Note the spikes at lags 1, 11, and 12 in the ACF. This is characteristic of the ACF for the ARIMA(0,0,1)×(0,0,1)12 model.
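To see the theoretical version of this pattern, the multiplied-out MA coefficients at lags 1, 12, and 13 can be handed to R's ARMAacf function. This is a hedged sketch; the coefficient values 0.7 and 0.6 are arbitrary illustrations, not the values used in the simulation above.
# theoretical ACF of an ARIMA(0,0,1)x(0,0,1)12: MA coefficients at lags 1, 12, and 13
theacf = ARMAacf(ma = c(0.7, rep(0, 10), 0.6, 0.7 * 0.6), lag.max = 30)
plot(theacf, type = "h") # non-zero autocorrelations show up at lags 1, 11, 12, and 13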
Example 4-2: ARIMA(1,0,0)×(1,0,0)12
The model includes a non-seasonal AR(1) term, a seasonal AR(1) term, no differencing, no MA terms, and the seasonal period is S = 12.
The non-seasonal AR(1) polynomial is $\phi(B) = 1 - \phi_1B$.
The seasonal AR(1) polynomial is $\Phi(B^{12}) = 1 - \Phi_1B^{12}$.
If we let $z_t = x_t - \mu$ (for simplicity), multiply the two AR components, and push all but $z_t$ to the right side, we get
$z_t = \phi_1z_{t-1} + \Phi_1z_{t-12} - \phi_1\Phi_1z_{t-13} + w_t$.
This is an AR model with predictors at lags 1, 12, and 13.
R can be used to determine and plot the PACF for this model, with $\phi_1 = .6$ and $\Phi_1 = .5$. That PACF (partial autocorrelation function) is:
It's not quite what you might expect for an AR model, but it almost is. There are distinct spikes at lags 1, 12, and 13, with a bit of action coming before lag 12. Then, it cuts off after lag 13.
R commands were
thepacf=ARMAacf (ar = c(.6,0,0,0,0,0,0,0,0,0,0,.5,-.30),lag.max=30,pacf=T)
Identifying a Seasonal Model
- Step 1: Do a time series plot of the data.
Examine it for features such as trend and seasonality. You'll know that you've gathered seasonal data (months, quarters, etc.,) so look at the pattern across those time units (months, etc.) to see if there is indeed a seasonal pattern.
- Step 2: Do any necessary differencing.
- If there is seasonality and no trend, then take a difference of lag S. For instance, take a 12th difference for monthly data with seasonality. Seasonality will appear in the ACF by tapering slowly at multiples of S. (View the TS plot and ACF/PACF plots for an example of data that requires a seasonal difference. Note that the TS plot shows a clear seasonal pattern that repeats over 12 time points. Seasonal differences are supported in the ACF/PACF of the original data because the first seasonal lag in the ACF is close to 1 and decays slowly over multiples of S=12. Once seasonal differences are taken, the ACF/PACF plots of twelfth differences support a seasonal AR(1) pattern).
- If there is a linear trend and no obvious seasonality, then take a first difference. If there is a curved trend, consider a transformation of the data before differencing.
- If there is both trend and seasonality, apply a seasonal difference to the data and then re-evaluate the trend. If a trend remains, then take first differences. For instance, if the series is called x, the commands in R would be:
diff12 = diff(x, 12)
plot(diff12)
acf2(diff12)
diff1and12 = diff(diff12, 1)
- If there is neither an obvious trend nor seasonality, don't take any differences.
- Step 3: Examine the ACF and PACF of the differenced data (if differencing is necessary).
We're using this information to determine possible models. This can be tricky, involving some (educated) guessing. Some basic guidance:
- Non-seasonal terms: Examine the early lags (1, 2, 3, …) to judge non-seasonal terms. Spikes in the ACF (at low lags) with a tapering PACF indicate possible non-seasonal MA terms. Spikes in the PACF (at low lags) with a tapering ACF indicate possible non-seasonal AR terms.
- Seasonal terms: Examine the patterns across lags that are multiples of S. For example, for monthly data, look at lags 12, 24, 36, and so on (you probably won't need to look at much more than the first two or three seasonal multiples). Judge the ACF and PACF at the seasonal lags in the same way you do for the earlier lags.
- Step 4: Estimate the model(s) that might be reasonable on the basis of step 3.
Don't forget to include any differencing that you did before looking at the ACF and PACF. In the software, specify the original series as the data and then indicate the desired differencing when specifying parameters in the arima command that you're using.
- Step 5: Examine the residuals (with ACF, Box-Pierce, and any other means) to see if the model seems good.
Compare AIC or BIC values to choose among several models.
If things don't look good here, it's back to Step 3 (or maybe even Step 2).
Example 4-3
The data series are a monthly series of a measure of the flow of the Colorado River, at a particular site, for n = 600 consecutive months.
Step 1
A time series plot is
With so many data points, it's difficult to judge whether there is seasonality. If it was your job to work on data like this, you probably would know that river flow is seasonal – perhaps likely to be higher in the late spring and early summer, due to snow runoff.
Without this knowledge, we might determine means by month of the year. Below is a plot of means for the 12 months of the year. It's clear that there are monthly differences (seasonality).
Looking back at the time series plot, it's hard to judge whether there's any long-run trend. If there is, it's slight.
Steps 2 and 3
We might try the idea that there is seasonality but no trend. To do this, we can create a variable that gives the 12th differences (seasonal differences), calculated as $x_t - x_{t-12}$. Then, we look at the ACF and the PACF for the 12th-difference series (not the original data). Here they are:
- Non-seasonal behavior: The PACF shows a clear spike at lag 1 and not much else until about lag 11. This is accompanied by a tapering pattern in the early lags of the ACF. A non-seasonal AR(1) may be a useful part of the model.
- Seasonal behavior: We look at what's going on around lags 12, 24, and so on. In the ACF, there's a cluster of (negative) spikes around lag 12 and then not much else. The PACF tapers in multiples of S; that is, the PACF has significant lags at 12, 24, 36, and so on. This is similar to what we saw for a seasonal MA(1) component in Example 1 of this lesson.
Remembering that we're looking at 12th differences, the model we might try for the original series is ARIMA(1,0,0)×(0,1,1)12.
Step 4
Final Estimates of Parameters
Type | Coef | SE Coef | T | P |
---|---|---|---|---|
AR 1 | 0.5149 | 0.0353 | 14.60 | 0.000 |
SMA 12 | -0.8828 | 0.0237 | -37.25 | 0.000 |
Constant | -0.0011 | 0.0007 | -1.63 | 0.103 |
sigma^2 estimated as 0.4681: log likelihood = -620.38, aic = 1248.76
$degrees_of_freedom
[1] 585
Step 5 (diagnostics)
The normality and Box-Pierce test results are shown in Lesson 4.2. Besides normality, things look good. The Box-Pierce statistics are all non-significant, and the estimated ARIMA coefficients are statistically significant.
The ACF of the residuals looks good too:
What doesn't look perfect is a plot of residuals versus fits. There's non-constant variance.
We've got three choices for what to do about the non-constant variance: (1) ignore it, (2) go back to step 1 and try a variance stabilizing transformation like log or square root, or (3) use an ARCH model that includes a component for changing variances. We'll get to ARCH models later in the course.
Lesson 4.2 for this week will give R guidance and an additional example or two.
Appendix (Optional reading)
Only those interested in theory need to look at the following.
In Example 4-1, we promised a proof that the lag 11 autocorrelation, $\rho_{11}$, is not 0 for the ARIMA(0,0,1)×(0,0,1)12.
A correlation is defined as the covariance divided by the product of standard deviations.
For the model in Example 4-1, $x_t = w_t + \theta_1w_{t-1} + \Theta_1w_{t-12} + \theta_1\Theta_1w_{t-13}$, and the covariance between $x_t$ and $x_{t-11}$ is
$E[(w_t + \theta_1w_{t-1} + \Theta_1w_{t-12} + \theta_1\Theta_1w_{t-13})(w_{t-11} + \theta_1w_{t-12} + \Theta_1w_{t-23} + \theta_1\Theta_1w_{t-24})] \quad (2)$
The w's are independent errors. The expected value of any product involving w's with different subscripts will be 0. A covariance between w's with the same subscripts will be the variance of w.
If you inspect all possible products in expression (2), there is exactly one product with matching subscripts: the $\Theta_1w_{t-12}$ term from the first factor multiplied by the $\theta_1w_{t-12}$ term from the second factor. Thus this expected value (covariance) will be different from 0.
This shows that the lag 11 autocorrelation will be different from 0.
If you look at the more general problem, you can find that only lags 1, 11, 12, and 13 have non-zero autocorrelations for the ARIMA(0,0,1)×(0,0,1)12.
A seasonal ARIMA model incorporates both non-seasonal and seasonal factors in a multiplicative fashion.
Identifying Seasonal Models and R Code
In the previous lesson, Example 3 described the analysis of monthly flow data for a Colorado River location. An ARIMA(1,0,0)×(0,1,1)12 was identified and estimated. In the first part of this lesson, you'll see the R code and output for that analysis.
Example 4-3: Revisited
R code for the Colorado River Analysis
The data are in the file coloradoflow.dat. We used scripts in the library, so the first two lines of code are:
library(astsa)
flow <- ts(scan("coloradoflow.dat"))
The first step in the analysis is to plot the time series. The command is:
plot(flow, type="b")
The plot showed clear monthly effects and no obvious trend, so we examined the ACF and PACF of the 12th differences (seasonal differencing). Commands are:
diff12 = diff(flow,12)
acf2(diff12, 48)
The acf2 command asks for information about 48 lags. On the basis of the ACF and PACF of the 12th differences, we identified an ARIMA(1,0,0)×(0,1,1)12 model as a possibility. The command for fitting this model is
sarima(flow, 1,0,0,0,1,1,12)
The parameters of the command just given are the data series, the non-seasonal specification of AR, differencing, and MA, and then the seasonal specification of seasonal AR, seasonal differencing, seasonal MA, and period or span for the seasonality.
Output from the sarima command is:
Final Estimates of Parameters
Type | Coef | SE Coef | T | P |
---|---|---|---|---|
AR 1 | 0.5149 | 0.0353 | 14.60 | 0.000 |
SMA 12 | -0.8828 | 0.0237 | -37.25 | 0.000 |
Constant | -0.0011 | 0.0007 | -1.63 | 0.103 |
sigma^2 estimated as 0.4681: log likelihood = -620.38, aic = 1248.76
$degrees_of_freedom
[1] 585

We didn't generate any forecasts, but if we were to use R to generate forecasts for the next 24 months in R, one command that could be used is
sarima.for(flow, 24, 1,0,0,0,1,1,12)
Note!
The order of parameters in the command is the name of the data series, the number of times for which we want forecasts, followed by the parameters of the ARIMA model.
Partial output for the forecast command follows (we skipped giving the standard errors).
$pred
Time Series:
Start = 601
End = 624
Frequency = 1
[1] 0.3013677 0.3243119 0.5676440 1.2113713 2.7631409 3.5326281
[7] 1.3931886 0.5971816 0.3294376 0.4161899 0.4180578 0.3186012
[13] 0.2838731 0.3088276 0.5531948 1.1974552 2.7494992 3.5191278
[19] 1.3797610 0.5837915 0.3160668 0.4028290 0.4047020 0.3052481

Note that the lower limits of some prediction intervals are negative - impossible for the flow of a river. In practice, we might truncate these lower limits to 0 when presenting them.
If you were to use R's native commands to do the fit and forecasts, the commands might be:
themodel = arima(flow, order = c(1,0,0), seasonal = list(order = c(0,1,1), period = 12))
themodel
predict(themodel, n.ahead=24)
The first command fits the arima model and stores the results in an "object" called "themodel". The second command, which is simply the name of that object, prints the results, and the final command generates forecasts for 24 times ahead.
In the previous lesson, we presented a graph of the monthly means to make the case that the data are seasonal. The R commands are:
flowm = matrix(flow, ncol=12,byrow=TRUE)
col.means=apply(flowm,2,mean)
plot(col.means,type="b", main="Monthly Means Plot for Flow", xlab="Month", ylab="Mean")

Example 4-4: Beer Production in Australia
This plot for quarterly beer production in Australia was given.
There is an upward trend and seasonality. The regular pattern of ups and downs is an indicator of seasonality. If you count points, you'll see that the 4th quarter of a year is the high point, the 2nd and 3rd quarters are low points, and the 1st quarter value falls in between.
With trend and quarterly seasonality, we will likely need both a first and a fourth difference. The plot above was produced in Minitab, but we'll use R for the rest of this example. The commands for creating the differences and graphing the ACF and PACF of these differences are:
diff4 = diff(beer, 4)
diff1and4 = diff(diff4,1)
acf2(diff1and4,24)
Note!
When we read the data, we called the series "beer" and also loaded the astsa library.
Let's examine the TS plot and ACF/PACF of seasonal differences, diff4, to determine whether or not first differences are also necessary.

The TS plot still shows that the upward trend remains; however, the ACF/PACF do not suggest the need to difference further. In this case, we could detrend by subtracting an estimated linear trend from each observation as follows:
dtb = residuals(lm(diff4 ~ time(diff4)))
acf2(dtb)
The ACF and PACF of the detrended seasonally differenced data follow. The interpretation:
- Non-seasonal: Looking at just the first 2 or 3 lags, either an MA(1) or an AR(1) might work, based on the similar single spike in the ACF and PACF, if any non-seasonal term is needed at all. Both terms together are also possible with an ARMA(1,1), but with both the ACF and PACF cutting off immediately, this is less likely than a single-order model. With S = 4, the non-seasonal aspect can sometimes be difficult to interpret in such a narrow window.
- Seasonal: Look at lags that are multiples of 4 (we have quarterly data). Not much is going on there, although there is a (barely) significant spike in the ACF at lag 4 and a somewhat confusing spike at lag 9 (in ACF). Nothing significant is happening at the higher lags. Maybe a seasonal MA(1) or MA(2) might work.

We tried a few models. An initial guess was ARIMA(0,0,1)×(0,1,1)4, which wasn't a bad guess. We also tried ARIMA(1,0,0)×(0,1,1)4, ARIMA(0,0,0)×(0,1,1)4, and ARIMA(1,0,0)×(0,1,2)4.
The commands for the 4 models are:
sarima (dtb, 0,0,1,0,0,1,4)
sarima (dtb, 1,0,0,0,0,1,4)
sarima (dtb, 0,0,0,0,0,1,4)
sarima (dtb, 1,0,0,0,0,2,4)
Model | MSE | Sig. of coefficients | AIC | ACF of residuals |
---|---|---|---|---|
ARIMA(0,0,1)×(0,1,1)4 | 89.03 | Only the Seasonal MA is sig. | 508.25 | OK |
ARIMA(1,0,0)×(0,1,1)4 | 88.73 | Only the Seasonal MA is sig. | 507.97 | OK |
ARIMA(0,0,0)×(0,1,1)4 | 92.17 | Only the Seasonal MA is sig. | 509.19 | Slight concern at lag 1, though not significant |
ARIMA(1,0,0)×(0,1,2)4 | 86.93 | Only the Seasonal MA is sig. | 509.05 | OK |
You may wish to examine this model with first and fourth differences. Differencing twice usually removes any drift from the model, and so sarima does not fit a constant when d=1 and D=1. In this case you may difference within the sarima command, e.g. sarima(x,1,1,1,0,1,1,S). However, there are cases when drift remains after differencing twice, and so you must difference outside of the sarima command to fit a constant. It seems the safest choice is to difference outside of the sarima command to first verify that the drift is gone.