Time-Series Modeling and Decomposition

Read this article. It provides an overview of techniques associated with decomposition. Part 4, The Business Cycle, shows how this tool is applied in business operations. Why do you think decomposition is useful in understanding the costs of seasonality?

TIME SERIES DECOMPOSITION MODELS

An important goal in time series analysis is the decomposition of a series into a set of non-observable (latent) components that can be associated with different types of temporal variations. The idea of time series decomposition is very old; it was used for the calculation of planetary orbits by seventeenth-century astronomers. Persons (1919) was the first to state explicitly the assumptions of unobserved components. As Persons saw it, time series were composed of four types of fluctuations:

(1) A long-term tendency or secular trend.

(2) Cyclical movements superimposed upon the long-term trend. These cycles appear to reach their peaks during periods of industrial prosperity and their troughs during periods of depression, their rise and fall constituting the business-cycle.

(3) A seasonal movement within each year, the shape of which depends on the nature of the series.

(4) Residual variations due to changes affecting individual variables, or to major events such as wars and national catastrophes affecting a number of variables.

Traditionally, the four variations have been assumed to be mutually independent and are specified by means of an additive decomposition model:

X_{t}=T_{t}+C_{t}+S_{t}+I_{t}                                                                                                      (1)

where X_{t} denotes the observed series, T_{t} the long-term trend, C_{t} the business-cycle, S_{t} seasonality and I_{t} the irregulars.
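As a concrete illustration, the following Python sketch (the simulated series and all parameter values are chosen here purely for illustration) builds a monthly series of the additive form and recovers its components with the classical moving-average decomposition available in statsmodels; note that the trend and the cycle are estimated jointly as a single trend-cycle curve:

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# simulated monthly series: linear trend + annual seasonality + noise
rng = np.random.default_rng(0)
t = np.arange(120)
x = pd.Series(100 + 0.5 * t                       # trend
              + 10 * np.sin(2 * np.pi * t / 12)   # seasonality
              + rng.normal(0, 2, 120),            # irregular
              index=pd.date_range("2000-01", periods=120, freq="MS"))

# classical additive decomposition: X_t = T_t + S_t + I_t
res = seasonal_decompose(x, model="additive")
print(res.seasonal.head(12))   # the seasonal pattern repeats every 12 months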

If there is dependence among the latent components, the decomposition is specified through a multiplicative model

X_{t}=T_{t} C_{t} S_{t} I_{t}                                                                                                       (2)

where now S_{t} and I_{t} are expressed in proportion to the trend-cycle T_{t}C_{t}. In some cases, mixed additive-multiplicative models are used.
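Note that when all components are strictly positive, taking logarithms reduces model (2) to the additive form (1):

\log X_{t}=\log T_{t}+\log C_{t}+\log S_{t}+\log I_{t}

so that techniques developed for the additive decomposition can be applied to the logged series.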

Whether a latent component is present in a given time series depends on the nature of the phenomenon and on the frequency of measurement. For example, seasonality arises because some months or quarters of the year are more important in terms of activity or level. Because this component is specified to cancel out over 12 consecutive months, 4 consecutive quarters or, more generally, 365.25 consecutive days, yearly series cannot contain seasonality.
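In symbols, for monthly data the cancellation constraint reads (approximately, since the seasonal pattern may evolve slowly over time)

\sum_{j=0}^{11} S_{t-j} \approx 0 \text{ (additive model)}, \qquad \frac{1}{12} \sum_{j=0}^{11} S_{t-j} \approx 1 \text{ (multiplicative model)},

with the analogous constraint over 4 consecutive quarters.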

Flow series can be affected by other variations associated with the composition of the calendar. The most important are the trading-day variations, which are due to the fact that some days of the week are more important than others. Months with five of the more important days register an excess of activity (ceteris paribus) in comparison to months with four such days. Conversely, months with five of the less important days register a shortfall of activity. The length-of-month variation is usually assigned to the seasonal component. The trading-day component is usually considered negligible in quarterly series and even more so in yearly data. Another important calendar variation is the moving-holiday or moving-festival component. That component is associated with holidays whose dates change from year to year, e.g. Easter and Labour Day, causing a displacement of activity from one month to the previous or the following month. For example, an early date of Easter in March or early April can cause a considerable excess of activity in March and a corresponding shortfall in April in variables associated with imports, exports, hospitality and tourism.
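As a small illustration of the mechanics (a Python sketch; the function name and example month are chosen here for illustration), the day counts driving trading-day effects can be tabulated directly from the calendar:

import calendar
import numpy as np

# count how many times each day of the week occurs in a given month;
# a month with five of a "more important" day (e.g. five Fridays for
# retail sales) will, other things equal, show excess activity
def weekday_counts(year, month):
    cal = calendar.Calendar()
    days = [d.weekday() for d in cal.itermonthdates(year, month)
            if d.month == month]
    return np.bincount(days, minlength=7)   # Mon=0, ..., Sun=6

print(weekday_counts(2024, 3))   # March 2024: five Fridays, Saturdays, Sundays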

Under models (1) and (2), the trading-day and moving-festival components (if present) are implicitly part of the irregular. Young developed a procedure to estimate trading-day variations which was incorporated in the X-11 seasonal adjustment method by Shiskin et al. in 1967 and in its subsequent versions, the X-11-ARIMA and X-12-ARIMA methods. The latter two versions also include models to estimate moving holidays due to Easter.

Considering these new components, the additive decomposition model (1) becomes

X_{t}=T_{t}+C_{t}+S_{t}+D_{t}+H_{t}+I_{t}                                                                              (3)

where D_{t} and H_{t} respectively denote the trading-day and moving-holiday components. Similarly, the multiplicative decomposition model (2) becomes

X_{t}=T_{t} C_{t} S_{t} D_{t} H_{t} I_{t}                                                                                          (4)

where the components S_{t}, D_{t}, H_{t} and I_{t} are expressed in proportion to the trend-cycle T_{t}C_{t}.

Decomposition models (3) and (4) are those traditionally used by seasonal adjustment methods. Seasonal adjustment entails the estimation of all the time series components and the removal of seasonality, trading-day and moving-holiday effects from the observed series. The rationale is that these components, being relatively predictable, conceal the current stage of the business cycle, which is critical for policy and decision making.
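Concretely, writing A_{t} for the seasonally adjusted series (notation introduced here for illustration), the adjustment removes the predictable calendar-related components and retains the trend-cycle and the irregular. Under the additive model (3),

A_{t}=X_{t}-S_{t}-D_{t}-H_{t}=T_{t}+C_{t}+I_{t}

while under the multiplicative model (4),

A_{t}=X_{t} /\left(S_{t} D_{t} H_{t}\right)=T_{t} C_{t} I_{t}.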

Another major objective in time series analysis is the modelling of the observed series mainly for forecasting purposes. In this case, an often used decomposition model for univariate time series is

X_{t}=\eta_{t}+e_{t}                                                                                                                (5)

where \eta_{t} and e_{t} are referred to as the signal and the noise, using electrical engineering terminology. The signal \eta_{t} comprises all the systematic components of models (1) to (4), i.e. T_{t}C_{t}, S_{t}, D_{t} and H_{t}.

Model (5) is classical in signal extraction, where the problem is to find the best estimates of the signal \left\{\eta_{t}\right\} given the observations \left\{x_{t}\right\} corrupted by noise \left\{e_{t}\right\}. The best estimates are usually defined as those minimizing the mean square error.

Signal extraction can be made by means of parametric models or nonparametric procedures. The latter have a long history and were used by actuaries at the beginning of the 1900s. The main assumption in non-parametric procedures is that \eta_{t} is a smooth function of time. Different types of smoothers are used depending on the series in question. The most common smoothers are the cubic splines, originally applied by Whittaker, and by Whittaker and Robinson, to smooth mortality tables. Other smoothers are moving averages and higher-order kernels, used in the context of seasonal adjustment; they form the basis of methods such as Census X-11, X-11-ARIMA, X-12-ARIMA and STL.
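As a minimal example of such a smoother, here is a centred moving average in Python (the window length is illustrative; endpoints, where no symmetric window fits, are left missing):

import numpy as np

def moving_average(x, window=13):
    # centred moving average; the first and last window//2 points,
    # where no symmetric window fits, are returned as NaN
    x = np.asarray(x, dtype=float)
    half = window // 2
    out = np.full(len(x), np.nan)
    for i in range(half, len(x) - half):
        out[i] = x[i - half:i + half + 1].mean()
    return out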

Non-parametric signal extraction has also been very much applied to estimate the trend (non-stationary mean) of time series. Among nonparametric procedures, the 13-term Henderson trend-cycle estimator is the most often applied because of its good property of rapid turning point detection, but it has the disadvantages of: (1) producing a large number of unwanted ripples (short cycles of 9 and 10 months) that can be interpreted as false turning points and, (2) large revisions of the most recent values (often larger than those of the corresponding seasonally adjusted data).
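The Henderson weights themselves can be computed from the classical closed-form expression for the (2m+1)-term filter, as in the following sketch (with n = m + 2):

import numpy as np

def henderson_weights(length):
    # classical closed-form weights of the (2m+1)-term Henderson filter
    m = (length - 1) // 2
    n = m + 2
    j = np.arange(-m, m + 1)
    num = 315 * (((n - 1)**2 - j**2) * (n**2 - j**2) * ((n + 1)**2 - j**2)
                 * (3 * n**2 - 16 - 11 * j**2))
    den = 8 * n * (n**2 - 1) * (4 * n**2 - 1) * (4 * n**2 - 9) * (4 * n**2 - 25)
    return num / den

w = henderson_weights(13)
print(np.round(w, 5))   # symmetric, sums to 1, centre weight about 0.24006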

The use of longer Henderson filters is not an alternative, for the reduction in false turning points is achieved at the expense of an increased time lag in turning point detection. In 1996, Dagum proposed a new method that enables the use of the 13-term Henderson filter with the advantages of: (1) reducing the number of unwanted ripples, (2) reducing the size of the revisions to the most recent trend-cycle estimates and, (3) no increase in the time lag of turning point detection.

The Dagum method basically consists of producing one year of ARIMA extrapolations from a seasonally adjusted series with extreme values replaced by default; extending the series with the extrapolated values; and then applying the Henderson filter to the extended seasonally adjusted series, requesting smaller sigma limits (not the default) for the replacement of extreme values. The objective is to pass through the 13-term Henderson filter an input with reduced noise. This procedure was applied to the nine leading indicator series of the Canadian Composite Leading Index with excellent results and is currently used by many statistical agencies. In a recent work, Dagum and Luati developed a linear approximation to the nonlinear Dagum method which gave very good results in empirical applications.
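A rough sketch of the idea in Python follows (this is not the official implementation: the ARIMA order is illustrative, the extreme-value replacement step is omitted, and endpoints are handled naively):

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def dagum_trend(sa):
    # sa: seasonally adjusted series as a 1-D numpy array
    # 1) one year of ARIMA extrapolations (order chosen for illustration)
    fit = ARIMA(sa, order=(0, 1, 1)).fit()
    extended = np.concatenate([sa, fit.forecast(12)])
    # 2) the full method replaces extreme values using tighter sigma
    #    limits before filtering; that step is omitted here
    # 3) 13-term Henderson filter applied to the extended series
    w = henderson_weights(13)          # from the sketch above
    smoothed = np.convolve(extended, w, mode="same")
    return smoothed[:len(sa)]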

Other recent work on nonparametric trend-cycle estimation was done by Dagum and Bianconcini, who derive a Reproducing Kernel Hilbert Space (RKHS) representation of the Henderson and LOESS smoothers, with particular emphasis on the asymmetric ones applied to the most recent observations. An RKHS is a Hilbert space characterized by a kernel that reproduces, via an inner product, every function of the space or, equivalently, a Hilbert space of real-valued functions with the property that every point evaluation functional is a bounded linear functional. This Henderson kernel representation enables the construction of a hierarchy of kernels with varying smoothing properties. The asymmetric filters are derived coherently with the corresponding symmetric weights or from a lower or higher order kernel within a hierarchy, if more appropriate. In the particular case of the currently applied asymmetric Henderson and LOESS filters, those obtained by means of the RKHS are shown to have superior properties relative to the classical ones from the viewpoint of signal passing, noise suppression and revisions.

In another study, Dagum and Bianconcini derive two density functions and corresponding orthonormal polynomials to obtain two Reproducing Kernel Hilbert Space representations which give excellent results for filters of short and medium lengths. Theoretical and empirical comparisons of the Henderson third-order kernel asymmetric filters were made with the classical ones, again showing superior properties of signal passing, noise suppression and revisions. Dagum and Bianconcini also provide a common approach for studying several nonparametric estimators used for smoothing functional time series data. Linear filters based on different building assumptions are transformed into kernel functions via reproducing kernel Hilbert spaces. For each estimator, these authors identify a density function or second-order kernel, from which a hierarchy of higher order estimators is derived. These are shown to give excellent representations of the currently applied symmetric filters. In particular, they derive equivalent kernels of smoothing splines in Sobolev space and polynomial space. A Sobolev space, intuitively, is a Banach space, and in some cases a Hilbert space, of functions with sufficiently many derivatives for some application domain, equipped with a norm that measures both the size and the smoothness of a function.

Sobolev spaces are named after the Russian mathematician Sergei Sobolev. The asymmetric weights are obtained by adapting the kernel functions to the length of the various filters, and a theoretical and empirical comparison is made with the classical estimators used in real time analysis. The former are shown to be superior in terms of signal passing, noise suppression and speed of convergence to the symmetric filter.
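For intuition, here is a generic second-order kernel smoother of the kind underlying these representations (an illustrative Python sketch, not the authors' RKHS construction; the kernel and bandwidth are chosen for illustration):

import numpy as np

def kernel_smooth(x, bandwidth=6.0):
    # Nadaraya-Watson smoother with an Epanechnikov (second-order) kernel;
    # near the endpoints the support is truncated and the weights
    # renormalized, the analogue of an asymmetric filter
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    out = np.empty(len(x))
    for i in range(len(x)):
        u = (t - i) / bandwidth
        w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)
        out[i] = np.sum(w * x) / np.sum(w)
    return out

Near the boundaries this smoother automatically becomes one-sided, which parallels the role of the asymmetric filters applied to the most recent observations.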

On the other hand, signal extraction by means of explicit models arrived much later. Under the assumption that the entire realization of x_{t} is observed from -\infty to +\infty and that \eta_{t} and e_{t} are mutually independent and stationary, Kolmogorov and Wiener independently proved that the minimum mean square estimator of the signal \eta_{t} is its conditional mean given the observations, that is, \hat{\eta}_{t}=E\left(\eta_{t} \mid x_{t}, x_{t-1}, \ldots\right). This fundamental result was extended by several authors who provided approximate solutions to the nonstationary signal extraction problem, particularly Hannan, Sobel, and Cleveland and Tiao. Finally, Bell provided exact solutions for the conditional mean and conditional variance of the vector \eta when non-stationarity can be removed by applying differences of a finite order. This author used two alternative assumptions regarding the generation of the vectors x, \eta and e.

Model-based signal extraction was also used in the context of seasonal adjustment, where the signal \eta_{t} is assumed to follow an ARIMA model of the Box and Jenkins type, plus a regression model for the deterministic variations. The latter is applied to estimate deterministic components, such as trading-day variations, moving-holiday effects and outliers. Gersch and Kitagawa and Koopman et al. also used signal extraction for seasonal adjustment, where the signal \eta_{t} is assumed to follow a structural time series component model cast in state-space representation. Signal extraction, parametric and non-parametric, is also widely applied for forecasting purposes.
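A minimal sketch of such a structural model in Python with statsmodels, assuming a local linear trend plus a stochastic seasonal of period 12 (a common textbook specification, not necessarily the exact models used by the authors cited above):

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.structural import UnobservedComponents

# a simulated monthly series, as in the first sketch above
rng = np.random.default_rng(1)
t = np.arange(120)
x = pd.Series(100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
              + rng.normal(0, 2, 120),
              index=pd.date_range("2000-01", periods=120, freq="MS"))

# local linear trend + stochastic seasonal, estimated in state-space form
res = UnobservedComponents(x, level="local linear trend",
                           seasonal=12).fit(disp=False)
adjusted = x - res.seasonal.smoothed    # model-based seasonal adjustment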

The feasibility of the decomposition of a time series was proved by Herman Wold in 1938. Wold showed that any second-order stationary stochastic process \left\{X_{t}\right\} can be decomposed into two mutually uncorrelated processes \left\{Z_{t}\right\} and \left\{V_{t}\right\}, such that

X_{t}=Z_{t}+V_{t}                                                                                                      (6.a)

where

Z_{t}=\sum_{j=0}^{\infty} \psi_{j} a_{t-j}, \quad \psi_{0} \equiv 1, \quad \sum_{j=1}^{\infty} \psi_{j}^{2}<\infty,                                                   (6.b)

with \left\{a_{t}\right\} \sim W N\left(0, \sigma_{a}^{2}\right).

Model (6.b) is known as an infinite moving average MA(\infty), where the a_{t}'s are the innovations. \left\{Z_{t}\right\} is a convergent infinite linear combination of the a_{t}'s, which are assumed to follow a white noise (WN) process of zero mean, constant variance \sigma_{a}^{2}, and zero autocovariance. The component \left\{Z_{t}\right\} is called the nondeterministic or purely linear component, since a single realization of the process is not sufficient to determine future values Z_{t+\ell}, \ell>0, without error. The component \left\{V_{t}\right\} can be represented by

V_{t}=\mu+\sum_{j=1}^{\infty}\left[\alpha_{j} \sin \left(\lambda_{j} t\right)+\beta_{j} \cos \left(\lambda_{j} t\right)\right], \quad -\pi<\lambda_{j}<\pi                           (6.c)

where \mu is the constant mean of process \left\{X_{t}\right\}  and \left\{\alpha_{j}\right\},\left\{\beta_{j}\right\} are mutually uncorrelated white noise processes. The series \left\{V_{t}\right\} is called the deterministic part because it can be predicted in the future without error from a single realization of the process by means of an infinite linear combination of past values.

Wold's theorem demonstrates that the property of stationarity is strongly related to that of linearity. It provides a justification for autoregressive moving average (ARMA) models and some extensions, such as the autoregressive integrated moving average (ARIMA) and regression-ARIMA (RegARIMA) models.
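For example, a stationary AR(1) process X_{t}=\phi X_{t-1}+a_{t} with |\phi|<1 has Wold coefficients \psi_{j}=\phi^{j}; the Python sketch below (parameter values illustrative) checks the equivalence of the two representations numerically:

import numpy as np

# Wold (MA(inf)) weights of an AR(1): psi_j = phi**j, truncated at J terms
phi, J, n = 0.7, 50, 2000
psi = phi ** np.arange(J)

rng = np.random.default_rng(2)
a = rng.normal(size=n)

# Z_t built directly from the innovations, as in (6.b)
z = np.convolve(a, psi)[:n]

# the same process simulated recursively as an AR(1)
x = np.empty(n)
x[0] = a[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + a[t]

print(np.allclose(z, x, atol=1e-6))   # True: truncation error is of order phi**J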

A stochastic process \left\{X_{t}\right\} is second-order stationary, or weakly stationary, if its first two moments are not time dependent: the mean and the variance are constant, and the autocovariance function depends only on the time lag and not on the time origin, that is,

E\left(X_{t}\right)=\mu<\infty                                                                                                 (7.a)

E\left(X_{t}-\mu\right)^{2}=\sigma_{X}^{2}<\infty, \quad E\left[\left(X_{t}-\mu\right)\left(X_{t-k}-\mu\right)\right]=\gamma(k)<\infty                 (7.b)

where k=0,1,2, \ldots denotes the time lag.
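A short Python check of this lag-dependence on simulated white noise (the biased sample estimator, which divides by the series length, is used for illustration):

import numpy as np

# sample autocovariance gamma(k), which under weak stationarity depends
# only on the lag k, as in (7.b)
def autocovariance(x, k):
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.sum(d[k:] * d[:len(x) - k]) / len(x)

rng = np.random.default_rng(3)
e = rng.normal(size=5000)
print([round(autocovariance(e, k), 3) for k in range(4)])
# for white noise: gamma(0) near 1 and gamma(k) near 0 for k >= 1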