Time Series Basics
A time series is a set of points that are ordered in time. Each time point is usually assigned an integer index to indicate its position within the series. For example, you can construct a time series by measuring and computing an average daily temperature. When the outcome of the next point in a time series is unknown, the time series is said to be random or "stochastic" in nature. A simple example would be creating a time series from a sequential set of coin flips with outcomes of either heads or tails. A more practical example is the time series of prices of a given stock.
When the unconditional joint probability distribution of the series does not change with time (it is time-invariant), the stochastic process generating the time series is said to be stationary. Under these circumstances, parameters such as the mean and standard deviation do not change over time. Assuming the same coin for each flip, the coin flip is an example of a stationary process. On the other hand, stock price data is not a stationary process.
This unit aims to use your knowledge of statistics to model time series data for random processes. Even though the outcome of the next time point is unknown, given the time series statistics, it should be possible to make inferences if you can create a model. The concept of a stationary random process is central to statistical model building. Since nonstationary processes require a bit more sophistication than stationary processes, it is important to understand what type of time series is being modeled. Our first step in this direction is to introduce the autoregressive (AR) model. This linear model can be used to estimate current time series values based on known past time series values. Read through this article which introduces the idea behind AR models and additionally explains the autocorrelation function (ACF).
R Code for Two Examples in Lessons 1.1 and 1.2
One example in Lesson 1.1 and Lesson 1.2 concerned the annual number of earthquakes worldwide with a magnitude greater than 7.0 on the seismic scale. We identified an AR(1) model (autoregressive model of order 1), estimated the model, and assessed the residuals. Below is the R code that will accomplish these tasks. The first line reads the data from a file named quakes.dat. The data are listed in time order from left to right in the lines of the file. If you were to download the file, you should download it into a folder that you create for storing course data. Then in R, change the working directory to this folder.
The commands below include explanatory comments, following the #. Those comments do not have to be entered for the command to work.
x=scan("quakes.dat")
x=ts(x) #this makes sure R knows that x is a time series
plot(x, type="b") #time series plot of x with points marked as "o"
install.packages("astsa")
library(astsa) # See note 1 below
lag1.plot(x,1) # Plots x versus lag 1 of x.
acf(x, xlim=c(1,19)) # Plots the ACF of x for lags 1 to 19
xlag1=lag(x,-1) # Creates a lag 1 of x variable. See note 2
y=cbind(x,xlag1) # See note 3 below
ar1fit=lm(y[,1]~y[,2])#Does regression, stores results object named ar1fit
summary(ar1fit) # This lists the regression results
plot(ar1fit$fit,ar1fit$residuals) #plot of residuals versus fits
acf(ar1fit$residuals, xlim=c(1,18)) # ACF of the residuals for lags 1 to 18
Note!
- The astsa library accesses R script(s) written by one of the authors of our textbook (Stoffer). In our program, the lag1.plot command is part of that script. You may read more about the library on the website for our text: ASTSA R package. You must install the astsa package in R before loading the commands in the library statement. Not all available packages are included when you install R on your machine (Cran R Project Web Packages). You only need to run install.packages("astsa") once. In subsequent sessions, the library command alone will bring the commands into your current session.
- Note the negative value for the lag in xlag1=lag(x,−1) To lag back in time in R, use a negative lag.
- This is a bit tricky. For whatever reason, R has to bind together a variable with its lags for the lags to be in the proper connection with the original variable. The cbind and the ts.intersect commands both accomplish this task. In the code above, the lagged variable and the original variable become the first and second columns of a matrix named y. The regression command (lm) uses these two columns of y as the response and predictor variables in the regression.
General Note!
If a command that includes quotation marks doesn't work when you copy and paste from course notes to R, try typing the command in R instead.
The results, as given by R, follow.
The time series plot for the quakes series.
Plot of x versus lag 1 of x.
The ACF of the quakes series.
The regression results:
Estimate | Std. Error | t value | Pr(>|t|) | |
---|---|---|---|---|
(Intercept) | 9.19070 | 1.81924 | 5.052 | 2.08e-06 *** |
y[, 2] | 0.54339 | 0.08528 | 6.372 | 6.47e-09 *** |
Signif. codes: 0 ‘***' 0.001 ‘**' 0.01 ‘*' 0.05 ‘.' 0.1 ‘ ' 1
Residual standard error: 6.122 on 96 degrees of freedom
(2 observations deleted due to missingness)
Multiple R-squared: 0.2972, Adjusted R-squared: 0.2899
Plot of residuals versus fits
ACF of residuals
An example in Lesson 1.2 for this week concerned the weekly cardiovascular mortality rate in Los Angeles County. We used a first difference to account for a linear trend and determine that the first differences may have an AR(1) model.
Following are R commands for the analysis. Again, the commands are commented using #comment.
mort=scan("cmort.dat")
plot(mort, type="o") # plot of mortality rate
mort=ts(mort)
mortdiff=diff(mort,1) # creates a variable = x(t) – x(t-1)
plot(mortdiff,type="o") # plot of first differences
acf(mortdiff,xlim=c(1,24)) # plot of first differences, for 24 lags
mortdifflag1=lag(mortdiff,-1)
y=cbind(mortdiff,mortdifflag1) # bind first differences and lagged first differences
mortdiffar1=lm(y[,1]~y[,2]) # AR(1) regression for first differences
summary(mortdiffar1) # regression results
acf(mortdiffar1$residuals, xlim = c(1,24)) # ACF of residuals for 24 lags.
We'll leave it to you to try the code and see the output if you wish.