The name of a time series routine is a combination of three to four sets of one to four letters. The first set specifies the type of model or method. The second set identifies the particular approach. If the name uses four sets of letters, then both the second and third sets are used to identify the particular approach. The final set always specifies the general procedure. The table below summarizes the naming convention of the time series analysis and forecasting routines.
The names and meanings of arguments are consistent within a set of routines pertaining to a particular topic. For example, XCNTR corresponds to the constant used to center the time series X in all of the spectral analysis routines. Note that IPRINT always represents the printing option; the values and possible choices of output necessarily depend on the given routine. An option argument always begins with the letter “I,” and a leading dimension argument always begins with “LD.”
The routines in this chapter assume the time series does not contain any missing observations. If missing values are present, they should be set to NaN (see the Reference Material section for the routine AMACH), and the routine will return an appropriate error message. To enable fitting of the model, the missing values must be replaced by appropriate estimates.
Naming Conventions in Chapter 8

Abbreviation   Meaning
BAY*           Bayesian
NS*            Nonseasonal ARMA
TF*            Transfer Function
MAX*           Maximum Likelihood
M*             Multichannel
P*             Periodogram
CP*            Cross Periodogram
S*             Spectral Density
AUTO*          Automatic Model Selection
CS*            Cross-Spectral Density
*P*            Preliminary
*UNI*          Univariate
*MUL*          Multivariate
*FPE*          Final Prediction Error
*MM*           Method of Moments
*LS*           Least Squares
*BJ*           Box-Jenkins
*SW*           Spectral Window
*WE*           Weights
*AR            Autoregressive
*ARMA          Autoregressive, Moving Average
*SEA           Seasonal Modeling
*E             Estimation
*F             Forecast
*FFT           Fast Fourier Transform
*P             Periodogram
*D             Data

The “*” represents one or more letters.
General Methodology
A major component of the model identification step concerns determining if a given time series is stationary. The sample correlation functions computed by routines ACF, PACF, CCF, and MCCF may be used to diagnose the presence of nonstationarity in the data, as well as to indicate the type of transformation required to induce stationarity. The family of power transformations provided by routine BCTR, coupled with the ability to difference the transformed data using routine DIFF, affords a convenient method of transforming a wide class of nonstationary time series to stationarity.
The “raw” data, transformed data, and sample correlation functions also provide insight into the nature of the underlying model. Typically, this information is displayed in graphical form via time series plots, plots of the lagged data, and various correlation function plots. The routines in Chapter 16, “Line Printer Graphics” provide the necessary tools to produce the visual displays of this quantitative information.
The observed time series may also be compared with time series generated from various theoretical models to help identify possible candidates for model fitting. The routine RNARM in Chapter 18, “Random Number Generation” may be used to generate a time series according to a specified autoregressive moving average model.
Time Domain Methodology
Once the data are transformed to stationarity, a tentative model in the time domain is often proposed and parameter estimation, diagnostic checking and forecasting are performed.
Autoregressive Moving Average Model
A parsimonious, yet comprehensive, class of stationary time series models consists of the nonseasonal autoregressive moving average (ARMA) processes defined by
$$\phi(B)(W_t - \mu) = \theta(B)A_t \qquad t \in \mathbb{Z}$$

where $\mathbb{Z} = \{\dots, -2, -1, 0, 1, 2, \dots\}$ denotes the set of integers, $B$ is the backward shift operator defined by $B^k W_t = W_{t-k}$, $\mu$ is the mean of $W_t$, and

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p \qquad p \ge 0$$

$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q \qquad q \ge 0$$
The model is of order (p, q) and is referred to as an ARMA(p, q) model.
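To make the defining recursion concrete, the following minimal Python sketch simulates an ARMA(p, q) series directly from the model equation, using the sign convention θ(B) = 1 − θ₁B − ⋯ above. It is an illustration only, not the library's generator (routine RNARM, discussed earlier, serves that purpose); the function name is hypothetical.

```python
import numpy as np

def simulate_arma(phi, theta, mu, n, burn=200, seed=0):
    """Simulate an ARMA(p, q) series W satisfying
    phi(B)(W_t - mu) = theta(B) A_t, with white noise A_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    a = rng.standard_normal(n + burn)   # white noise A_t
    w = np.zeros(n + burn)              # deviations W_t - mu
    for t in range(n + burn):
        ar = sum(phi[i] * w[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * a[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        w[t] = ar + a[t] - ma           # minus sign from theta(B) = 1 - theta_1 B - ...
    return mu + w[burn:]                # discard the burn-in transient

# Example: ARMA(2, 1) with phi = (0.5, -0.2), theta = (0.4,), mean 10
series = simulate_arma([0.5, -0.2], [0.4], mu=10.0, n=500)
```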
An equivalent version of the ARMA(p, q) model is given by
$$\phi(B)W_t = \theta_0 + \theta(B)A_t \qquad t \in \mathbb{Z}$$

where $\theta_0$ is an overall constant defined by

$$\theta_0 = \mu\left(1 - \phi_1 - \phi_2 - \dots - \phi_p\right)$$
See Box and Jenkins (1976, pages 92–93) for a discussion of the meaning and usefulness of the overall constant.

Maximum likelihood estimates of the autoregressive and moving average coefficients of an ARMA model can be obtained with routine MAX_ARMA. This routine requires initial parameter estimates, and it further requires that these initial values represent a stationary time series. If they do not, MAX_ARMA replaces them with initial estimates that are stationary; however, these may be far from the values originally selected to describe the series. Moreover, the method of maximum likelihood for estimating ARMA parameters may not converge to stationary estimates. In that case, MAX_ARMA displays a warning message and sets its convergence parameter ICONV to zero.
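Stationarity of a candidate set of autoregressive coefficients can be verified by checking that all roots of φ(z) = 0 lie outside the unit circle. The sketch below is an illustrative check of this condition (a hypothetical helper, not part of the library):

```python
import numpy as np

def is_stationary(phi):
    """Check whether phi(B) = 1 - phi_1 B - ... - phi_p B^p corresponds to
    a stationary AR operator: all roots of phi(z) = 0 must lie strictly
    outside the unit circle."""
    # np.roots wants coefficients from highest degree down:
    # -phi_p z^p - ... - phi_1 z + 1
    coeffs = np.concatenate(([-c for c in phi[::-1]], [1.0]))
    return bool(np.all(np.abs(np.roots(coeffs)) > 1.0))

print(is_stationary([0.5, -0.2]))   # True: a stationary AR(2)
print(is_stationary([1.2]))         # False: root 1/1.2 lies inside the unit circle
```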
If the “raw” data {Zt} are homogeneous nonstationary, then differencing induces stationarity and the model is called autoregressive integrated moving average (ARIMA). Parameter estimation is performed on the stationary time series
$$W_t = \nabla^d Z_t$$

where

$$\nabla^d = (1 - B)^d$$

is the backward difference operator with period 1 and order $d$, $d > 0$.
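Applying the backward difference operator directly is straightforward; the short sketch below (illustrative only, routine DIFF is the library's tool for this) shows second differencing removing a quadratic trend:

```python
import numpy as np

def difference(z, d):
    """Apply the backward difference operator (1 - B)^d of period 1, order d."""
    w = np.asarray(z, dtype=float)
    for _ in range(d):
        w = w[1:] - w[:-1]      # each pass shortens the series by one value
    return w

z = np.array([1.0, 3.0, 6.0, 10.0, 15.0])   # quadratic trend
print(difference(z, 2))                      # second differences: [1. 1. 1.]
```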
Typically, routine NSPE is first applied to the transformed data to provide preliminary parameter estimates. These estimates are used as initial values in an estimation procedure. In particular, routine NSLSE may be used to compute conditional or unconditional least‑squares estimates of the parameters, depending on the choice of the backcasting length. Parameter estimates from either NSPE or NSLSE may be input to routine NSBJF to produce forecasts with associated probability limits. The routines for preliminary parameter estimation, least squares parameter estimation, and forecasting follow the approach of Box and Jenkins (1976, programs 2–4, pages 498–509).
Regression in Autoregressive Integrated Moving Average
There may be one or more external time series that relate to the time series of interest and may be useful in improving forecasts. Routine REG_ARIMA allows for the inclusion of one or more regression time series in the above ARIMA model. That is, if there are $r$ time series $\{X_{i,t},\, i = 1, \dots, r\}$ associated with a time series $Y_t$, the regression ARIMA model (integrated of order $d$) is

$$W_t = \nabla^d Z_t$$

where

$$Z_t = Y_t - \sum_{i=1}^{r} \beta_i X_{i,t}$$

That is, $Z_t$ is the residual (indexed by $t$) of the regression of $Y_t$ on $\{X_{i,t},\, i = 1, \dots, r\}$.
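The residual series can be sketched conceptually as an ordinary least-squares fit followed by ARIMA modeling of what remains. This two-stage outline is illustrative only (function name hypothetical); REG_ARIMA couples the regression and the ARIMA fit in a single routine.

```python
import numpy as np

def regression_residuals(y, X):
    """Least-squares regression of Y_t on r covariate series (the columns
    of X) plus an intercept, returning the residual series Z_t to which
    the ARIMA model is then applied."""
    A = np.column_stack([np.ones(len(y)), X])        # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)     # regression coefficients
    return y - A @ beta                              # Z_t = Y_t - fitted values

# Z_t would next be differenced, W_t = (1 - B)^d Z_t, and modeled as ARMA.
```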
Transfer Function Model
Define $\{x_t\}$ and $\{y_t\}$ by

$$x_t = \nabla^d\left(X_t - \hat\mu_X\right)$$

and

$$y_t = \nabla^d\left(Y_t - \hat\mu_Y\right)$$

where $\{X_t\}$ and $\{Y_t\}$ for $t = (-d + 1), \dots, n$ represent the undifferenced input and undifferenced output series, with $\hat\mu_X$ and $\hat\mu_Y$ estimates of their respective means. The differenced input and differenced output series may be obtained using the routine DIFF following any preliminary transformation of the data.
The transfer function model is defined by
$$Y_t = \delta^{-1}(B)\,\omega(B)\,X_{t-b} + N_t$$

or equivalently,

$$y_t = \delta^{-1}(B)\,\omega(B)\,x_{t-b} + n_t$$

where $n_t = \nabla^d N_t$ for $d \ge 0$, and the left-hand side and right-hand side transfer function polynomial operators are, respectively,

$$\delta(B) = 1 - \delta_1 B - \delta_2 B^2 - \dots - \delta_r B^r$$

$$\omega(B) = \omega_0 - \omega_1 B - \omega_2 B^2 - \dots - \omega_s B^s$$
with $r \ge 0$, $s \ge 0$, and $b \ge 0$. The noise process $\{N_t\}$ and the input process $\{X_t\}$ are assumed to be independent, with the noise process given by the ARIMA model
$$\phi(B)\,n_t = \theta(B)\,A_t$$

where

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p$$

$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q$$

with $p \ge 0$ and $q \ge 0$.
The impulse response weights $\{\nu_k\}$ of the transfer function

$$\nu(B) = \delta^{-1}(B)\,\omega(B) = \nu_0 + \nu_1 B + \nu_2 B^2 + \dots$$
and the differenced noise series {nt} are estimated using routine IRNSE. Preliminary estimates of the transfer function parameters and noise model parameters are computed by routine TFPE.
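For known operators, the impulse response weights can be generated by equating coefficients of powers of $B$ in $\delta(B)\nu(B) = \omega(B)$. The sketch below illustrates that recursion (a hypothetical helper; routine IRNSE estimates the weights from data rather than expanding known operators):

```python
import numpy as np

def impulse_weights(delta, omega, n_weights):
    """Expand nu(B) = delta^{-1}(B) omega(B) into nu_0 + nu_1 B + nu_2 B^2 + ...
    delta = (delta_1, ..., delta_r); omega = (omega_0, omega_1, ..., omega_s),
    with the convention omega(B) = omega_0 - omega_1 B - ... - omega_s B^s."""
    r, s = len(delta), len(omega) - 1
    nu = np.zeros(n_weights)
    for k in range(n_weights):
        # right-hand side coefficient of B^k in omega(B)
        rhs = omega[0] if k == 0 else (-omega[k] if k <= s else 0.0)
        # contribution of delta_1 nu_{k-1} + ... moved to the right-hand side
        rhs += sum(delta[i - 1] * nu[k - i] for i in range(1, min(k, r) + 1))
        nu[k] = rhs
    return nu

# Example: delta(B) = 1 - 0.6B, omega(B) = 2.0 - 0.5B
print(impulse_weights([0.6], [2.0, 0.5], 5))   # geometric decay after lag 1
```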
Multivariate Autoregressive Time Series
A multivariate autoregressive time series can be expressed as

$$X_t = \sum_{k=1}^{p} A_k X_{t-k} + U_t$$

where

$X_t$ is a column vector containing the values of the $m$ univariate time series at time $t$,

$A_1, A_2, \dots, A_p$ are the $m \times m$ matrices containing the autoregressive parameters for lags 1, 2, …, p, and

$U_t$ is a column vector containing the $m$ white noise values, one for each time series, at time $t$.
Akaike’s Information Criterion (AIC) can be used to identify the optimum number of lags. For a multivariate autoregressive time series,

$$\mathrm{AIC} = N \ln\left(\left|\hat\Sigma\right|\right) + 2K$$

where

$N$ = number of observations in each of the $m$ univariate time series,

$K$ = number of non-zero autoregressive coefficients in the parameter matrices,

$p$ = maximum number of lags used in calculating AIC, and

$|\hat\Sigma|$ = determinant of the estimated $m \times m$ covariance matrix of the white noise, $U_t$.
Routine AUTO_MUL_AR calculates AIC for selected lags and returns parameter estimates for the optimum lag.
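The lag search can be sketched with an ordinary least-squares VAR fit at each candidate lag, assuming the AIC form given above (the exact penalty AUTO_MUL_AR uses may differ; the function name here is hypothetical):

```python
import numpy as np

def var_aic(x, max_lag):
    """Fit VAR(p) models by least squares for p = 1..max_lag and return
    AIC_p = N * ln(det(Sigma_p)) + 2 * K for each p.  x has shape (N, m)."""
    n, m = x.shape
    aic = {}
    for p in range(1, max_lag + 1):
        Y = x[p:]                                           # responses X_t
        Z = np.column_stack([x[p - k: n - k] for k in range(1, p + 1)])
        B, *_ = np.linalg.lstsq(Z, Y, rcond=None)           # stacked A_1..A_p
        U = Y - Z @ B                                       # white-noise residuals
        sigma = (U.T @ U) / len(Y)                          # residual covariance
        aic[p] = len(Y) * np.log(np.linalg.det(sigma)) + 2 * B.size
    return aic

rng = np.random.default_rng(1)
e = rng.standard_normal((300, 2))
x = np.zeros((300, 2))
for t in range(1, 300):                                     # a true VAR(1)
    x[t] = np.array([[0.5, 0.1], [0.0, 0.4]]) @ x[t - 1] + e[t]
aic = var_aic(x, 4)
print(min(aic, key=aic.get))                                # typically selects lag 1
```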
Multichannel Time Series
A multichannel time series X is simply a multivariate time series whose channels correspond to interrelated univariate time series. In this setting, the model‑building process is a logical extension of the procedures used to identify, estimate, and forecast univariate time series. In particular, the multichannel cross‑correlation function computed by routine MCCF may help identify a tentative model. A particular regression model may be fit using routine MLSE, with the Wiener filter estimated using routine MWFE. The Wiener forecast function for a single channel may be obtained by routine SPWF. The state space approach to fitting many time domain models is available through routine KALMN.
Automatic Model Selection
There are two popular criteria for comparing autoregressive (AR) models with different lags:
FPE – Final Prediction Error
AIC – Akaike’s Information Criterion.
These are defined for both univariate and multivariate time series. FPE for an autoregressive univariate model with $p$ lags is calculated using the formula

$$\mathrm{FPE}_p = \hat\sigma_p^2 \cdot \frac{N + p + 1}{N - p - 1}$$

where

$N$ = number of observations in the time series, and

$\hat\sigma_p^2$ = the estimate of the variance of the white noise term in an AR model of order $p$.

The fit with the smallest FPE is considered best.
Similarly, AIC for an AR univariate series is calculated by

$$\mathrm{AIC} = -2\ln(L) + 2p$$

where

$L$ = the value of the maximum likelihood function for the fitted model, and

$p$ = number of lags for the AR model.

In some routines this is approximated by

$$\mathrm{AIC} \approx N \ln\left(\hat\sigma_p^2\right) + 2K$$

where $K$ = number of nonzero parameters in the model.
Similar to FPE, the fit with the smallest AIC is considered best.
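Order selection by FPE can be sketched with the Levinson-Durbin recursion on the sample autocovariances, assuming the FPE form given above (an illustrative helper, not the algorithm AUTO_FPE_UNI_AR necessarily uses):

```python
import numpy as np

def ar_order_by_fpe(x, max_lag):
    """Levinson-Durbin recursion returning FPE_p = var_p * (N+p+1)/(N-p-1)
    for each candidate order p = 0..max_lag."""
    x = np.asarray(x, float) - np.mean(x)
    n = len(x)
    acov = np.array([x[: n - k] @ x[k:] / n for k in range(max_lag + 1)])
    var, phi = acov[0], np.zeros(0)
    fpe = {0: var * (n + 1) / (n - 1)}
    for p in range(1, max_lag + 1):
        kappa = (acov[p] - phi @ acov[1:p][::-1]) / var   # reflection coefficient
        phi = np.append(phi - kappa * phi[::-1], kappa)   # updated AR coefficients
        var *= 1.0 - kappa ** 2                           # innovation variance
        fpe[p] = var * (n + p + 1) / (n - p - 1)
    return fpe

rng = np.random.default_rng(2)
a = rng.standard_normal(500)
x = np.zeros(500)
for t in range(2, 500):                     # a true AR(2): phi = (0.6, -0.3)
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + a[t]
fpe = ar_order_by_fpe(x, 8)
print(min(fpe, key=fpe.get))                # typically selects order 2
```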
A formula also exists for the final prediction error of a multivariate time series:

$$\mathrm{FPE}_p = \left|\hat\Sigma_p\right| \cdot \left(\frac{N + mp + 1}{N - mp - 1}\right)^{m}$$

where $m$ = number of univariate time series, and $p$ = maximum number of lags in the AR model. The equivalent multivariate AIC calculation is

$$\mathrm{AIC} = N \ln\left(\left|\hat\Sigma_p\right|\right) + 2K$$
The routine AUTO_PARM uses a third criterion, called “Minimum Description Length” or MDL, to automatically fit piecewise AR models to a time series with structural breaks (i.e., a potentially non‑stationary time series having stationary segments).
The MDL is defined as

$$\mathrm{MDL}(m, \tau_1, \dots, \tau_m, p_1, \dots, p_{m+1}) = \ln(m) + (m+1)\ln(n) + \sum_{j=1}^{m+1} \frac{p_j + 2}{2}\,\ln(n_j) - \ln(L)$$

where $m$ is the number of structural breaks in the series, $\tau_1, \tau_2, \dots, \tau_{m+1}$ are the locations of the breaks, $n_j$ is the number of observations in the $j$-th segment, $p_j$ is the order of the AR model fit to the $j$-th segment, and $L$ is the combined likelihood over all segments. AUTO_PARM also allows the choice to use the AIC criterion,

$$\mathrm{AIC} = 2 \times (\text{number of parameters}) - 2\ln(L)$$
The table below summarizes the five routines for automatic AR fitting.

Routine            Criterion     Type of series
AUTO_UNI_AR        AIC           univariate
AUTO_MUL_AR        AIC           multivariate
AUTO_FPE_UNI_AR    FPE           univariate
AUTO_FPE_MUL_AR    FPE           multivariate
AUTO_PARM          MDL or AIC    univariate, with structural breaks
Spectral Analysis

Let $\{X_t\}$ denote a discrete-parameter stationary process with mean

$$\mu = E[X_t]$$

and autocovariance function

$$\sigma(k) = \mathrm{cov}\{X_t, X_{t+k}\} = E\{[X_t - \mu][X_{t+k} - \mu]\} \qquad k \in \mathbb{Z}$$

Note that $\sigma(0) = \sigma^2$ is the variance of the process.
The routines for the spectral analysis of time series are concerned with the estimation of the spectral density of a stationary process given a finite realization {Xt} for t = 1, …, n where n = NOBS. This realization consists of values sampled at equally spaced time intervals in the continuous‑parameter case or of values observed consecutively in the discrete‑parameter case. Hence, we need only develop methodology concerned with the spectral analysis of discrete‑parameter stationary processes and later account for the time sampling in the continuous‑parameter model.
The nonnormalized spectral density $h(\omega)$ and the autocovariance function $\sigma(k)$ of the stationary process form a Fourier transform pair. The relationship in the continuous-parameter case is given by

$$h(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \sigma(k)\, e^{-i\omega k}\, dk$$

$$\sigma(k) = \int_{-\infty}^{\infty} h(\omega)\, e^{i\omega k}\, d\omega$$

Similarly, the normalized spectral density $f(\omega)$ and the autocorrelation function $\rho(k) = \sigma(k)/\sigma(0)$ of the stationary process form a Fourier transform pair. The relationship in the continuous-parameter case is given by

$$f(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \rho(k)\, e^{-i\omega k}\, dk$$

$$\rho(k) = \int_{-\infty}^{\infty} f(\omega)\, e^{i\omega k}\, d\omega$$

The discrete-parameter analogs of the above equations involve summation over $k$ instead of integration over $dk$. Also, the normalized spectral density $f(\omega)$ satisfies

$$\int_{-\pi}^{\pi} f(\omega)\, d\omega = 1$$
Discrete Fourier Transform. The discrete Fourier transform of the sequence $\{Z_t\}$ for $t = 1, \dots, N$ is defined by

$$\zeta(\omega_p) = \sum_{t=1}^{N} Z_t\, e^{-i\omega_p t}$$

over the discrete set of frequencies

$$\omega_p = \frac{2\pi p}{N} \qquad p = 0, 1, \dots, \lfloor N/2 \rfloor$$

where the function $\lfloor r \rfloor$ determines the greatest integer less than or equal to $r$. An alternative representation of $\zeta(\omega_p)$ in terms of cosine and sine transforms is

$$\zeta(\omega_p) = \alpha(\omega_p) - i\beta(\omega_p)$$

where

$$\alpha(\omega_p) = \sum_{t=1}^{N} Z_t \cos(\omega_p t)$$

$$\beta(\omega_p) = \sum_{t=1}^{N} Z_t \sin(\omega_p t)$$
The fast Fourier transform algorithm implemented in the IMSL MATH/LIBRARY routine FFTCF is used to compute the discrete Fourier transform. All of the frequency domain routines that output a periodogram utilize the fast Fourier transform algorithm.
Centering and Padding. Consider the centered and padded realization $\{\tilde X_t\}$ for $t = 1, \dots, N$ defined by

$$\tilde X_t = \begin{cases} X_t - \hat X & t = 1, \dots, n \\ 0 & t = n + 1, \dots, N \end{cases} \qquad (1)$$

where $N = (n + n_0)$ and the centering constant $\hat X$ (argument XCNTR), usually taken to be the sample mean, is

$$\hat X = \frac{1}{n} \sum_{t=1}^{n} X_t \qquad (2)$$
Centering the data simplifies the formulas for estimation of the periodogram and spectral density. The addition of n0 = NPAD zeros to the end of the data is called padding. This procedure increases the effective length of the data from n to N in an effort to:
increase the computational efficiency of the Fourier transformation of the series by providing a more suitable series length N (Priestley 1981, page 577).
obtain the periodogram ordinates required to give the exact expression of the sample autocovariances in terms of the inverse Fourier transformation of the periodogram (Priestley 1981, page 579).
produce periodogram ordinates over a more refined range of frequencies ωp.
Any desired filtering, prewhitening, or data tapering should be performed prior to estimating the spectral density. The resulting estimate may be adjusted accordingly.
Periodogram. The periodogram of the sample sequence $\{X_t\}$, $t = 1, \dots, n$, computed with the centered and padded sequence $\{\tilde X_t\}$, is defined by

$$I_{\tilde X}(\omega_p) = K \left|\zeta_{\tilde X}(\omega_p)\right|^2$$

where $K$ is the scale factor

$$K = \begin{cases} \dfrac{2}{N} & \text{usual periodogram} \\[4pt] \dfrac{1}{2\pi n} & \text{modified periodogram} \end{cases}$$

The scale factor of the usual periodogram relates the ordinates to the sum of squares of $\tilde X_t$ (Fuller 1976, pages 276–277). If the first ordinate (corresponding to $p = 0$) is replaced by one-half of its value, then, for $N$ odd, the sum of the $\lfloor N/2 \rfloor + 1$ ordinates corresponding to $p = 0, 1, \dots, \lfloor N/2 \rfloor$ is $\sum_{t=1}^{N} \tilde X_t^2$.
The modified periodogram is an asymptotically unbiased estimate of the nonnormalized spectral density function at each frequency ωp (Priestley 1981, page 417). The argument IPVER is used to specify the version of the periodogram.
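The computation reduces to one FFT of the centered, padded data. The sketch below is illustrative (function name hypothetical) and uses the modified scale $1/(2\pi n)$; scale conventions vary, and in the library the argument IPVER selects the version:

```python
import numpy as np

def modified_periodogram(x, n_pad=0):
    """Periodogram of a centered, zero-padded series computed via the FFT,
    scaled by 1/(2*pi*n) with n the number of actual observations."""
    x = np.asarray(x, float)
    n = len(x)
    xt = np.concatenate([x - x.mean(), np.zeros(n_pad)])   # center, then pad
    N = len(xt)                                            # N = n + n0
    zeta = np.fft.fft(xt)                                  # discrete Fourier transform
    freqs = 2.0 * np.pi * np.arange(N // 2 + 1) / N        # omega_p = 2*pi*p/N
    pgram = np.abs(zeta[: N // 2 + 1]) ** 2 / (2.0 * np.pi * n)
    return freqs, pgram

rng = np.random.default_rng(3)
t = np.arange(400)
x = np.sin(2 * np.pi * t / 25) + 0.5 * rng.standard_normal(400)
freqs, pgram = modified_periodogram(x, n_pad=112)          # pad to N = 512
print(freqs[np.argmax(pgram)])                             # near 2*pi/25 ~ 0.251
```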
Spectral Density. The relationship between the sample autocovariance function and estimate of the nonnormalized spectral density function is similar to the theoretical situation previously discussed.
Define the sample autocovariance function of the $X_t$ process by

$$\hat\sigma(k) = \frac{1}{n} \sum_{t=1}^{n-k} \left(X_t - \hat X\right)\left(X_{t+k} - \hat X\right) \qquad k = 0, 1, \dots, n-1$$

where $\hat X$ is given by Equation 2. Note that $\hat\sigma(0)$ is the sample variance. The nonnormalized spectral density may be estimated directly from the sample autocovariances by

$$\hat h(\omega) = \frac{1}{2\pi} \sum_{k=-(n-1)}^{n-1} \lambda_n(k)\, \hat\sigma(k)\, e^{-i\omega k}$$

The sequence of weights $\{\lambda_n(k)\}$, called the lag window, decreases at a rate appropriate for consistent estimation of $h(\omega)$.
An algebraically equivalent method of estimating $h(\omega)$ consists of locally smoothing the modified periodogram in a neighborhood of $\omega$. Let $I_{\tilde X}(\omega_p)$ denote the modified periodogram of the centered and padded realization $\{\tilde X_t\}$ defined in Equation 1. Then, an estimate of the nonnormalized spectral density is given by

$$\hat h(\omega) = \frac{2\pi}{N} \sum_{p} W_n(\omega - \omega_p)\, I_{\tilde X}(\omega_p) \qquad (3)$$

where the sum extends over the frequencies $\omega_p = 2\pi p/N$. The spectral window $W_n(\theta)$ is the discrete Fourier transform of the lag window $\lambda_n(k)$. We note that for $N = 2n - 1$, the modified periodogram and the sample autocovariances form the discrete Fourier transform pair

$$I_{\tilde X}(\omega_p) = \frac{1}{2\pi} \sum_{k=-(n-1)}^{n-1} \hat\sigma(k)\, e^{-i\omega_p k}$$

$$\hat\sigma(k) = \frac{2\pi}{N} \sum_{p=-\lfloor N/2 \rfloor}^{\lfloor N/2 \rfloor} I_{\tilde X}(\omega_p)\, e^{i\omega_p k}$$

This relationship is exact and recovers the $(n - 1)$ sample autocovariances only when $n_0 = (n - 1)$ zeros are padded, since then $\lfloor N/2 \rfloor = (n - 1)$.
Another method of estimating $h(\omega)$ is given by

$$\hat h(\omega) = \sum_{j=-\lfloor m/2 \rfloor}^{m - \lfloor m/2 \rfloor - 1} w_j\, I_{\tilde X}\left(\omega_{p(\omega),\, j}\right) \qquad (4)$$

where

$$\omega_{p,j} = \frac{2\pi(p + j)}{N}$$

and $p(\omega)$ is the integer such that $\omega_{p,0}$ is closest to $\omega$. The sequence of $m$ weights $\{w_j\}$ for $j = -\lfloor m/2 \rfloor, \dots, (m - \lfloor m/2 \rfloor - 1)$ is fixed in the sense that the weights do not depend on the frequency $\omega$, and they satisfy $\sum_j w_j = 1$. Priestley (1981, page 581) notes that if we write

$$w_j(\omega) = \frac{2\pi}{N}\, W_n\left(\omega - \omega_{p(\omega),\, j}\right)$$

then Equation 4 and Equation 3 are quite similar, except that the weights $\{w_j\}$ depend on $\omega$. In fact, if $p(\omega) = 0$ and $m = N$, these equations are equivalent.
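The fixed-weight average of Equation 4 with equal weights $w_j = 1/m$ is a Daniell-type smoother and is simple to sketch (the edge handling below is illustrative only):

```python
import numpy as np

def daniell_smooth(pgram, m):
    """Estimate the spectral density by averaging m adjacent periodogram
    ordinates with equal weights w_j = 1/m.  m should be odd."""
    w = np.full(m, 1.0 / m)                        # weights sum to one
    padded = np.pad(pgram, m // 2, mode="reflect") # simple edge handling
    return np.convolve(padded, w, mode="valid")

# Usage with the periodogram from the previous sketch:
# h_hat = daniell_smooth(pgram, m=7)
```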
Given estimates $\hat h(\omega)$ and $\hat\sigma(0)$, the estimate of the normalized spectral density is given by

$$\hat f(\omega) = \frac{\hat h(\omega)}{\hat\sigma(0)}$$

This follows directly from the definition of $f(\omega)$.
Spectral Window. The following spectral windows $W_n(\theta)$ are available in routines containing the argument ISWVER.

Modified Bartlett

$$W_n(\theta) = \frac{1}{2\pi M}\left[\frac{\sin(M\theta/2)}{\sin(\theta/2)}\right]^2 = F_M(\theta)$$

where $F_M(\theta)$ corresponds to the Fejér kernel of order $M$.

Daniell

$$W_n(\theta) = \begin{cases} M/2\pi & |\theta| \le \pi/M \\ 0 & \text{otherwise} \end{cases}$$

Tukey

$$W_n(\theta) = a\,D_M\!\left(\theta - \frac{\pi}{M}\right) + (1 - 2a)\,D_M(\theta) + a\,D_M\!\left(\theta + \frac{\pi}{M}\right)$$

for $0 < a \le 0.25$, where

$$D_M(\theta) = \frac{\sin[(M + 1/2)\theta]}{2\pi \sin(\theta/2)}$$

represents the Dirichlet kernel. The Tukey-Hanning window is obtained when $a = 0.25$, and the Tukey-Hamming window is obtained when $a = 0.23$.

Parzen

$$W_n(\theta) = \frac{6}{\pi M^3}\left[\frac{\sin(M\theta/4)}{\sin(\theta/2)}\right]^4 \left[1 - \frac{2}{3}\sin^2(\theta/2)\right]$$

where $M$ is even. If $M$ is odd, then $M + 1$ is used instead of $M$ in the above formula.

Bartlett-Priestley

$$W_n(\theta) = \begin{cases} \dfrac{3M}{4\pi}\left[1 - \left(\dfrac{M\theta}{\pi}\right)^2\right] & |\theta| \le \pi/M \\[4pt] 0 & \text{otherwise} \end{cases}$$
The window parameter $M$ is inversely proportional to the bandwidth of the spectral window. Priestley (1981, pages 520–522) discusses a number of definitions of bandwidth and concludes that the particular definition adopted is of little significance. The choice of the spectral window bandwidth, and hence the choice of $M$, is a more important problem. One practical choice for $M$ is the last lag at which the estimated autocorrelation function $\hat\rho(k)$ is significantly different from zero, i.e., the largest $k$ for which $|\hat\rho(k)|$ exceeds approximately twice its estimated standard error. The estimated autocorrelations and their associated estimated standard errors can be computed using routine ACF. See Priestley (1981, pages 528–556) for alternative strategies of determining the window parameter $M$.
Since the spectral window is the Fourier transform of the lag window, we estimate the spectral density function by application of a particular spectral window to the periodogram. Note that M is directly related to the rate of decay of the lag window.
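The practical rule above can be sketched with the rough large-sample standard error $1/\sqrt{n}$ for the sample autocorrelations (an illustrative helper only; routine ACF provides proper standard errors):

```python
import numpy as np

def last_significant_lag(x, max_lag):
    """Pick the window parameter M as the last lag at which the sample
    autocorrelation exceeds twice the crude standard error 1/sqrt(n)."""
    x = np.asarray(x, float) - np.mean(x)
    n, c0 = len(x), x @ x / len(x)
    rho = np.array([x[: n - k] @ x[k:] / n / c0 for k in range(1, max_lag + 1)])
    significant = np.where(np.abs(rho) > 2.0 / np.sqrt(n))[0]
    return int(significant[-1]) + 1 if significant.size else 1
```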
Time Interval. Consider the continuous-parameter stationary process $\{X(t)\}$ and let $\{X_t\}$ denote a realization of this process sampled at equal time intervals $\Delta t$ = TINT. Although the spectral density of $X(t)$ extends over the frequency range $(-\infty, \infty)$, the spectral density of $X_t$ is unique over the restricted frequency range $(-\pi/\Delta t, \pi/\Delta t)$. This problem of aliasing, or spectrum folding, is inherent to spectral analysis; see Blackman and Tukey (1958) and Priestley (1981) for further discussion.
In practice, the $\{X_t\}$ realization is treated as a discrete-parameter process with spectral density $h(\omega)$ defined over the frequency range $(-\pi, \pi)$. This corresponds to setting $\Delta t = 1$. The transformation of the spectral density to the restricted frequency range $(-\pi/\Delta t, \pi/\Delta t)$ is given by

$$h_{\Delta t}(\omega) = \Delta t\; h(\omega\, \Delta t) \qquad \omega \in \left(-\pi/\Delta t,\ \pi/\Delta t\right)$$

Priestley (1981, pages 507–508) considers a method of choosing $\Delta t$. A similar transformation is performed for the estimated spectral density.
Frequency Scale. The argument IFSCAL is used to specify the scale of the frequencies at which to estimate the spectral density. The NF frequencies are contained in the argument F.
Approximate Confidence Intervals for Spectral Ordinates. An approximate $(1 - \alpha)100\%$ confidence interval for the value of the nonnormalized spectral density function $h(\omega)$ at a particular frequency $\omega$ is given by the formula (Priestley 1981, page 468)

$$\left(\frac{\nu\, \hat h(\omega)}{\chi^2_{\nu,\, 1-\alpha/2}},\ \frac{\nu\, \hat h(\omega)}{\chi^2_{\nu,\, \alpha/2}}\right)$$

Routine CHIIN, with argument P = $1 - \alpha/2$ and P = $\alpha/2$, can be used to compute the percentage points $\chi^2_{\nu,\, 1-\alpha/2}$ and $\chi^2_{\nu,\, \alpha/2}$. Also, routine CHIIN should be used with degrees of freedom (DF) $\nu$, which depend upon the version of the spectral window (ISWVER), as given in the following table (Priestley 1981, page 467).
ISWVER   Window               DF
1        Modified Bartlett    3n/M
2        Daniell              2n/M
3        Tukey-Hamming        2.5164n/M
4        Tukey-Hanning        (8/3)n/M
5        Parzen               3.708614n/M
6        Bartlett-Priestley   1.4n/M
If one of the windows above is not specified and the user provides relative weights, such as with routine SWED, the weights are normalized to sum to one in the actual computations. Given all $m$ ($m$ odd) normalized weights $w_j$, then for $2\pi\lfloor m/2 \rfloor/n < \omega < \pi(1 - 2\lfloor m/2 \rfloor/n)$, the degrees of freedom for a confidence interval on $h(\omega)$ are given by (Fuller 1976, page 296)

$$\nu = \frac{2}{\sum_j w_j^2}$$

Frequently, confidence intervals on $\ln h(\omega)$ are suggested instead, because they have fixed width. The interval is

$$\left(\ln \hat h(\omega) + \ln \nu - \ln \chi^2_{\nu,\, 1-\alpha/2},\ \ \ln \hat h(\omega) + \ln \nu - \ln \chi^2_{\nu,\, \alpha/2}\right)$$
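The interval is straightforward to compute once the window-dependent degrees of freedom from the table above are known. A minimal sketch, assuming SciPy is available (its chi2.ppf plays the role of routine CHIIN; the function name is hypothetical):

```python
from scipy.stats import chi2

def spectral_ci(h_hat, df, alpha=0.05):
    """Approximate (1 - alpha)*100% confidence interval for h(omega)
    from the estimate h_hat and the window-dependent degrees of
    freedom df (cf. Priestley 1981, page 468)."""
    lower = df * h_hat / chi2.ppf(1.0 - alpha / 2.0, df)
    upper = df * h_hat / chi2.ppf(alpha / 2.0, df)
    return lower, upper

# Example: Daniell window, n = 400, M = 20  ->  df = 2n/M = 40
print(spectral_ci(h_hat=1.3, df=40.0))
```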
Cross-Spectral Analysis
The routines for cross-spectral analysis are concerned with the estimation of the cross-spectral density of two jointly stationary processes given finite realizations $\{X_t\}$ and $\{Y_t\}$ for $t = 1, \dots, n$. These realizations consist of values sampled at equally spaced time intervals in the continuous-parameter case or of values observed consecutively in the discrete-parameter case. Again, we develop methodology for the cross-spectral analysis of discrete-parameter stationary processes and later account for the time sampling in the continuous-parameter model.
Let $\mu_X$ and $\sigma_{XX}(k)$ denote the mean and autocovariance function of the $X_t$ process; similarly, define $\mu_Y$ and $\sigma_{YY}(k)$ with respect to the $Y_t$ process. Define the cross-covariance function between $X_t$ and $Y_t$ by

$$\sigma_{XY}(k) = \mathrm{cov}\{X_t, Y_{t+k}\} = E\{[X_t - \mu_X][Y_{t+k} - \mu_Y]\} \qquad k \in \mathbb{Z}$$

Then, the nonnormalized cross-spectral density $h_{XY}(\omega)$ and the cross-covariance function $\sigma_{XY}(k)$ form a Fourier transform pair. The relationship in the continuous-parameter case is given by

$$h_{XY}(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \sigma_{XY}(k)\, e^{-i\omega k}\, dk$$

$$\sigma_{XY}(k) = \int_{-\infty}^{\infty} h_{XY}(\omega)\, e^{i\omega k}\, d\omega$$

Similarly, the normalized cross-spectral density $f_{XY}(\omega)$ and the cross-correlation function $\rho_{XY}(k) = \sigma_{XY}(k)/\left[\sigma_{XX}(0)\,\sigma_{YY}(0)\right]^{1/2}$ form a Fourier transform pair. The relationship in the continuous-parameter case is given by

$$f_{XY}(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \rho_{XY}(k)\, e^{-i\omega k}\, dk$$

$$\rho_{XY}(k) = \int_{-\infty}^{\infty} f_{XY}(\omega)\, e^{i\omega k}\, d\omega$$

The discrete-parameter analogs of the above equations involve summation over $k$ instead of integration over $dk$.
The cross-spectral density function is often written in terms of real and imaginary components, since in general the function is complex-valued. In particular,

$$h_{XY}(\omega) = c_{XY}(\omega) - i\,q_{XY}(\omega)$$

where the cospectrum and quadrature spectrum of the $X_t$ and $Y_t$ processes are respectively defined (in the discrete-parameter case) by

$$c_{XY}(\omega) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \sigma_{XY}(k)\cos(\omega k) \qquad q_{XY}(\omega) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \sigma_{XY}(k)\sin(\omega k)$$

The polar form of $h_{XY}(\omega)$ is defined by

$$h_{XY}(\omega) = A_{XY}(\omega)\, e^{i\phi_{XY}(\omega)}$$

where the cross-amplitude spectrum is

$$A_{XY}(\omega) = \left|h_{XY}(\omega)\right| = \left[c_{XY}^2(\omega) + q_{XY}^2(\omega)\right]^{1/2}$$

and the phase spectrum is

$$\phi_{XY}(\omega) = \tan^{-1}\left[-\,q_{XY}(\omega)/c_{XY}(\omega)\right]$$

The coherency spectrum is defined by

$$\left|w_{XY}(\omega)\right| = \frac{\left|h_{XY}(\omega)\right|}{\left[h_{XX}(\omega)\, h_{YY}(\omega)\right]^{1/2}}$$

For a given frequency $\omega$, the coherency $|w_{XY}(\omega)|$ lies between zero and one, inclusive, and reflects the linear relationship between the random coefficients. See Priestley (1981, pages 654–661) for additional information concerning the interpretation of the components of the cross-spectral density.
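All of these components can be obtained from a smoothed cross periodogram. The sketch below is a minimal illustration of the quantities just defined, not the IMSL cross-spectral routines; the conjugation and scale conventions are chosen to match the definitions above, where $\sigma_{XY}(k) = \mathrm{cov}\{X_t, Y_{t+k}\}$ and $h_{XY} = c_{XY} - i\,q_{XY}$.

```python
import numpy as np

def cross_spectral_components(x, y, m=11):
    """Cospectrum, quadrature spectrum, cross-amplitude, phase, and
    coherency from a smoothed cross periodogram (m odd smoothing span)."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    n = len(x)
    zx, zy = np.fft.fft(x), np.fft.fft(y)
    ixy = np.conj(zx) * zy / (2.0 * np.pi * n)      # cross periodogram
    ixx = np.abs(zx) ** 2 / (2.0 * np.pi * n)       # periodogram of X
    iyy = np.abs(zy) ** 2 / (2.0 * np.pi * n)       # periodogram of Y
    w = np.full(m, 1.0 / m)                         # equal smoothing weights
    def smooth(s):
        return np.convolve(np.pad(s, m // 2, mode="wrap"), w, mode="valid")
    h_xy, h_xx, h_yy = smooth(ixy), smooth(ixx), smooth(iyy)
    c, q = h_xy.real, -h_xy.imag                    # h_XY = c - i*q
    amplitude = np.abs(h_xy)                        # cross-amplitude spectrum
    phase = np.angle(h_xy)                          # phase (sign conventions vary)
    coherency = amplitude / np.sqrt(h_xx * h_yy)    # in [0, 1] after smoothing
    return c, q, amplitude, phase, coherency
```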
Centering and Padding. The centered and padded realizations $\{\tilde X_t\}$ and $\{\tilde Y_t\}$ are defined as in Equation 1 with centering constants

$$\hat X = \frac{1}{n} \sum_{t=1}^{n} X_t \qquad \hat Y = \frac{1}{n} \sum_{t=1}^{n} Y_t$$

Any desired filtering, prewhitening, or data tapering should be performed prior to estimating the cross-spectral density. The resulting estimate may be adjusted accordingly.
Cross Periodogram. The cross periodogram of the sample sequences $\{X_t\}$ and $\{Y_t\}$, $t = 1, \dots, n$, computed with the padded sequences $\{\tilde X_t\}$ and $\{\tilde Y_t\}$, $t = 1, \dots, N$, is defined by

$$I_{\tilde X \tilde Y}(\omega_p) = K\, \overline{\zeta_{\tilde X}(\omega_p)}\; \zeta_{\tilde Y}(\omega_p)$$

where $\zeta_{\tilde X}$ and $\zeta_{\tilde Y}$ are the discrete Fourier transforms of $\tilde X_t$ and $\tilde Y_t$, and $K$ is the scale factor of the usual or modified periodogram. The scale factor option is maintained for compatibility with the spectral routines. The argument IPVER is used to specify the version of the periodogram used to compute the cross periodogram.
Cross‑Spectral Density Estimation. The relationship between the sample cross‑covariance function and estimate of the nonnormalized cross‑spectral density function is similar to the theoretical situation previously discussed.
Define the sample cross-covariance function between the $X_t$ and $Y_t$ processes by

$$\hat\sigma_{XY}(k) = \begin{cases} \dfrac{1}{n} \displaystyle\sum_{t=1}^{n-k} \left(X_t - \hat X\right)\left(Y_{t+k} - \hat Y\right) & k = 0, 1, \dots, n-1 \\[6pt] \dfrac{1}{n} \displaystyle\sum_{t=1-k}^{n} \left(X_t - \hat X\right)\left(Y_{t+k} - \hat Y\right) & k = -1, -2, \dots, -(n-1) \end{cases}$$

The nonnormalized cross-spectral density may be estimated directly from the sample cross-covariances by

$$\hat h_{XY}(\omega) = \frac{1}{2\pi} \sum_{k=-(n-1)}^{n-1} \lambda_n(k)\, \hat\sigma_{XY}(k)\, e^{-i\omega k}$$

The sequence of weights $\{\lambda_n(k)\}$, called the lag window, decreases at a rate appropriate for consistent estimation of $h_{XY}(\omega)$.
An algebraically equivalent method of estimating $h_{XY}(\omega)$ consists of locally smoothing the modified cross periodogram in a neighborhood of $\omega$. Let $I_{\tilde X \tilde Y}(\omega_p)$ denote the modified cross periodogram of the centered and padded realizations $\{\tilde X_t\}$ and $\{\tilde Y_t\}$. Then, an estimate of the nonnormalized cross-spectral density is given by

$$\hat h_{XY}(\omega) = \frac{2\pi}{N} \sum_{p} W_n(\omega - \omega_p)\, I_{\tilde X \tilde Y}(\omega_p)$$

where $W_n(\theta)$ is the spectral window.
Another method of estimating $h_{XY}(\omega)$ is given by

$$\hat h_{XY}(\omega) = \sum_{j=-\lfloor m/2 \rfloor}^{m - \lfloor m/2 \rfloor - 1} w_j\, I_{\tilde X \tilde Y}\left(\omega_{p(\omega),\, j}\right)$$

where $\omega_{p,j}$, $p(\omega)$, and the weights $\{w_j\}$ are as defined in the univariate setting.
Given estimates $\hat h_{XY}(\omega)$, $\hat\sigma_{XX}(0)$, and $\hat\sigma_{YY}(0)$, the estimate of the normalized cross-spectral density is given by

$$\hat f_{XY}(\omega) = \frac{\hat h_{XY}(\omega)}{\left[\hat\sigma_{XX}(0)\, \hat\sigma_{YY}(0)\right]^{1/2}}$$

This follows directly from the definition of $f_{XY}(\omega)$.