regressionArima¶

Fits a univariate ARIMA (p, d, q) time series model with the inclusion of one or more regression variables.

Synopsis¶

regressionArima (nObs, y, model)

Required Arguments¶

int nObs (Input): Number of observations.
float y[] (Input): Array of length nObs containing the observations.
int model[] (Input): Array of length 3 containing the model order parameters p, d, q.

Element	Description
0	Order of the autoregressive part, p, where p ≥ 0.
1	Order of the non-seasonal difference operator, d, where d ≥ 0.
2	Order of the moving average part, q, where q ≥ 0.

Return Value¶

An array of length 1 + p + q with the estimated constant, autoregressive (AR), and moving average (MA) parameters.

Optional Arguments¶

regression, float (Input)

Array of length nObs × nRegressors containing the regression variables. Specific columns of regression may be selected using the optional argument regressionIndices.

If optional arguments forecasts and regression are present, then regressionForecasts is required.

Default: No regression variables are included.

regressionForecasts, float[[]] (Input)

Array of length nPredict × nRegressors containing the regression variables to be used in obtaining forecasts. Specific columns of regressionForecasts may be selected using the optional argument regressionIndices.

If optional arguments forecasts and regression are present, then regressionForecasts is required.

Default: Not used.

regressionIndices, int (Input)

Argument regressionIndices contains the indices of the regression variables in matrices regression and regressionForecasts.

Default: All regression variables in regression and regressionForecasts will be used.

noTrend, (Input)

If noTrend is specified, the function will not include a trend variable. A trend variable has the effect of fitting an intercept term in the regression. If the difference operator model[1] = d > 0, the effect of no trend on the model in the original, undifferenced space is polynomial of order d.

Default: The function will include a trend variable.

maxIterations, int (Input)

Maximum number of iterations.

Default: maxIterations = 50

printLevel, int (Input)

Printing option.

`printLevel`	Action
0	No printing
1	Prints final results only.
2	Prints intermediate and final results.

Default: printLevel = 0

forecasts, int nPredict, float forecasts, float forecastVariances (Output)

Arrays of length nPredict containing the forecasts and forecast variances for time points t = n+1, n+2, …, n+nPredict, where n = nObs.

If optional arguments forecasts and regression are present, then regressionForecasts is required.

regressionCoef (Output): An array of length nRegressors+t containing the estimated regression coefficients, where $t=0$ if noTrend is specified, otherwise $t=1$ .
seArma (Output): An array of length p+q containing the standard errors of the ARMA parameter estimates, where p = model[0] and q = model[2].
varNoise (Output): White noise variance estimate. If model[0]``+model[\ 2\ ``]= 0 and nRegressors > 0, varNoise is the mean squared regression residual.
seCoef (Output): An array of length nRegressors+ t containing the standard errors of the ARMA parameter estimates, where $t=0$ if noTrend is specified, otherwise $t=1$ .
coefCovariances (Output): An array of length (nRegressors+t) × (nRegressors+t) containing the variances and covariances of the regression coefficients, where $t=0$ if noTrend is specified, otherwise $t=1$ .
aic (Output): Akaike’s Information Criterion for the fitted ARMA model.
logLikelihood (Output): Value of –2(ln(likelihood)) for fitted model.

Description¶

Function regressionArima fits an ARIMA(p, d, q) to a univariate time series with the possible inclusion of one or more regression variables.

Suppose $Y_t$ , $t=1,\ldots N$ , is a time series such that the d-th difference is stationary. Further, suppose $a_t$ is a series of uncorrelated, mean 0 random variables with variance $\sigma_a^2$ .

The Auto-Regressive Integrated Moving Average (ARIMA) model for $\left\{ Y_t,a_t \right\}$ can be expressed as

$\phi(B)(1-B)^d Y_t = \theta(B) a_t$

where B is the backshift operator, $Bz_t=z_{t-1},B^2 z_t=z_{t-2}$ ,

$\phi(B) = \left(1 - \phi_1 B - \phi_2 B^2 + \cdot - \phi_p B^p\right),$

and

$\theta(B) = \left(1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q\right)$

The notation for this model is ARIMA(p, d, q) where p is the order of the autoregressive polynomial $\phi(B)$ , d is the order of the differencing needed to make $Y_t$ stationary, and q is the order of the moving-average polynomial $\theta(B)$ .

The ARIMA model can be extended to include $K$ regression variables $X_{1t},X_{2t} \ldots,X_{Kt}$ , by using the residuals (of the multiple regression of $Y_t$ on $X_{1t},X_{2t} \ldots,X_{Kt}$ ) in place of $Y_t$ in the above ARIMA model.

$\phi(B)(1-B)^d \left(Y_t - \sum_{i=1}^{K} \beta_i X_{it}\right) = \theta(B) a_t$

Equivalently,

$\phi(B) w_t = \theta(B) a_t$

where

$w_t = (1-B)^d \left(Y_t - \sum_{i=1}^{K} \beta_i X_{it}\right)$

is the differenced residual series.

To estimate the (p + q + K) parameters of the specified regression ARIMA model, regressionArima uses the iterative generalized least squares method (IGLS) as described in Otto, Bell, and Burman (1987).

The IGLS method iterates between two steps, one step to estimate the regression parameters via generalized least squares (GLS) and the second step to estimate the ARMA parameters. In particular, at iteration m, the first step finds

$\hat{\beta}_m = \left(\hat{\beta}_{m1}, \ldots \hat{\beta}_{mK}\right)'$

by solving the GLS problem with weight matrix

$V(i,j) = \gamma_w(j-i), \phantom{...} i < j = 1, \ldots, N$

where

$\gamma_w(j-i) = E\left[w_{t-j} w_{t-i} | \hat{\phi}_{m-1}, \hat{\theta}_{m-1}\right]$

That is, $\hat{\beta}_m$ minimizes $\left( Y-X'\beta\right)'V^{-1} \left( Y-X'\beta\right)$ , where $Y=\left( Y_1,\ldots Y_N \right)'$ , $X$ is an N by K matrix with i-th column, $X_i=\left( X_{i1},\ldots X_{iN} \right)'$ , and $V$ is an N by N weight matrix defined using the theoretical autocovariances of the series $w_{m-1,t}=(1-B)^d \left( Y_t-\sum_{i=1}^{K} \hat{\beta}_{m-1,i} X_{it} \right)$ . The series $w_{m-1,t}$ is modeled as an ARMA(p,q) with parameters $\hat{\phi}_{m-1}=\left( \hat{\phi}_{m-1.1},\ldots\hat{\phi}_{m-1,p} \right)'$ and $\hat{\phi}_{m-1}=\left( \hat{\phi}_{m-1,1},\ldots\hat{\phi}_{m-1,q} \right)'$ . At iteration m, the second step is then to obtain new estimates of $\hat{\varphi}_m$ and $\hat{\theta}_m$ for the updated series, $w_{m,t}$ . To find the estimates $\hat{\varphi}_m$ and $\hat{\theta}_m$ , regressionArima uses the exact likelihood method as described in Akaike, Kitagawa, Arahata and Tada (1979) and used in function, maxArma.

Remarks¶

When forecasts are requested (nPredict > 0), regressionArima requires that future values of the independent variables be provided in optional argument regressionForecasts. In effect, regressionArima assumes the future X’s are known without error, which is valid for any deterministic function of time such as a seasonal indicator. Also, in economics, certain factors that are considered to be leading indicators are treated as deterministic for the purpose of predicting changes in the economy. Users may consider using a more general transfer function model if this is an unreasonable assumption. Function regressionArima calculates forecast variances using the asymptotic result found in Fuller (1996), Theorem 2.9.4. To obtain the standard errors of the ARMA parameters, regressionArima calls function arma for the final w series.

Examples¶

Example 1¶

The data set consists of annual mileage per passenger vehicle and annual US population (in 1000’s) spanning the years 1980 to 2006 (U.S. Energy Information Administration, 2008). Consider modeling the annual mileage using US population as a regression variable.

from numpy import *
from pyimsl.stat.regressionArima import regressionArima
from pyimsl.stat.writeMatrix import writeMatrix

nObs = 24
model = [1, 0, 0]
indices = [0]
nPredict = 5
nRegressors = 2

y = [9062.0, 8813.0, 8873.0, 9050.0, 9118.0,
     9248.0, 9419.0, 9464.0, 9720.0, 9972.0,
     10157.0, 10504.0, 10571.0, 10857.0, 10804.0,
     10992.0, 11203.0, 11330.0, 11581.0, 11754.0,
     11848.0, 11976.0, 11831.0, 12202.0, 12325.0,
     12460.0, 12510.0, 12485.0, 12293.0]

regX = [
    [22722.4681, 9062.0],
    [22946.5714, 8813.0],
    [23166.4458, 8873.0],
    [23379.1990, 9050.0],
    [23582.4902, 9118.0],
    [23792.3795, 9248.0],
    [24013.2887, 9419.0],
    [24228.8918, 9464.0],
    [24449.8982, 9720.0],
    [24681.923, 9972.0],
    [24962.2814, 10157.0],
    [25298.0941, 10504.0],
    [25651.4224, 10571.0],
    [25991.8588, 10857.0],
    [26312.5820999999, 10804.0],
    [26627.8393, 10992.0],
    [26939.4284, 11203.0],
    [27264.6925, 11330.0],
    [27585.4104, 11581.0],
    [27904.0168, 11754.0],
    [28217.1936, 11848.0],
    [28503.9803, 11976.0],
    [28772.6647, 11831.0],
    [29021.0914, 12202.0],
    [29289.2127, 12325.0],
    [29556.0549, 12460.0],
    [29836.2973, 12510.0],
    [30129.0332, 12485.0],
    [30405.9724, 12293.0]]

avar = []
llike = []
regcoef = []
regstderr = []
coefcovar = []
armastderr = []
forecasts = {'nPredict': nPredict}

result = regressionArima(nObs, y, model,
                         regression=regX,
                         regressionForecasts=regX[nObs:][:],
                         forecasts=forecasts,
                         regressionIndices=indices,
                         varNoise=avar,
                         logLikelihood=llike,
                         regressionCoef=regcoef,
                         seCoef=regstderr,
                         coefCovariances=coefcovar,
                         seArma=armastderr,
                         printLevel=1)

Output¶

Final results for regression ARIMA model (p,d,q) = 1, 0, 0

 Final AR parameter estimates/ std errors
          0.73150	        0.13496

-2*ln(maximum log likelihood) = 231.835414

 White noise variance = 10981.682580

 Regression estimates:

	  COEFFICIENTS 	 Regression STD Errors
 0	    -3480.58178	      689.63627
 1	        0.54235	        0.02675

 Forecasts with standard deviation

 T 		 Y fcst 		 Y fcst std dev
24	    12360.33003		      104.79352
25	    12514.49846		      129.83772
26	    12673.39768		      141.42873
27	    12837.21708		      147.25669
28	    12991.11183		      150.28236

Example 2¶

The data set consists of simulated weekly observations containing a strong annual seasonality. The seasonal variables are constructed and sent into regressionArima as regression variables.

from numpy import *
from pyimsl.stat.regressionArima import regressionArima
from pyimsl.math.constant import constant
from pyimsl.stat.writeMatrix import writeMatrix

nObs = 100
nPredict = 4
nRegressors = 2
model = [2, 0, 0]

y = [32.27778, 32.63300, 33.13768, 34.4517,
     34.63824, 37.31262, 37.35704, 37.03092,
     36.39894, 35.75541, 35.10829, 34.70107,
     34.69592, 32.75326, 30.85370, 31.10936,
     29.47493, 29.14361, 28.50466, 30.09714,
     28.49403, 27.23268, 23.49674, 22.71225,
     21.42798, 18.68601, 17.40035, 16.06832,
     15.31862, 14.75179, 13.40089, 13.01101,
     12.44863, 11.27890, 11.51770, 14.31982,
     14.67036, 14.76331, 15.35644, 17.04353,
     18.39931, 18.21919, 18.72777, 19.61794,
     22.31733, 23.79600, 25.41326, 25.60497,
     27.93579, 29.21765, 29.60981, 28.46994,
     28.780810, 30.96402, 35.49537, 35.75124,
     36.18933, 37.2627, 35.02454, 33.57089,
     35.00683, 34.83886, 34.19827, 33.73966,
     34.49709, 34.07127, 32.74709, 31.97856,
     31.3029, 30.21916, 27.46015, 26.78431,
     25.32815, 23.97863, 21.83837, 21.00647,
     20.58846, 19.94578, 17.38271, 17.12572,
     16.71847, 17.45425, 16.15050, 13.07448,
     12.54188, 12.42137, 13.51771, 14.84232,
     14.28870, 13.39561, 15.48938, 16.47175,
     17.62758, 16.57677, 18.20737, 20.8491,
     20.15616, 20.93857, 23.73973, 25.30449,
     26.51106, 29.43261, 32.02672, 32.18846]

# The data are simulated weekly observations
# with an annual seasonal cycle

PI = constant("PI")
x = []
for i in range(0, nObs + nPredict):
    x.append([sin(2 * PI * i / 52.0), cos(2 * PI * i / 52.0)])

forecasts = {'nPredict': nPredict}
avar = []
llike = []
regcoef = []
regstderr = []
coefcovar = []
armastderr = []


result = regressionArima(nObs, y, model,
                         regression=x,
                         regressionForecasts=x[nObs:][:],
                         forecasts=forecasts,
                         varNoise=avar,
                         logLikelihood=llike,
                         regressionCoef=regcoef,
                         seCoef=regstderr,
                         coefCovariances=coefcovar,
                         seArma=armastderr,
                         printLevel=1)

Output¶

Final results for regression ARIMA model (p,d,q) = 2, 0, 0

 Final AR parameter estimates/ std errors
          0.71858	        0.09837
         -0.25992	        0.09828

-2*ln(maximum log likelihood) = -13.620890

 White noise variance = 0.878295

 Regression estimates:

	  COEFFICIENTS 	 Regression STD Errors
 0	       24.81011	        0.17177
 1	        8.91972	        0.24042
 2	        6.84814	        0.24709

 Forecasts with standard deviation

 T 		 Y fcst 		 Y fcst std dev
100	       26.74492		        0.93717
101	       28.07805		        1.15404
102	       29.33707		        1.17880
103	       30.53160		        1.17880