regressionArima¶
Fits a univariate ARIMA (p, d, q) time series model with the inclusion of one or more regression variables.
Synopsis¶
regressionArima (nObs, y, model)
Required Arguments¶
- int
nObs
(Input) - Number of observations.
- float
y[]
(Input) - Array of length
nObs
containing the observations. - int
model[]
(Input) - Array of length 3 containing the model order parameters p, d, q.
Element | Description |
---|---|
0 | Order of the autoregressive part, p, where p ≥ 0. |
1 | Order of the non-seasonal difference operator, d, where d ≥ 0. |
2 | Order of the moving average part, q, where q ≥ 0. |
Return Value¶
An array of length 1 + p + q with the estimated constant, autoregressive (AR), and moving average (MA) parameters.
Optional Arguments¶
regression
, float (Input)Array of length
nObs
×nRegressors
containing the regression variables. Specific columns ofregression
may be selected using the optional argumentregressionIndices
.If optional arguments forecasts
andregression
are present, thenregressionForecasts
is required.Default: No regression variables are included.
regressionForecasts
, float[[]]
(Input)Array of length
nPredict
×nRegressors
containing the regression variables to be used in obtaining forecasts. Specific columns ofregressionForecasts
may be selected using the optional argumentregressionIndices
.If optional arguments forecasts
andregression
are present, thenregressionForecasts
is required.Default: Not used.
regressionIndices
, int (Input)Argument
regressionIndices
contains the indices of the regression variables in matricesregression
andregressionForecasts
.Default: All regression variables in
regression
andregressionForecasts
will be used.noTrend
, (Input)If
noTrend
is specified, the function will not include a trend variable. A trend variable has the effect of fitting an intercept term in the regression. If the difference operatormodel[
1]
= d > 0, the effect of no trend on the model in the original, undifferenced space is polynomial of order d.Default: The function will include a trend variable.
maxIterations
, int (Input)Maximum number of iterations.
Default:
maxIterations
=
50printLevel
, int (Input)Printing option.
printLevel
Action 0 No printing 1 Prints final results only. 2 Prints intermediate and final results. Default:
printLevel
=
0forecasts
, intnPredict
, floatforecasts
, floatforecastVariances
(Output)- Arrays of length
nPredict
containing the forecasts and forecast variances for time points t = n+1, n+2, …, n+nPredict
, where n =nObs
.
If optional arguments forecasts and regression are present,
then regressionForecasts is required. |
regressionCoef
(Output)- An array of length
nRegressors+
t containing the estimated regression coefficients, where \(t=0\) ifnoTrend
is specified, otherwise \(t=1\). seArma
(Output)- An array of length p+q containing the standard errors of the ARMA
parameter estimates, where p =
model[
0]
and q =model[
2]
. varNoise
(Output)- White noise variance estimate. If
model
[0]``+model[\ 2\ ``]
= 0 andnRegressors
> 0,varNoise
is the mean squared regression residual. seCoef
(Output)- An array of length
nRegressors+
t containing the standard errors of the ARMA parameter estimates, where \(t=0\) ifnoTrend
is specified, otherwise \(t=1\). coefCovariances
(Output)- An array of length (
nRegressors+
t) × (nRegressors+
t) containing the variances and covariances of the regression coefficients, where \(t=0\) ifnoTrend
is specified, otherwise \(t=1\). aic
(Output)- Akaike’s Information Criterion for the fitted ARMA model.
logLikelihood
(Output)- Value of –2(ln(likelihood)) for fitted model.
Description¶
Function regressionArima
fits an ARIMA(p, d, q) to a univariate
time series with the possible inclusion of one or more regression variables.
Suppose \(Y_t\), \(t=1,\ldots N\), is a time series such that the d-th difference is stationary. Further, suppose \(a_t\) is a series of uncorrelated, mean 0 random variables with variance \(\sigma_a^2\).
The Auto-Regressive Integrated Moving Average (ARIMA) model for \(\left\{ Y_t,a_t \right\}\) can be expressed as
where B is the backshift operator, \(Bz_t=z_{t-1},B^2 z_t=z_{t-2}\),
and
The notation for this model is ARIMA(p, d, q) where p is the order of the autoregressive polynomial \(\phi(B)\), d is the order of the differencing needed to make \(Y_t\) stationary, and q is the order of the moving-average polynomial \(\theta(B)\).
The ARIMA model can be extended to include \(K\) regression variables \(X_{1t},X_{2t} \ldots,X_{Kt}\), by using the residuals (of the multiple regression of \(Y_t\) on \(X_{1t},X_{2t} \ldots,X_{Kt}\)) in place of \(Y_t\) in the above ARIMA model.
Equivalently,
where
is the differenced residual series.
To estimate the (p + q + K) parameters of the specified regression
ARIMA model, regressionArima
uses the iterative generalized least
squares method (IGLS) as described in Otto, Bell, and Burman (1987).
The IGLS method iterates between two steps, one step to estimate the regression parameters via generalized least squares (GLS) and the second step to estimate the ARMA parameters. In particular, at iteration m, the first step finds
by solving the GLS problem with weight matrix
where
That is, \(\hat{\beta}_m\) minimizes \(\left(
Y-X'\beta\right)'V^{-1} \left( Y-X'\beta\right)\), where \(Y=\left(
Y_1,\ldots Y_N \right)'\), \(X\) is an N by K matrix with i-th
column, \(X_i=\left( X_{i1},\ldots X_{iN} \right)'\), and \(V\) is an
N by N weight matrix defined using the theoretical autocovariances of the
series \(w_{m-1,t}=(1-B)^d \left( Y_t-\sum_{i=1}^{K} \hat{\beta}_{m-1,i}
X_{it} \right)\) . The series \(w_{m-1,t}\) is modeled as an ARMA(p,q)
with parameters \(\hat{\phi}_{m-1}=\left(
\hat{\phi}_{m-1.1},\ldots\hat{\phi}_{m-1,p} \right)'\) and
\(\hat{\phi}_{m-1}=\left( \hat{\phi}_{m-1,1},\ldots\hat{\phi}_{m-1,q}
\right)'\) . At iteration m, the second step is then to obtain new estimates
of \(\hat{\varphi}_m\) and \(\hat{\theta}_m\) for the updated series,
\(w_{m,t}\). To find the estimates \(\hat{\varphi}_m\) and
\(\hat{\theta}_m\), regressionArima
uses the exact likelihood method
as described in Akaike, Kitagawa, Arahata and Tada (1979) and used in
function, maxArma.
Remarks¶
When forecasts are requested (nPredict
> 0), regressionArima
requires that future values of the independent variables be provided in
optional argument regressionForecasts
. In effect, regressionArima
assumes the future X’s are known without error, which is valid for any
deterministic function of time such as a seasonal indicator. Also, in
economics, certain factors that are considered to be leading indicators are
treated as deterministic for the purpose of predicting changes in the
economy. Users may consider using a more general transfer function model if
this is an unreasonable assumption. Function regressionArima
calculates
forecast variances using the asymptotic result found in Fuller (1996),
Theorem 2.9.4. To obtain the standard errors of the ARMA parameters,
regressionArima
calls function arma for the final w
series.
Examples¶
Example 1¶
The data set consists of annual mileage per passenger vehicle and annual US population (in 1000’s) spanning the years 1980 to 2006 (U.S. Energy Information Administration, 2008). Consider modeling the annual mileage using US population as a regression variable.
from numpy import *
from pyimsl.stat.regressionArima import regressionArima
from pyimsl.stat.writeMatrix import writeMatrix
nObs = 24
model = [1, 0, 0]
indices = [0]
nPredict = 5
nRegressors = 2
y = [9062.0, 8813.0, 8873.0, 9050.0, 9118.0,
9248.0, 9419.0, 9464.0, 9720.0, 9972.0,
10157.0, 10504.0, 10571.0, 10857.0, 10804.0,
10992.0, 11203.0, 11330.0, 11581.0, 11754.0,
11848.0, 11976.0, 11831.0, 12202.0, 12325.0,
12460.0, 12510.0, 12485.0, 12293.0]
regX = [
[22722.4681, 9062.0],
[22946.5714, 8813.0],
[23166.4458, 8873.0],
[23379.1990, 9050.0],
[23582.4902, 9118.0],
[23792.3795, 9248.0],
[24013.2887, 9419.0],
[24228.8918, 9464.0],
[24449.8982, 9720.0],
[24681.923, 9972.0],
[24962.2814, 10157.0],
[25298.0941, 10504.0],
[25651.4224, 10571.0],
[25991.8588, 10857.0],
[26312.5820999999, 10804.0],
[26627.8393, 10992.0],
[26939.4284, 11203.0],
[27264.6925, 11330.0],
[27585.4104, 11581.0],
[27904.0168, 11754.0],
[28217.1936, 11848.0],
[28503.9803, 11976.0],
[28772.6647, 11831.0],
[29021.0914, 12202.0],
[29289.2127, 12325.0],
[29556.0549, 12460.0],
[29836.2973, 12510.0],
[30129.0332, 12485.0],
[30405.9724, 12293.0]]
avar = []
llike = []
regcoef = []
regstderr = []
coefcovar = []
armastderr = []
forecasts = {'nPredict': nPredict}
result = regressionArima(nObs, y, model,
regression=regX,
regressionForecasts=regX[nObs:][:],
forecasts=forecasts,
regressionIndices=indices,
varNoise=avar,
logLikelihood=llike,
regressionCoef=regcoef,
seCoef=regstderr,
coefCovariances=coefcovar,
seArma=armastderr,
printLevel=1)
Output¶
Final results for regression ARIMA model (p,d,q) = 1, 0, 0
Final AR parameter estimates/ std errors
0.73150 0.13496
-2*ln(maximum log likelihood) = 231.835414
White noise variance = 10981.682580
Regression estimates:
COEFFICIENTS Regression STD Errors
0 -3480.58178 689.63627
1 0.54235 0.02675
Forecasts with standard deviation
T Y fcst Y fcst std dev
24 12360.33003 104.79352
25 12514.49846 129.83772
26 12673.39768 141.42873
27 12837.21708 147.25669
28 12991.11183 150.28236
Example 2¶
The data set consists of simulated weekly observations containing a strong
annual seasonality. The seasonal variables are constructed and sent into
regressionArima
as regression variables.
from numpy import *
from pyimsl.stat.regressionArima import regressionArima
from pyimsl.math.constant import constant
from pyimsl.stat.writeMatrix import writeMatrix
nObs = 100
nPredict = 4
nRegressors = 2
model = [2, 0, 0]
y = [32.27778, 32.63300, 33.13768, 34.4517,
34.63824, 37.31262, 37.35704, 37.03092,
36.39894, 35.75541, 35.10829, 34.70107,
34.69592, 32.75326, 30.85370, 31.10936,
29.47493, 29.14361, 28.50466, 30.09714,
28.49403, 27.23268, 23.49674, 22.71225,
21.42798, 18.68601, 17.40035, 16.06832,
15.31862, 14.75179, 13.40089, 13.01101,
12.44863, 11.27890, 11.51770, 14.31982,
14.67036, 14.76331, 15.35644, 17.04353,
18.39931, 18.21919, 18.72777, 19.61794,
22.31733, 23.79600, 25.41326, 25.60497,
27.93579, 29.21765, 29.60981, 28.46994,
28.780810, 30.96402, 35.49537, 35.75124,
36.18933, 37.2627, 35.02454, 33.57089,
35.00683, 34.83886, 34.19827, 33.73966,
34.49709, 34.07127, 32.74709, 31.97856,
31.3029, 30.21916, 27.46015, 26.78431,
25.32815, 23.97863, 21.83837, 21.00647,
20.58846, 19.94578, 17.38271, 17.12572,
16.71847, 17.45425, 16.15050, 13.07448,
12.54188, 12.42137, 13.51771, 14.84232,
14.28870, 13.39561, 15.48938, 16.47175,
17.62758, 16.57677, 18.20737, 20.8491,
20.15616, 20.93857, 23.73973, 25.30449,
26.51106, 29.43261, 32.02672, 32.18846]
# The data are simulated weekly observations
# with an annual seasonal cycle
PI = constant("PI")
x = []
for i in range(0, nObs + nPredict):
x.append([sin(2 * PI * i / 52.0), cos(2 * PI * i / 52.0)])
forecasts = {'nPredict': nPredict}
avar = []
llike = []
regcoef = []
regstderr = []
coefcovar = []
armastderr = []
result = regressionArima(nObs, y, model,
regression=x,
regressionForecasts=x[nObs:][:],
forecasts=forecasts,
varNoise=avar,
logLikelihood=llike,
regressionCoef=regcoef,
seCoef=regstderr,
coefCovariances=coefcovar,
seArma=armastderr,
printLevel=1)
Output¶
Final results for regression ARIMA model (p,d,q) = 2, 0, 0
Final AR parameter estimates/ std errors
0.71858 0.09837
-0.25992 0.09828
-2*ln(maximum log likelihood) = -13.620890
White noise variance = 0.878295
Regression estimates:
COEFFICIENTS Regression STD Errors
0 24.81011 0.17177
1 8.91972 0.24042
2 6.84814 0.24709
Forecasts with standard deviation
T Y fcst Y fcst std dev
100 26.74492 0.93717
101 28.07805 1.15404
102 29.33707 1.17880
103 30.53160 1.17880