autoArima¶
Automatically identifies time series outliers, determines parameters of a multiplicative seasonal ARIMA \((p,0,q)\times(0,d,0)_s\) model and produces forecasts that incorporate the effects of outliers whose effects persist beyond the end of the series.
Synopsis¶
autoArima (tpoints, x)
Required Arguments¶
- int
tpoints[]
(Input) - A vector of length
nObs
containing the time points \(t_1,t_2,\ldots,t_{\mathit{n\_obs}}\) the time series was observed. It is required that \(t_1,t_2,\ldots,t_{\mathit{n\_obs}}\) are in strictly ascending order. - float
x[]
(Input) - A vector of length
nObs
containing the observed time series values \(Y_1^*,Y_2^*,\cdots,Y_{\mathit{n\_obs}}^*\). This series can contain outliers and missing observations. Outliers are identified by this function and missing values are identified by the time values in vectortpoints
. If the time interval between two consecutive time points is greater than one, i.e. \(t_{i+1}-t_i=m>1\), then \(m-1\) missing values are assumed to exist between \(t_i\) and \(t_{i+1}\) at times \(t_i+1,t_i+2,\ldots,t_{i+1}-1\). Therefore, the gap free series is assumed to be defined for equidistant time points . Missing values are automatically estimated prior to identifying outliers and producing forecasts. Forecasts are generated for both missing and observed values.
Return Value¶
An array of length 1 + p + q with the estimated constant, AR and MA
parameters used to fit the outlier-free series using an \(\text{ARIMA}
(p,0,q)\times(0,d,0)_s\) model. Upon completion, if d=model[3]=0
, then
an \(\text{ARMA}(p,q)\) model or \(\text{AR}(p)\) model is fitted to
the outlier-free version of the observed series \(Y_t^*\). If
d=model[3]>0
, these parameters are computed for an
\(\text{ARMA}(p,q)\) representation of the seasonally adjusted series
\(Z_t^*=\mathit{\Delta}_s^d\cdot Y_t^*=\left(1-B_s\right)^d\cdot Y_t^*\),
where \(B_s Y_t^*=Y_{t-s}^*\) and s=model[
2]
≥1.
If an error occurred, None
is returned.
Optional Arguments¶
method
, int (Input)The method used in model selection:
method Description 1 Automatic \(\text{ARIMA}(p,0,0)\times(0,d,0)_s\) selection. 2 Grid search. Requires arguments pInitial
andqInitial
.3 Specified \(\text{ARIMA}(p,0,q)\times(0,d,0)_s\) model. Requires argument model
.For more information, see the “Description” section.
Default:
method
= 1maxLag
, int (Input)The maximum lag allowed when fitting an AR(p
)
model.Default:
maxLag = 10
model
, int[]
(Input/Output)- Array of length 4 containing the values for p, q, s, d. If
method
= 3 is chosen, then the values for p and q must be defined. IfsInitial
anddInitial
are not defined, then also s and d must be given. Ifmethod
= 1 ormethod
= 2, thenmodel
is ignored as an input array. On output,model
contains the optimum values for p, q, s, d inmodel[0]
,model[1]
,model[2]
andmodel[3]
, respectively. delta
, float (Input)The dampening effect parameter used in the detection of a Temporary Change Outlier (TC),
0<delta<1
.Default:
delta =
0.7critical
, float (Input)Critical value used as a threshold for outlier detection,
critical >
0.Default:
critical =
3.0epsilon
, float (Input)Positive tolerance value controlling the accuracy of parameter estimates during outlier detection.
Default:
epsilon
= 0.001residual
(Output)- An array of length \(n=t_{\mathit{n\_obs}}-t_1+1\geq\mathit{n\_obs}\), containing \(\hat{e}_t\), the estimates of the white noise in the outlier free original series.
residualSigma
(Output)- Residual standard error (RSE) of the outlier free original series.
numOutliers
(Output)- The number of outliers detected.
pInitial
, int[]
(Input)- An array with
nPInitial
elements containing the candidate values for p, from which the optimum is being selected. All candidate values inpInitial[]
must be non-negative andnPInitial
≥ 1. Ifmethod=
2, thenpInitial
must be defined. Otherwise,nPInitial
andpInitial
are ignored. qInitial
, int[]
(Input)- An array with
nQInitial
elements containing the candidate values for q, from which the optimum is being selected. All candidate values inqInitial[]
must be non-negative andnQInitial
≥ 1. Ifmethod=
2, thenqInitial
must be defined. Otherwise,nQInitial
andqInitial
are ignored. sInitial
, int[]
(Input)A vector of length
nSInitial
containing the candidate values fors
, from which the optimum is being selected. All candidate values insInitial[]
must be positive andnSInitial
≥ 1.Default:
sInitial=
[1]dInitial
, int[]
(Input)A vector of length
nDInitial
containing the candidate values for d, from which the optimum is being selected. All candidate values indInitial[]
must be non-negative andnDInitial
≥ 1.Default:
dInitial=
[0]outlierStatistics
(Output)An array of length
numOutliers
by 2 containing outlier statistics. The first column contains the time at which the outlier was observed (\(t=t_1,t_1+1,t_1+2,\ldots,t_{\mathit{n\_obs}}\)) and the second column contains an identifier indicating the type of outlier observed. Outlier types fall into one of five categories:0 Innovational Outliers (IO) 1 Additive Outliers (AO) 2 Level Shift Outliers (LS) 3 Temporary Change Outliers (TC) 4 Unable to Identify (UI). If
numOutliers
= 0,None
is returned.aic
(Output)- The AIC (Akaike’s Information Criterion) value for the optimum model. Uses an approximation of the maximum log-likelihood based on an estimate of the innovation variance of the series.
aicc
(Output)- The AICC (corrected AIC) value for the optimum model. Uses an approximation of the maximum log-likelihood based on an estimate of the innovation variance of the series.
bic
(Output)- The BIC (Bayesian Information Criterion) value for the optimum model. Uses an approximation of the maximum log-likelihood based on an estimate of the innovation variance of the series.
modelSelectionCriterion
, int (Input)The information criterion used for optimum model selection.
criterion selected information criterion 0 Akaike’s Information Criterion (AIC) 1 Akaike’s Corrected Information Criterion (AICC) 2 Bayesian Information Criterion (BIC) Default:
modelSelectionCriterion
= 0.
outFreeSeries
(Output)- An array of length n by 2, where \(n=t_{\mathit{n\_obs}}-t_1+1\).
The first column of
outFreeSeries
contains thenObs
observations from the original series, \(Y_t^*\), plus estimated values for any time gaps. The second column contains the same values as the first column adjusted by removing any outlier effects. In effect, the second column contains estimates of the underlying outlier-free series, \(Y_t\). If no outliers are detected then both columns will contain identical values. confidence
, float (Input)Confidence level for computing forecast confidence limits, taken from the exclusive interval (0, 100). Typical choices for
confidence
are 90.0, 95.0 and 99.0.Default:
confidence
= 95.0numPredict
, int (Input)The number of forecasts requested. Forecasts are made at origin \(t_{n\_obs}\), i.e. from the last observed value of the series.
Default:
numPredict
= 0outFreeForecast
(Output)- An array of length
numPredict
by 3. The first column contains the forecasted values for the original outlier free series for t = \(t_{n\_obs}+1\), \(t_{n\_obs}+2\), …, \(t_{n\_obs}\) +numPredict
. The second column contains standard errors for these forecasts, and the third column contains the psi weights of the infinite order moving average form of the model. outlierForecast
(Output)- An array of length
numPredict
by 3. The first column contains the forecasted values for the original series for t = \(t_{n\_obs}+1\), \(t_{n\_obs}+2\), …, \(t_{n\_obs}\)+numPredict
. The second column contains standard errors for these forecasts, and the third column contains the \(\psi\) weights of the infinite order moving average form of the model. supplyWorkArrays
, intiWork[]
, floatwork[]
, (Input/Output)The use of this optional argument will increase efficiency and avoid memory fragmentation run-time failures for large problems by allowing the user to provide the sizes and locations of the working arrays
work
andiWork
. It is also useful if many time series have to be processed sequentially because it can significantly reduce the amount of memory that has to be reallocated. This optional argument can be used in conjunction withmethod
=
1 andmethod =
2. Withmaxt
as the maximum number of threads that will be active andnObsAct
the length of the time series (including missing values), it is required thatlen(iWork)
≥ 2 ×maxt
× (2+ nObsAct
).The minimum length of array
work
depends on the choice of the method. For method 1, it is required thatlen(iWork)
≥maxt
× (3
×nObsAct + 1 + maxLag
).Method 2 requires
len(iWork)
≥maxt
× (3
×nObsAct + 1 + ub_p + ub_q
),where
ub_p
andub_q
denote the maximum values in arrayspInitial
andqInitial
, respectively.
Description¶
Function autoArima
determines the parameters of a multiplicative seasonal
\(\text{ARIMA}(p,0,q)\times(0,d,0)_s\) model, and then uses the fitted
model to identify outliers and prepare forecasts. The order of this model can
be specified or automatically determined.
The \(\text{ARIMA}(p,0,q)\times(0,d,0)_s\) model handled by autoArima
has the following form:
where
and
It is assumed that all roots of \(\phi(B)\) and \(\theta(B)\) lie outside the unit circle. Clearly, if \(s=1\) this reduces to the traditional \(\text{ARIMA}(p,d,q)\) model.
\(Y_t\) is the unobserved, outlier-free time series with mean
\(\mu\), and white noise \(a_t\). This model is referred to as the
underlying, outlier-free model. Function autoArima
does not assume that
this series is observable. It assumes that the observed values might be
contaminated by one or more outliers, whose effects are added to the
underlying outlier-free series:
Outlier identification uses the algorithm developed by Chen and Liu (1993). Outliers are classified into 1 of 5 types:
- innovational
- additive
- level shift
- temporary change and
- unable to identify
Once outliers are identified, autoArima
estimates \(Y_t\), the
outlier-free series representation of the data, by removing the estimated
outlier effects.
Using the information about the adjusted \(\text{ARIMA}(p,0,q)\times(0, d,0)_s\) model and the removed outliers, forecasts are then prepared for the outlier-free series. Outlier effects are added to these forecasts to produce a forecast for the observed series, \(Y_t^*\). If there are no outliers, then the forecasts for the outlier-free series and the observed series will be identical.
Model Selection¶
Users have an option of either specifying specific values for p, q, s and
d or have autoArima
automatically select best fit values. Model
selection can be conducted in one of three methods listed below depending
upon the value of variable method
.
Method 1: Automatic \(\text{ARIMA}(p,0,0)\times(0,d,0)_s\) Selection¶
This method initially searches for the AR(p) representation with minimum AIC
for the noisy data, where p = 0,…, maxLag
.
If dInitial
is defined then the values in sInitial
and dInitial
are included in the search to find an optimum
\(\text{ARIMA}(p,0,0)\times (0,d,0)_s\) representation of the series.
Here, every possible combination of values for p, s in sInitial
and d
in dInitial
is examined. The best found
\(\text{ARIMA}(p,0,0)\times(0,d,0)_s\) representation is then used as
input for the outlier detection routine.
The optimum values for p, q, s and d are returned in
model[
0]
, model[
1]
, model[
2]
and
model[
3]
, respectively.
Method 2: Grid Search¶
The second automatic method conducts a grid search for p and q using all
possible combinations of candidate values in pInitial
and qInitial
.
Therefore, for this method the definition of pInitial
and qInitial
is required.
If dInitial
is defined, the grid search is extended to include the
candidate values for s
and d
given in sInitial
and dInitial
,
respectively.
If dInitial
is not defined, no seasonal adjustment is attempted, and the
grid search is restricted to searching for optimum values of p and q
only.
The optimum values of p, q, s and d are returned in
model[
0]
, model[
1]
, model[
2]
and
model[
3]
, respectively.
Method 3: Specified \(\text{ARIMA}(p,0,q)\times(0,d,0)_s\) Model¶
In the third method, specific values for p, q, s and d are given. The
values for p and q must be defined in model[0]
and model[1]
,
respectively. If sInitial
and dInitial
are not defined, then values
\(s>0\) and \(d\geq 0\) must be specified in model[
2]
and
model[3]
. If
sInitial
and dInitial
are defined, then a grid
search for the optimum values of s and d is conducted using all possible
combinations of input values in sInitial
and dInitial
. The optimum
values of s and d can be found in model[2]
and model[3]
,
respectively.
Outliers¶
The algorithm of Chen and Liu (1993) is used to identify outliers. The
number of outliers identified is returned in numOutliers
. Both the time
and classification for these outliers are returned in
outlierStatistics[]
. Outliers are classified into one of five categories
based upon the standardized statistic for each outlier type. The time at
which the outlier occurred is given in the first column of
outlierStatistics
. The outlier identifier returned in the second column
is according to the descriptions in the following table:
Outlier Identifier | Name | General Description |
---|---|---|
0 | (IO) Innovational Outlier |
Innovational outliers persist. That is, there is an initial impact at the time the outlier occurs. This effect continues in a lagged fashion with all future observations. The lag coefficients are determined by the coefficient of the underlying \(\text{ARIMA}(p,0,q)\times (0,d,0)_s\) model. |
1 | (AO) Additive Outlier |
Additive outliers do not persist. As the name implies, an additive outlier effects only the observation at the time the outlier occurs. Hence additive outliers have no effect on future forecasts. |
2 | (LS) Level Shift |
Level shift outliers persist. They have the effect of either raising or lowering the mean of the series starting at the time the outlier occurs. This shift in the mean is abrupt and permanent. |
3 | (TC) Temporary Change |
Temporary change outliers persist and
are similar to level shift outliers
with one major exception. Like level
shift outliers, there is an abrupt
change in the mean of the series at the
time this outlier occurs. However,
unlike level shift outliers, this shift
is not permanent. The TC outlier
gradually decays, eventually bringing
the mean of the series back to its
original value. The rate of this decay
is modeled using the parameter
delta . The default of delta=
0.7 is the value recommended for
general use by Chen and Liu (1993). |
4 | (UI) Unable to Identify |
If an outlier is identified as the last observation, then the algorithm is unable to determine the outlier’s classification. For forecasting, a UI outlier is treated as an IO outlier. That is, its effect is lagged into the forecasts. |
Except for additive outliers (AO), the effect of an outlier persists to
observations following that outlier. Forecasts produced by autoArima
take this into account.
Examples¶
Example 1¶
This example uses time series D from Box, Jenkins and Reinsel (1994), the
hourly viscosity readings of a chemical process. Method 1 without seasonal adjustment is chosen to find an
appropriate AR(p) model for the first 304 observations of this series,
measured at time points \(t=1\) to \(t=304\). A forecast is then done
at origin \(t=304\) for lead times 1 to 6 and compared with the actual
time series values which are stored in array actual
.
from __future__ import print_function
from numpy import *
from pyimsl.stat.autoArima import autoArima
from pyimsl.stat.writeMatrix import writeMatrix
num_outliers = []
aic = []
res_sigma = []
model = []
outlier_stat = empty(0)
outlier_forecast = empty(0)
forecast_table = empty((6, 4))
x = array([
8.0, 8.0, 7.4, 8.0, 8.0, 8.0, 8.0, 8.8, 8.4, 8.4, 8.0, 8.2, 8.2, 8.2, 8.4,
8.4, 8.4, 8.6, 8.8, 8.6, 8.6, 8.6, 8.6, 8.6, 8.8, 8.9, 9.1, 9.5, 8.5, 8.4,
8.3, 8.2, 8.1, 8.3, 8.4, 8.7, 8.8, 8.8, 9.2, 9.6, 9.0, 8.8, 8.6, 8.6, 8.8,
8.8, 8.6, 8.6, 8.4, 8.3, 8.4, 8.3, 8.3, 8.1, 8.2, 8.3, 8.5, 8.1, 8.1, 7.9,
8.3, 8.1, 8.1, 8.1, 8.4, 8.7, 9.0, 9.3, 9.3, 9.5, 9.3, 9.5, 9.5, 9.5, 9.5,
9.5, 9.5, 9.9, 9.5, 9.7, 9.1, 9.1, 8.9, 9.3, 9.1, 9.1, 9.3, 9.5, 9.3, 9.3,
9.3, 9.9, 9.7, 9.1, 9.3, 9.5, 9.4, 9.0, 9.0, 8.8, 9.0, 8.8, 8.6, 8.6, 8.0,
8.0, 8.0, 8.0, 8.6, 8.0, 8.0, 8.0, 7.6, 8.6, 9.6, 9.6, 10.0, 9.4, 9.3, 9.2,
9.5, 9.5, 9.5, 9.9, 9.9, 9.5, 9.3, 9.5, 9.5, 9.1, 9.3, 9.5, 9.3, 9.1, 9.3,
9.1, 9.5, 9.4, 9.5, 9.6, 10.2, 9.8, 9.6, 9.6, 9.4, 9.4, 9.4, 9.4, 9.6, 9.6,
9.4, 9.4, 9.0, 9.4, 9.4, 9.6, 9.4, 9.2, 8.8, 8.8, 9.2, 9.2, 9.6, 9.6, 9.8,
9.8, 10.0, 10.0, 9.4, 9.8, 8.8, 8.8, 8.8, 8.8, 9.6, 9.6, 9.6, 9.2, 9.2, 9.0,
9.0, 9.0, 9.4, 9.0, 9.0, 9.4, 9.4, 9.6, 9.4, 9.6, 9.6, 9.6, 10.0, 10.0, 9.6,
9.2, 9.2, 9.2, 9.0, 9.0, 9.6, 9.8, 10.2, 10.0, 10.0, 10.0, 9.4, 9.2, 9.6, 9.7,
9.7, 9.8, 9.8, 9.8, 10.0, 10.0, 8.6, 9.0, 9.4, 9.4, 9.4, 9.4, 9.4, 9.6, 10.0,
10.0, 9.8, 9.8, 9.7, 9.6, 9.4, 9.2, 9.0, 9.4, 9.6, 9.6, 9.6, 9.6, 9.6, 9.6,
9.0, 9.4, 9.4, 9.4, 9.6, 9.4, 9.6, 9.6, 9.8, 9.8, 9.8, 9.6, 9.2, 9.6, 9.2,
9.2, 9.6, 9.6, 9.6, 9.6, 9.6, 9.6, 10.0, 10.0, 10.4, 10.4, 9.8, 9.0, 9.6, 9.8,
9.6, 8.6, 8.0, 8.0, 8.0, 8.0, 8.4, 8.8, 8.4, 8.4, 9.0, 9.0, 9.4, 10.0, 10.0,
10.0, 10.2, 10.0, 10.0, 9.6, 9.0, 9.0, 8.6, 9.0, 9.6, 9.6, 9.0, 9.0, 8.9, 8.8,
8.7, 8.6, 8.3, 7.9])
times = array([0] * len(x))
for i in range(0, len(times)):
times[i] = i + 1
# Actual values of series D at time points t=305,...,t=310
actual = array([8.5, 8.7, 8.9, 9.1, 9.1, 9.1])
n_predict = 6
n_obs = len(x)
parameters = autoArima(times, x,
model=model,
aic=aic,
critical=3.8,
maxLag=5,
numOutliers=num_outliers,
outlierStatistics=outlier_stat,
residualSigma=res_sigma,
numPredict=n_predict,
outlierForecast=outlier_forecast)
print("Method 1: Automatic ARIMA model selection")
print("\nModel chosen: p=%d, q=%d, s=%d, d=%d" %
(model[0], model[1], model[2], model[3]))
print("\nNumber of outliers: %d" % num_outliers[0])
print("\nOutlier statistics:")
print("\nTime point\tOutlier type")
for i in range(0, num_outliers[0]):
print("%d\t\t%d" % (outlier_stat[i, 0], outlier_stat[i, 1]))
print("\nAIC = %lf" % aic[0])
print("RSE = %lf" % res_sigma[0])
print("\nParameters:")
for i in range(0, int(model[0] + model[1] + 1)):
print("parameters[%d]=%lf" % (i, parameters[i]))
for i in range(0, n_predict):
forecast_table[i, 0] = actual[i]
forecast_table[i, 1] = outlier_forecast[i, 0]
forecast_table[i, 2] = outlier_forecast[i, 1]
forecast_table[i, 3] = outlier_forecast[i, 2]
writeMatrix("\t* * * Forecast Table * * *"
"\nOrig. series\t forecast\tprob. limits\tpsi weights",
forecast_table, writeFormat="%11.4f")
Output¶
Method 1: Automatic ARIMA model selection
Model chosen: p=1, q=0, s=1, d=0
Number of outliers: 1
Outlier statistics:
Time point Outlier type
217 3
AIC = 678.224720
RSE = 0.290680
Parameters:
parameters[0]=1.044163
parameters[1]=0.887724
* * * Forecast Table * * *
Orig. series forecast prob. limits psi weights
1 2 3 4
1 8.5000 8.0572 0.5697 0.8877
2 8.7000 8.1967 0.7618 0.7881
3 8.9000 8.3206 0.8843 0.6996
4 9.1000 8.4306 0.9699 0.6210
5 9.1000 8.5282 1.0325 0.5513
6 9.1000 8.6148 1.0792 0.4894
Example 2¶
This is the same as Example 1, except now
autoArima
uses Method 2 with a possible
seasonal adjustment. As a result, the unadjusted model with
\(p=3,q=1,s=1,d=0\) is chosen as optimum.
from __future__ import print_function
from numpy import *
from pyimsl.stat.autoArima import autoArima
from pyimsl.stat.writeMatrix import writeMatrix
num_outliers = []
aic = []
res_sigma = []
model = []
s_initial = (1, 2)
d_initial = (0, 1, 2)
p_initial = (0, 1, 2, 3)
q_initial = (0, 1, 2, 3)
outlier_stat = empty(0)
outlier_forecast = empty(0)
forecast_table = empty((6, 4))
x = array([
8.0, 8.0, 7.4, 8.0, 8.0, 8.0, 8.0, 8.8, 8.4, 8.4, 8.0, 8.2, 8.2, 8.2, 8.4,
8.4, 8.4, 8.6, 8.8, 8.6, 8.6, 8.6, 8.6, 8.6, 8.8, 8.9, 9.1, 9.5, 8.5, 8.4,
8.3, 8.2, 8.1, 8.3, 8.4, 8.7, 8.8, 8.8, 9.2, 9.6, 9.0, 8.8, 8.6, 8.6, 8.8,
8.8, 8.6, 8.6, 8.4, 8.3, 8.4, 8.3, 8.3, 8.1, 8.2, 8.3, 8.5, 8.1, 8.1, 7.9,
8.3, 8.1, 8.1, 8.1, 8.4, 8.7, 9.0, 9.3, 9.3, 9.5, 9.3, 9.5, 9.5, 9.5, 9.5,
9.5, 9.5, 9.9, 9.5, 9.7, 9.1, 9.1, 8.9, 9.3, 9.1, 9.1, 9.3, 9.5, 9.3, 9.3,
9.3, 9.9, 9.7, 9.1, 9.3, 9.5, 9.4, 9.0, 9.0, 8.8, 9.0, 8.8, 8.6, 8.6, 8.0,
8.0, 8.0, 8.0, 8.6, 8.0, 8.0, 8.0, 7.6, 8.6, 9.6, 9.6, 10.0, 9.4, 9.3, 9.2,
9.5, 9.5, 9.5, 9.9, 9.9, 9.5, 9.3, 9.5, 9.5, 9.1, 9.3, 9.5, 9.3, 9.1, 9.3,
9.1, 9.5, 9.4, 9.5, 9.6, 10.2, 9.8, 9.6, 9.6, 9.4, 9.4, 9.4, 9.4, 9.6, 9.6,
9.4, 9.4, 9.0, 9.4, 9.4, 9.6, 9.4, 9.2, 8.8, 8.8, 9.2, 9.2, 9.6, 9.6, 9.8,
9.8, 10.0, 10.0, 9.4, 9.8, 8.8, 8.8, 8.8, 8.8, 9.6, 9.6, 9.6, 9.2, 9.2, 9.0,
9.0, 9.0, 9.4, 9.0, 9.0, 9.4, 9.4, 9.6, 9.4, 9.6, 9.6, 9.6, 10.0, 10.0, 9.6,
9.2, 9.2, 9.2, 9.0, 9.0, 9.6, 9.8, 10.2, 10.0, 10.0, 10.0, 9.4, 9.2, 9.6, 9.7,
9.7, 9.8, 9.8, 9.8, 10.0, 10.0, 8.6, 9.0, 9.4, 9.4, 9.4, 9.4, 9.4, 9.6, 10.0,
10.0, 9.8, 9.8, 9.7, 9.6, 9.4, 9.2, 9.0, 9.4, 9.6, 9.6, 9.6, 9.6, 9.6, 9.6,
9.0, 9.4, 9.4, 9.4, 9.6, 9.4, 9.6, 9.6, 9.8, 9.8, 9.8, 9.6, 9.2, 9.6, 9.2,
9.2, 9.6, 9.6, 9.6, 9.6, 9.6, 9.6, 10.0, 10.0, 10.4, 10.4, 9.8, 9.0, 9.6, 9.8,
9.6, 8.6, 8.0, 8.0, 8.0, 8.0, 8.4, 8.8, 8.4, 8.4, 9.0, 9.0, 9.4, 10.0, 10.0,
10.0, 10.2, 10.0, 10.0, 9.6, 9.0, 9.0, 8.6, 9.0, 9.6, 9.6, 9.0, 9.0, 8.9, 8.8,
8.7, 8.6, 8.3, 7.9])
times = array([0] * len(x))
for i in range(0, len(times)):
times[i] = i + 1
# Actual values of series D at time points t=305,...,t=310
actual = array([8.5, 8.7, 8.9, 9.1, 9.1, 9.1])
n_predict = 6
n_obs = len(x)
parameters = autoArima(times, x,
model=model,
aic=aic,
critical=3.8,
maxLag=5,
method=2,
pInitial=p_initial,
qInitial=q_initial,
sInitial=s_initial,
dInitial=d_initial,
numOutliers=num_outliers,
outlierStatistics=outlier_stat,
residualSigma=res_sigma,
numPredict=n_predict,
outlierForecast=outlier_forecast)
print("Method 2: Grid search, differencing allowed")
print("\nModel chosen: p=%d, q=%d, s=%d, d=%d" %
(model[0], model[1], model[2], model[3]))
print("\nNumber of outliers: %d" % num_outliers[0])
print("\nOutlier statistics:")
print("\nTime point\tOutlier type")
for i in range(0, num_outliers[0]):
print("%d\t\t%d" % (outlier_stat[i, 0], outlier_stat[i, 1]))
print("\nAIC = %lf" % aic[0])
print("RSE = %lf" % res_sigma[0])
print("\nParameters:")
for i in range(0, int(model[0] + model[1] + 1)):
print("parameters[%d]=%lf" % (i, parameters[i]))
for i in range(0, n_predict):
forecast_table[i, 0] = actual[i]
forecast_table[i, 1] = outlier_forecast[i, 0]
forecast_table[i, 2] = outlier_forecast[i, 1]
forecast_table[i, 3] = outlier_forecast[i, 2]
writeMatrix("\t* * * Forecast Table * * *"
"\nOrig. series\t forecast\tprob. limits\tpsi weights",
forecast_table, writeFormat="%11.4f")
Output¶
Method 2: Grid search, differencing allowed
Model chosen: p=3, q=1, s=1, d=0
Number of outliers: 1
Outlier statistics:
Time point Outlier type
217 3
AIC = 675.886035
RSE = 0.286720
Parameters:
parameters[0]=1.892687
parameters[1]=0.184446
parameters[2]=0.641219
parameters[3]=-0.029179
parameters[4]=-0.742956
* * * Forecast Table * * *
Orig. series forecast prob. limits psi weights
1 2 3 4
1 8.5000 8.0471 0.5620 0.9274
2 8.7000 8.2004 0.7664 0.8123
3 8.9000 8.3347 0.8921 0.7153
4 9.1000 8.4534 0.9784 0.6257
5 9.1000 8.5570 1.0397 0.5504
6 9.1000 8.6483 1.0847 0.4819
Example 3¶
This example is the same as Example 2 but now Method 3 with the optimum model parameters \(p=3,q=1,s=1,d=0\) from Example 2 are chosen for outlier detection and forecasting.
from __future__ import print_function
from numpy import *
from pyimsl.stat.autoArima import autoArima
from pyimsl.stat.writeMatrix import writeMatrix
num_outliers = []
aic = []
res_sigma = []
model = []
outlier_stat = empty(0)
outlier_forecast = empty(0)
forecast_table = empty((6, 4))
x = array([
8.0, 8.0, 7.4, 8.0, 8.0, 8.0, 8.0, 8.8, 8.4, 8.4, 8.0, 8.2, 8.2, 8.2, 8.4,
8.4, 8.4, 8.6, 8.8, 8.6, 8.6, 8.6, 8.6, 8.6, 8.8, 8.9, 9.1, 9.5, 8.5, 8.4,
8.3, 8.2, 8.1, 8.3, 8.4, 8.7, 8.8, 8.8, 9.2, 9.6, 9.0, 8.8, 8.6, 8.6, 8.8,
8.8, 8.6, 8.6, 8.4, 8.3, 8.4, 8.3, 8.3, 8.1, 8.2, 8.3, 8.5, 8.1, 8.1, 7.9,
8.3, 8.1, 8.1, 8.1, 8.4, 8.7, 9.0, 9.3, 9.3, 9.5, 9.3, 9.5, 9.5, 9.5, 9.5,
9.5, 9.5, 9.9, 9.5, 9.7, 9.1, 9.1, 8.9, 9.3, 9.1, 9.1, 9.3, 9.5, 9.3, 9.3,
9.3, 9.9, 9.7, 9.1, 9.3, 9.5, 9.4, 9.0, 9.0, 8.8, 9.0, 8.8, 8.6, 8.6, 8.0,
8.0, 8.0, 8.0, 8.6, 8.0, 8.0, 8.0, 7.6, 8.6, 9.6, 9.6, 10.0, 9.4, 9.3, 9.2,
9.5, 9.5, 9.5, 9.9, 9.9, 9.5, 9.3, 9.5, 9.5, 9.1, 9.3, 9.5, 9.3, 9.1, 9.3,
9.1, 9.5, 9.4, 9.5, 9.6, 10.2, 9.8, 9.6, 9.6, 9.4, 9.4, 9.4, 9.4, 9.6, 9.6,
9.4, 9.4, 9.0, 9.4, 9.4, 9.6, 9.4, 9.2, 8.8, 8.8, 9.2, 9.2, 9.6, 9.6, 9.8,
9.8, 10.0, 10.0, 9.4, 9.8, 8.8, 8.8, 8.8, 8.8, 9.6, 9.6, 9.6, 9.2, 9.2, 9.0,
9.0, 9.0, 9.4, 9.0, 9.0, 9.4, 9.4, 9.6, 9.4, 9.6, 9.6, 9.6, 10.0, 10.0, 9.6,
9.2, 9.2, 9.2, 9.0, 9.0, 9.6, 9.8, 10.2, 10.0, 10.0, 10.0, 9.4, 9.2, 9.6, 9.7,
9.7, 9.8, 9.8, 9.8, 10.0, 10.0, 8.6, 9.0, 9.4, 9.4, 9.4, 9.4, 9.4, 9.6, 10.0,
10.0, 9.8, 9.8, 9.7, 9.6, 9.4, 9.2, 9.0, 9.4, 9.6, 9.6, 9.6, 9.6, 9.6, 9.6,
9.0, 9.4, 9.4, 9.4, 9.6, 9.4, 9.6, 9.6, 9.8, 9.8, 9.8, 9.6, 9.2, 9.6, 9.2,
9.2, 9.6, 9.6, 9.6, 9.6, 9.6, 9.6, 10.0, 10.0, 10.4, 10.4, 9.8, 9.0, 9.6, 9.8,
9.6, 8.6, 8.0, 8.0, 8.0, 8.0, 8.4, 8.8, 8.4, 8.4, 9.0, 9.0, 9.4, 10.0, 10.0,
10.0, 10.2, 10.0, 10.0, 9.6, 9.0, 9.0, 8.6, 9.0, 9.6, 9.6, 9.0, 9.0, 8.9, 8.8,
8.7, 8.6, 8.3, 7.9])
times = array([0] * len(x))
for i in range(0, len(times)):
times[i] = i + 1
# Actual values of series D at time points t=305,...,t=310
actual = array([8.5, 8.7, 8.9, 9.1, 9.1, 9.1])
n_predict = 6
n_obs = len(x)
model = [3, 1, 1, 0]
parameters = autoArima(times, x,
model=model,
aic=aic,
critical=3.8,
method=3,
numOutliers=num_outliers,
outlierStatistics=outlier_stat,
residualSigma=res_sigma,
numPredict=n_predict,
outlierForecast=outlier_forecast)
print("Method 3: Specified ARIMA model")
print("\nModel chosen: p=%d, q=%d, s=%d, d=%d" %
(model[0], model[1], model[2], model[3]))
print("\nNumber of outliers: %d" % num_outliers[0])
print("\nOutlier statistics:")
print("\nTime point\tOutlier type")
for i in range(0, num_outliers[0]):
print("%d\t\t%d" % (outlier_stat[i, 0], outlier_stat[i, 1]))
print("\nAIC = %lf" % aic[0])
print("RSE = %lf" % res_sigma[0])
for i in range(0, n_predict):
forecast_table[i, 0] = actual[i]
forecast_table[i, 1] = outlier_forecast[i, 0]
forecast_table[i, 2] = outlier_forecast[i, 1]
forecast_table[i, 3] = outlier_forecast[i, 2]
print("\nParameters:")
for i in range(0, int(model[0] + model[1] + 1)):
print("parameters[%d]=%lf" % (i, parameters[i]))
writeMatrix("\t* * * Forecast Table * * *"
"\nOrig. series\t forecast\tprob. limits\tpsi weights",
forecast_table, writeFormat="%11.4f")
Output¶
Method 3: Specified ARIMA model
Model chosen: p=3, q=1, s=1, d=0
Number of outliers: 1
Outlier statistics:
Time point Outlier type
217 3
AIC = 675.886035
RSE = 0.286720
Parameters:
parameters[0]=1.892687
parameters[1]=0.184446
parameters[2]=0.641219
parameters[3]=-0.029179
parameters[4]=-0.742956
* * * Forecast Table * * *
Orig. series forecast prob. limits psi weights
1 2 3 4
1 8.5000 8.0471 0.5620 0.9274
2 8.7000 8.2004 0.7664 0.8123
3 8.9000 8.3347 0.8921 0.7153
4 9.1000 8.4534 0.9784 0.6257
5 9.1000 8.5570 1.0397 0.5504
6 9.1000 8.6483 1.0847 0.4819