autoParm

Estimates structural breaks in non-stationary univariate time series.

Synopsis

autoParm (y, npcs)

Required Arguments

float y[] (Input)
An array of length nObs containing the time series.
int npcs (Input/Output)
The number of requested/estimated pieces or segments of the time series. npcs is considered input only when arModel is provided.

Return Value

An array (arModel) of length npcs × 2 containing the break points and AR orders for the derived model. If arModel is used, the return value is None.

Column Index Description
0 Structural break points \(\tau_j\)
1 AR order ( \(p_j\) ) for each segment

Optional Arguments

maxArOrder, int (Input)

Maximum order to consider for each AR model.

Default: maxArOrder = 20.

method, int (Input)

Method of estimation.

method Method Used
0 Yule –Walker
1 Least Squares
2 Burg

Default: method = 0.

modelSelectionCriterion, int (Input)

Selection criterion.

modelSelectionCriterion Criterion Used
0 Minimum Description Length (MDL)
1 Akaike’s Information Criterion (AIC)

Default: modelSelectionCriterion = 0.

maximumLikelihood, int (Input)

Likelihood computation method.

maximumLikelihood Computation Method
0 Exact
1 Approximate

Default: maximumLikelihood = 0.

arModel, int[] (Input)
A user specified array of length npcs × 2 containing the break points and AR orders. When this argument is used, only the AR parameters and quality of the fit are determined.
Column Index Description
0 Structural break points \(\tau_j\)
1 AR order ( \(p_j\) ) for each segment
t_print, int (Input)

Printing option.

t_print Action
0 No printing
1 Prints final results only
2 Prints intermediate and final results

Default: t_print = 0.

randomSeed, int (Input)

Seed of the random number generator. For the same data and parameter settings, autoParm will return the same results each time if a seed is set. If randomSeed = 0, the system clock will be used to generate a seed. The result will be nondeterministic.

Default: randomSeed = 0.

Note: The following input arguments are for setting up and running the embedded Genetic Algorithm. In most situations, the default values should be used for these arguments. Users may wish to change some or all for testing or research purposes.
probDistribution, float[] (Input)

Array of length maxArOrder + 1 giving the probability distribution over the AR order variable p = 0,…, maxArOrder. i = 0,…, maxArOrder is used to randomly assign an AR order to breakpoint position for a given chromosome. probDistribution[i] > = 0 and if ΣprobDistribution is not equal to 1, the values will be normalized, i.e., probDistribution[i] = probDistribution[i]/ ΣprobDistribution.

Default: probDistribution[i] = 1/(maxArOrder + 1) for all i.

minObservations, int[] (Input)

Array of length maxArOrder + 1 containing minimum number of observations required for valid estimates of AR model with order p = 0, …, maxArOrder.

Default: minObservations [p] = 2 ×(number of parameters) + 2 = 2 × (p + 2) + 2.

gaParameters, float[] (Input)
Array of length 4 containing parameters that control the behavior of the genetic algorithm. These values should be strictly greater than zero and less than one to avoid unexpected results.
Index Behavior
0 Probability used to set initial break points in a chromosome. Default: min (minObservations) / nObs.
1

Probability of crossover used to decide between a crossover and a mutation.

Default: 1 – min (minObservations) / nObs.

2

In the mutation operation, probability an AR(p) model is enforced at the current position.

Default: 0.4.

3

In the mutation operation, probability a break point is disallowed at the current position.

Default: 0.3.

Note: gaParameters[2] and gaParameters[3] must be valid probabilities and their sum must be between 0 and 1. 1 – gaParameters[2]gaParameters[3] is the probability that the chromosome j inherits the parent’s chromosome j.
island, int[] (Input)
Array of length 5 containing the migration policy parameters.
Index Policy
0

Number of islands.

Default: 40.

1

Number of generations that pass before migration occurs. Note that the convergence of the algorithm is revised whether migrations take place or not (see argument island[4]).

Default: 5.

2

Number of subjects that migrate at each migration event.

Default: 2.

3

Population size (number of chromosomes) per island.

Default: 40.

4

Migration flag. If 1, migration is performed.

Default: 1.

maxMigrations, int (Input)

Maximum number of times that migrations may take place before the function is stopped if convergence has not occurred.

Default: maxMigrations = 20.

stopIterations, int (Input)

Number of iterations. The function will declare convergence and stop the iterations if the criterion value (MDL/AIC) has not changed after stopIterations consecutive migrations. Otherwise, the algorithm will declare non-convergence and stop after maxMigrations migrations have taken place. See also maxMigrations and island[1]. Note that logically, stopIterations < maxMigrations.

Default: stopIterations = 10.

selectionCriterionValue, (Output)
Final value of the selection criterion.
arFit (Output)

An array of length npcs × maxArOrder containing the AR coefficient estimates φ for each segment. arFit[imaxArOrder+j] is the j‑th coefficient for segment i where i = 0, …, npcs - 1 and j = 0, …, arModel[i×2 + 1].

Note that the intercept is not reported.

arStats (Output)
An array of length npcs × 2.
Column Index Description
0 Likelihood values for each of the fitted AR models
1 Residual variances for each of the fitted AR models

Description

Function autoParm estimates the structural breaks of a non-stationary time series using, with permission from the authors, the method developed by Davis et al (2006). autoParm estimates a partition of the time index and models the time series in each segment as a separate auto-regressive (AR(p)) process. The function returns the estimated breakpoints, the estimated AR(p) models, and supporting statistics.

For the observed time series \(Y_t,t=1,\ldots n\) the problem is to find m, the number of breaks, their locations, \(1<\tau_1<\tau_2<\cdots<\tau_m<n\), and \(p_j\), \(j=1,\ldots,m+1\), the order of the AR process in which the j‑th segment is modeled. That is, \(Y_t=X_{t,j}\) for \(\tau_{j-1}\leq t<\tau_j\) (for convenience, \(\tau_0 :=1\) and \(\tau_{m+1} :=n+1\)), where \(\{X_{t,j}\}\) is an AR process of order \(p_j\)

\[X_{t,j} = \phi_{j,1} X_{t-1,j} + \cdots + \phi_{j,p_j} X_{t-j,p_j} + \sigma_t \varepsilon_t\]

and \(\varepsilon_t\), the noise sequence, is independent and identically distributed with mean 0 and variance 1. Note that a series with m breaks will have m + 1 segments (m + 1 = npcs).

The vector \(\left\{ m,\tau_1,\ldots,\tau_m,p_1,\ldots,p_{m+1} \right\}\) completely specifies a piecewise AR model. To estimate this vector autoParm optimizes, with respect to this vector, one of two selection criteria: the first is a Minimum Description Length (MDL) criterion, and the second is the Akaike’s Information Criterion (AIC). The MDL is defined as

\[\begin{split}\begin{array}{l} \mathit{MDL}\left(m, \tau_1, \ldots, \tau_m, p_1, \ldots, p_{m+1}\right) \\ \phantom{.....} = \ln m+(m-1) \ln n + \sum\limits_{j=1}^{m+1} \frac{2 + p_j}{2} \ln n_j \ln L \end{array}\end{split}\]

while the AIC criterion is given by

\[\begin{split}\begin{aligned} \mathit{AIC}\left( m,\tau_1, \ldots \tau_m, p+1, \ldots p_{m+1} \right) &= 2 (\textit{number of parameters}) - 2 \ln L \\ &= 2 \left( 1 + m + \sum_{j=1}^{m+1} \left( 2 + p_j \right) \right) - 2 \ln L \end{aligned}\end{split}\]

where, given a candidate value of the vector \(\left\{ m,\tau_1,\ldots,\tau_m,p_1,\ldots,p_{m+1} \right\}\), \(L\) is the likelihood of the fitted piecewise AR model evaluated at the parameter estimates,

\[\left\{\hat{\gamma}_1, \hat{\phi}_{1,1}, \ldots \hat{\phi}_{1,p_1}, \hat{\sigma}_1^2, \ldots \hat{\gamma}_{m+1}, \hat{\phi}_{m+1,1}, \ldots \hat{\phi}_{m+1,p_{m+1}}, \hat{\sigma}_{m+1}^{2}\right\}.\]

The parameters \(\left\{ \gamma_j,\phi_{1,1},\ldots\phi _{1,p_j},\sigma_j^2 \right\}\) of the j‑th AR segment are estimated by the choice of one of three estimation methods: Yule‑Walker, Burg, or Least Squares.

For simplicity, assume the mean of each series \(\{X_{t,j}\}\) is 0 and that the errors are Gaussian. Then, the piecewise AR model has Gaussian likelihood

\[L = \prod_{j=1}^{m+1} (2\pi)^{-n_j/2} \left|V_j\right|^{-1/2} \exp\left\{-Y_j^T V_j Y_j\right\}\]

where \(V_j\) is the variance-covariance of the j‑th AR segment (of order \(p_j\) ) and \(Y_j\) is the vector of observations of the j‑th segment, i.e., \(Y_j=\left( *Y_{\tau_j},Y_{\tau_j+1},\ldots,Y_{\tau_{j+1}-1} \right)^T\).

To find the minimizer \(\left\{ m,\tau_j,p_j \right\}^*\) of either MDL or AIC, autoParm employs a Genetic Algorithm with islands, migration, cross-over and mutations. See Davis et.al. (2006) for further details.

Remarks

Function autoParm approximates locally stationary time series by independent auto‑regressive processes. Experimental results suggest that autoParm gives reasonable estimates of the structural breaks of a given time series, even if the segment series are not autoregressive. Also, based on experimental results, MDL gives better results than AIC as a selection criterion.

If randomSeed is set out of range, an informational (error) message is issued indicating that the seed will be reset to 123457. Also if maxMigrations migrations are reached in the genetic algorithm before the selection criterion value converges an informational message is issued suggesting the increase of maxMigrations.

Example

The examples below illustrate different scenarios using autoParm. The example series used in each case is the airline demand data (Box, Jenkins and Reinsel, 1994), which gives monthly total demand for the period January 1949 through December 1960. Each scenario sets the optional argument, randomSeed = 123457.

from __future__ import print_function
from numpy import *
from pyimsl.stat.autoParm import autoParm
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.difference import difference
from pyimsl.stat.writeMatrix import writeMatrix
from pyimsl.stat.free import free

n = 144
npcs = [0]
iseed = 123457
arp2 = [[0, 2], [59, 1]]
sc = []
arfit = []
arstat = []
# get data
y = dataSets(4)
y = y.reshape(n)
# Example 1: Use default values
print("Example 1: Use defaults")
arp1 = autoParm(y, npcs,
                t_print=1,
                randomSeed=iseed,
                selectionCriterionValue=sc,
                arFit=arfit,
                arStats=arstat)
free(arp1)

# Example 2: differenced series set period for the difference.
# iper is in years for this data set
print("=" * 50)
print("Example 2: differenced series")
iper = [1]
# set the order for the difference.
iord = [1]
# get differenced series dx
nlost = []
sc2 = []
arfit2 = []
arstat2 = []
dx = difference(y, iper,
                orders=iord,
                lost=nlost)

arp_2 = autoParm(dx[nlost[0]:], npcs,
                 t_print=1,
                 randomSeed=iseed,
                 selectionCriterionValue=sc2,
                 arFit=arfit2,
                 arStats=arstat2)

free(arp_2)

# Example 3: original series, lower order allowed
# lower maximum AR order
print("=" * 50)
print("Example 3: original series, lower order allowed")
sc3 = []
arfit3 = []
arstat3 = []
maxarorder = 5
arp3 = autoParm(y, npcs,
                maxArOrder=maxarorder,
                t_print=1,
                randomSeed=iseed,
                selectionCriterionValue=sc3,
                arFit=arfit3,
                arStats=arstat3)

free(arp3)

# Example 4: differenced series, lower order allowed
print("=" * 50)
print("Example 4: differenced series, lower order allowed")
sc4 = []
arfit4 = []
arstat4 = []
maxarorder = 5
arp4 = autoParm(dx[nlost[0]:], npcs,
                maxArOrder=maxarorder,
                t_print=1,
                randomSeed=iseed,
                selectionCriterionValue=sc4,
                arFit=arfit4,
                arStats=arstat4)

free(arp4)

# Example 5: original series, force fit the segments
# Fit only at the break points
print("=" * 50)
print("Example 5: original series, force fit the segments")
sc5 = []
arfit5 = []
arstat5 = []
npcs = [2]
arpnull = autoParm(y, npcs,
                   arModel=arp2,
                   t_print=1,
                   randomSeed=iseed,
                   selectionCriterionValue=sc5,
                   arFit=arfit5,
                   arStats=arstat5)

Output

Example 1: Use defaults
==================================================
Example 2: differenced series
==================================================
Example 3: original series, lower order allowed
==================================================
Example 4: differenced series, lower order allowed
==================================================
Example 5: original series, force fit the segments
   ============== final results ===============
number of pieces:          2 

selection criteria value: 685.899716

total time: 0.630000     conv:           1 

==================== final model estimates =====================
break point   order     est. coeff.    likelihood     resid. var
  arp[ 0]    arp[ 1]    arfit[ 0- 1]   arstat[ 0]     arstat[ 1]
    0           2        0.90785
                        -0.17078
                                        188.602        344.671
  arp[ 2]    arp[ 3]    arfit[20-32]   arstat[ 2]     arstat[ 3]
   43          13        1.03700
                        -0.07801
                        -0.03891
                        -0.03452
                         0.11961
                        -0.12852
                         0.01990
                        -0.04886
                         0.08090
                        -0.13118
                         0.22122
                         0.53863
                        -0.61515
                                        486.665        691.475
   ============== final results ===============
number of pieces:          1 

selection criteria value: 624.283501

total time: 0.600000     conv:           1 

==================== final model estimates =====================
break point   order     est. coeff.    likelihood     resid. var
  arp[ 0]    arp[ 1]    arfit[ 0-11]   arstat[ 0]     arstat[ 1]
    0          12       -0.02842
                        -0.22436
                        -0.16846
                        -0.24267
                        -0.10573
                        -0.22429
                        -0.12126
                        -0.26446
                        -0.07087
                        -0.24327
                        -0.07136
                         0.57129
                                        619.321        297.352
   ============== final results ===============
number of pieces:          2 

selection criteria value: 705.296593

total time: 0.540000     conv:           1 

==================== final model estimates =====================
break point   order     est. coeff.    likelihood     resid. var
  arp[ 0]    arp[ 1]    arfit[ 0- 0]   arstat[ 0]     arstat[ 1]
    0           1        0.89533
                                        270.393        333.564
  arp[ 2]    arp[ 3]    arfit[ 5- 6]   arstat[ 2]     arstat[ 3]
   62           2        1.19788
                        -0.35922
                                        424.270       1632.339
   ============== final results ===============
number of pieces:          2 

selection criteria value: 698.359495

total time: 0.530000     conv:           1 

==================== final model estimates =====================
break point   order     est. coeff.    likelihood     resid. var
  arp[ 0]    arp[ 1]    arfit[ 0- 0]   arstat[ 0]     arstat[ 1]
    0           0        -------
                                        335.565        357.389
  arp[ 2]    arp[ 3]    arfit[ 5- 5]   arstat[ 2]     arstat[ 3]
   76           1        0.33310
                                        352.175       1786.345
   ============== final results ===============
number of pieces: 2
selection criteria value:   712.521
==================== final model estimates =====================
break point   order     est. coeff.    likelihood     resid. var
  arp[ 0]    arp[ 1]    arfit[ 0- 1]   arstat[ 0]     arstat[ 1]
    0           2        1.12156
                        -0.24876
                                        258.192        313.889
  arp[ 2]    arp[ 3]    arfit[20-20]   arstat[ 2]     arstat[ 3]
   59           1        0.88605
                                        443.696       1937.633

Warning Errors

IMSLS_MAX_MIGRATIONS_EXCEEDED maxMigrations” migrations or “stopIterations” iterations were reached in the genetic algorithm before the selection criterion value converged. Try increasing “maxMigrations”, “stopIterations”.