timeSeriesClassFilter

Converts time series data, sorted within nominal classes in decreasing chronological order, into lagged columns suitable for processing by a neural network.

Synopsis

timeSeriesClassFilter (nLags, nClasses, iClass, x)

Required Arguments

int nLags (Input)
The number of lags. The number of lags must be one or greater.
int nClasses (Input)
The number of classes associated with these data. The number of classes must be one or greater.
int iClass[] (Input)
An array of length nPatterns. The i-th element in iClass is equal to the class associated with the i-th element of x. The classes must be numbered from 1 to nClasses.
float x[] (Input)
A sorted array of length nPatterns. This array is assumed to be sorted first by class designations and then descending by chronological order, i.e., most recent observations appear first within a class.

Return Value

An array with nPatterns rows and nLags columns. If errors are encountered, then None is returned.

Optional Arguments

lags, int lags[] (Input)

An array of length nLags. The i-th element in lags is equal to the lag requested for the i-th column of z. Every lag must be non-negative.

Default: lags[i]=i

Description

The function timeSeriesClassFilter accepts a data array, x[], and returns a new data array, z[], containing nLags columns, each containing a lagged version of x.

The output data array, z, can be represented symbolically as:

\[\texttt{z} = |x(0) : x(1) : x(2) : … : x(\texttt{nLags-1})|,\]

where x(i) is the i-th lagged column of the incoming data array, x. Note that nLags is the number of lags, not the maximum lag. By default the lags are 0, 1, …, nLags-1, so the maximum lag is nLags-1; if the optional input lags[] is given, the maximum lag is the largest element of lags[]. For example, if nLags=2 and the optional input lags[] is not given, then the output array contains the lags 0 and 1.
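The relationship between nLags, the default lags, and the maximum lag can be sketched in plain Python. The helper name resolve_lags is hypothetical and is not part of PyIMSL; it only mirrors the rules stated above (at least one lag, default lags[i]=i, non-negative lags).

```python
def resolve_lags(n_lags, lags=None):
    """Return (lags, max_lag) for the output columns.

    Hypothetical helper for illustration only; not a PyIMSL function.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be one or greater")
    if lags is None:
        # Default documented above: lags[i] = i
        lags = list(range(n_lags))
    if len(lags) != n_lags:
        raise ValueError("lags must have length n_lags")
    if any(lag < 0 for lag in lags):
        raise ValueError("every lag must be non-negative")
    return list(lags), max(lags)


print(resolve_lags(2))          # ([0, 1], 1)
print(resolve_lags(2, [0, 2]))  # ([0, 2], 2)
```

In both calls nLags is 2, but the maximum lag differs: 1 with the default lags and 2 when lags={0,2} is supplied.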

Consider an example in which nPatterns=10, nLags=2 and

\[x^T = \{1,2,3,4,5,6,7,8,9,10\}\]

If \(\mathit{lag}^T=\{ 0,2 \}\) and

\[\mathit{i\_class}^T = \{1,1,1,1,1,1,1,1,1,1\}\]

then nClasses=1 and z contains 10 rows and 2 columns:

\[\begin{split}z = \begin{bmatrix} 1 & 3 \\ 2 & 4 \\ 3 & 5 \\ 4 & 6 \\ 5 & 7 \\ 6 & 8 \\ 7 & 9 \\ 8 & 10 \\ 9 & \mathrm{NaN} \\ 10 & \mathrm{NaN} \\ \end{bmatrix}\end{split}\]

Note that since \(\mathit{lag}^T=\{0,2\}\), the first column of z is formed using a lag of zero and the second is formed using a lag of two. A zero lag corresponds to no lag, which is why the first column of z in this example is equal to the original data in x. The last two rows of the second column are NaN because no observation exists two positions further along in x.

On the other hand, if the data were organized into two classes with

\[\mathit{i\_class}^T = \{1,1,1,1,1,2,2,2,2,2\}\]

then z is still a 10 by 2 matrix, but with the following values:

\[\begin{split}z = \left[ \begin{array}{cc} 1 & 3 \\ 2 & 4 \\ 3 & 5 \\ 4 & \mathrm{NaN} \\ 5 & \mathrm{NaN} \\ \hline 6 & 8 \\ 7 & 9 \\ 8 & 10 \\ 9 & \mathrm{NaN} \\ 10 & \mathrm{NaN} \\ \end{array} \right]\end{split}\]

The first 5 rows of z are the lagged columns for the first class, and the last five are the lagged columns for the second class.
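The class-wise lagging shown in the two matrices above can be reproduced with a short pure-Python sketch. This is an illustration of the documented behavior under the stated sorting assumption (x sorted by class, most recent observation first), not the PyIMSL implementation; the function name class_lag_filter is hypothetical.

```python
import math


def class_lag_filter(n_lags, i_class, x, lags=None):
    """Build an nPatterns-by-nLags list of lists of lagged values.

    Hypothetical illustration, not a PyIMSL function. Assumes x is sorted
    by class, so rows of the same class are contiguous.
    """
    if lags is None:
        lags = list(range(n_lags))  # default: lags[i] = i
    n = len(x)
    z = [[math.nan] * n_lags for _ in range(n)]
    for row in range(n):
        for col, lag in enumerate(lags):
            src = row + lag  # lagged value sits `lag` rows further down
            # Keep the value only if it exists and belongs to the same class;
            # otherwise the entry stays NaN.
            if src < n and i_class[src] == i_class[row]:
                z[row][col] = x[src]
    return z


x = list(range(1, 11))
one_class = class_lag_filter(2, [1] * 10, x, lags=[0, 2])
two_classes = class_lag_filter(2, [1] * 5 + [2] * 5, x, lags=[0, 2])
for row in two_classes:
    print(row)
```

With a single class only the last two rows contain NaN in the second column; with two classes of five observations each, rows 4 and 5 also contain NaN because a lag of two would cross the class boundary.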

Example

Suppose that the training data to the neural network is represented by the following data matrix consisting of a single nominal variable coded into two binary columns and a single time series variable:

\[\begin{split}\begin{bmatrix} 0 & 1 & 2.1 \\ 0 & 1 & 2.3 \\ 0 & 1 & 2.4 \\ 0 & 1 & 2.5 \\ 1 & 0 & 1.1 \\ 1 & 0 & 1.2 \\ 1 & 0 & 1.3 \\ 1 & 0 & 1.4 \\ \end{bmatrix}\end{split}\]

In this case, nPatterns=8 and nClasses=2. To lag the third column using two lags (nLags=2, with the default lags 0 and 1), the inputs are:

\[\mathit{lag}^T = \{0,1\}\]
\[\mathit{i\_class}^T = \{1,1,1,1,2,2,2,2\}\]
\[x^T = \{2.1,2.3,2.4,2.5,1.1,1.2,1.3,1.4\}\]

The resulting data matrix would have 8 rows and 2 columns:

\[\begin{split}z = [ x(0)\phantom{..}x(1) ] = \left[ \begin{array}{cc} 2.1 & 2.3 \\ 2.3 & 2.4 \\ 2.4 & 2.5 \\ 2.5 & \mathrm{NaN} \\ \hline 1.1 & 1.2 \\ 1.2 & 1.3 \\ 1.3 & 1.4 \\ 1.4 & \mathrm{NaN} \\ \end{array} \right]\end{split}\]
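
from numpy import array
from pyimsl.stat.timeSeriesClassFilter import timeSeriesClassFilter
from pyimsl.stat.writeMatrix import writeMatrix

# Time series values, sorted by class and then most recent first
x = array([2.1, 2.3, 2.4, 2.5, 1.1, 1.2, 1.3, 1.4])
# Class label for each element of x (classes numbered from 1)
iClass = [1, 1, 1, 1, 2, 2, 2, 2]
# nLags=2, nClasses=2; default lags are 0 and 1
z = timeSeriesClassFilter(2, 2, iClass, x)
writeMatrix("z", z)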

Output

 
             z
             1            2
1          2.1          2.3
2          2.3          2.4
3          2.4          2.5
4          2.5  ...........
5          1.1          1.2
6          1.2          1.3
7          1.3          1.4
8          1.4  ...........