kalman

Performs Kalman filtering and evaluates the likelihood function for the state‑space model.

Synopsis

kalman (b, covb, n, ss, alndet)

Required Arguments

float b[] (Input/Output)
Array of length nb containing the estimated state vector. The input is the estimated state vector at time k given the observations through time k - 1. The output is the estimated state vector at time k + 1 given the observations through time k. On the first call to kalman, the input b must be the prior mean of the state vector at time 1.
float covb[] (Input/Output)
Array of size nb by nb such that covb × \(\sigma^2\) is the mean squared error matrix for b. Before the first call to kalman, covb × \(\sigma^2\) must equal the variance-covariance matrix of the state vector.
int n (Input/Output)
The rank of the variance-covariance matrix for all the observations. n must be initialized to zero before the first call to kalman. In the usual case when the variance-covariance matrix is nonsingular, n equals the sum of the ny values over all invocations of kalman. See the optional argument update below for the definition of ny.
float ss (Input/Output)
The generalized sum of squares. The estimate of \(\sigma^2\) is given by \(\frac{\mathtt{ss}}{\mathtt{n}}\). ss must be initialized to zero before the first call to kalman is made.
float alndet (Input/Output)
The natural log of the product of the nonzero eigenvalues of P where P × \(\sigma^2\) is the variance-covariance matrix of the observations. Although alndet is computed, kalman avoids the explicit computation of P. alndet must be initialized to zero before the first call to kalman. In the usual case when P is nonsingular, alndet is the natural log of the determinant of P.

Optional Arguments

update, float y, float z, float r (Input)

Perform computation of the update equations.

y: Array of length ny containing the observations.

z: ny by nb array containing the matrix relating the observations to the state vector in the observation equation.

r: ny by ny array containing the matrix such that r × \(\sigma^2\) is the variance-covariance matrix of errors in the observation equation.

\(\sigma^2\) is a positive unknown scalar. Only elements in the upper triangle of r are referenced.

t, float (Input)

nb by nb transition matrix in the state equation.

Default: t = identity matrix
q, float (Input)

nb by nb matrix such that q × \(\sigma^2\) is the variance-covariance matrix of the error vector in the state equation.

Default: There is no error term in the state equation.

tolerance, float (Input)

Tolerance used in determining linear dependence.

Default: tolerance = 100.0 × machine(4)

v, float (Output)
An array of length ny containing the one-step-ahead prediction error.
covv, float (Output)
An array of size ny by ny such that covv × \(\sigma^2\) is the variance-covariance matrix of v.
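
Putting these pieces together, a first filtering call takes roughly the following shape. This is a minimal sketch for a scalar model using the start-up values of Example 1 below; n, ss, and alndet are passed as one-element lists so that kalman can update them in place, and the v and covv placeholders follow the convention shown in Example 1.

from numpy import empty
from pyimsl.stat.kalman import kalman

nb, ny = 1, 1
b = empty(nb)
covb = empty((nb, nb))
y = empty(ny)
z = empty((ny, nb))
r = empty((ny, ny))
b[0] = 4.0           # prior mean of the state vector at time 1
covb[0, 0] = 16.0    # covb * sigma^2 = prior covariance of the state vector
y[0] = 4.4           # observation available at this stage
z[0, 0] = 1.0        # observation matrix
r[0, 0] = 1.0        # r * sigma^2 = observation error covariance
n = [0]              # must be initialized to zero before the first call
ss = [0.0]           # must be initialized to zero before the first call
alndet = [0.0]       # must be initialized to zero before the first call
v = []               # receives the one-step-ahead prediction error
covv = empty(0)      # receives covv (covv * sigma^2 = Var(v))

kalman(b, covb, n, ss, alndet,
       update={'y': y, 'z': z, 'r': r},
       v=v, covv=covv)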

Description

Function kalman is based on a recursive algorithm given by Kalman (1960), which has come to be known as the Kalman filter. The underlying model is known as the state‑space model. The model is specified stage by stage where the stages generally correspond to time points at which the observations become available. The function kalman avoids many of the computations and storage requirements that would be necessary if one were to process all the data at the end of each stage in order to estimate the state vector. This is accomplished by using previous computations and retaining in storage only those items essential for processing of future observations.

The notation used here follows that of Sallas and Harville (1981). Let \(y_k\) (input in y using optional argument update) be the \(n_k\) × 1 vector of observations that become available at time k. The subscript k is used here rather than t, which is more customary in time series, to emphasize that the model is expressed in stages \(k=1, 2,\ldots\) and that these stages need not correspond to equally spaced time points. In fact, they need not correspond to time points of any kind. The observation equation for the state‑space model is

\[y_k = Z_k b_k + e_k, \quad k = 1, 2, \ldots\]

Here, \(Z_k\) (input in z using optional argument update) is an \(n_k\times q\) known matrix and \(b_k\) is the q × 1 state vector. The state vector \(b_k\) is allowed to change with time in accordance with the state equation

\[b_{k+1} = T_{k+1} b_k + w_{k+1}, \quad k = 1, 2, \ldots\]

starting with \(b_1=\mu_1+w_1\).

The change in the state vector from time k to k + 1 is explained in part by the transition matrix \(T_{k+1}\) (the identity matrix by default, or optionally input using t), which is assumed known. It is assumed that the q-dimensional \(w_k\)s (\(k=1,2,\ldots\)) are independently distributed multivariate normal with mean vector 0 and variance-covariance matrix \(\sigma^2 Q_k\), that the \(n_k\)-dimensional \(e_k\)s (\(k=1,2,\ldots\)) are independently distributed multivariate normal with mean vector 0 and variance-covariance matrix \(\sigma^2 R_k\), and that the \(w_k\)s and \(e_k\)s are independent of each other. Here, \(\mu_1\) is the mean of \(b_1\) and is assumed known, and \(\sigma^2\) is an unknown positive scalar. \(Q_{k+1}\) (input in q) and \(R_k\) (input in r) are assumed known.
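
Because the model is specified stage by stage, it is straightforward to simulate. The sketch below generates a few stages of a scalar version (\(T_k = 1\), q = 1) using the variances that appear in Example 1 below; the seed and variable names are illustrative only.

import numpy as np

# Scalar state-space model: y_k = b_k + e_k,  b_{k+1} = b_k + w_{k+1}
rng = np.random.default_rng(0)            # illustrative seed
sigma2 = 1.0                              # sigma^2
q1, qk, rk, mu1 = 16.0, 4.0, 1.0, 4.0     # Q_1, Q_k (k > 1), R_k, mu_1
b = mu1 + rng.normal(0.0, np.sqrt(sigma2 * q1))            # b_1 = mu_1 + w_1
ys = []
for k in range(4):
    ys.append(b + rng.normal(0.0, np.sqrt(sigma2 * rk)))   # observation equation
    b = b + rng.normal(0.0, np.sqrt(sigma2 * qk))          # state equation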

Denote the estimator of the realization of the state vector \(b_k\) given the observations \(y_1,y_2,\ldots,y_j\) by

\[\hat{\beta}_{k|j}\]

By definition, the mean squared error matrix for

\[\hat{\beta}_{k|j}\]

is

\[\sigma^2 C_{k|j} = E\left[\left(\hat{\beta}_{k|j} - b_k\right) \left(\hat{\beta}_{k|j} - b_k\right)^T\right]\]

At the time of the k-th invocation, we have

\[\hat{\beta}_{k|k-1}\]

and \(C_{k|k-1}\) which were computed from the (k-1)-st invocation, input in b and covb, respectively. During the k-th invocation, function kalman computes the filtered estimate

\[\hat{\beta}_{k|k}\]

along with \(C_{k|k}\). These quantities are given by the update equations:

\[\begin{split}\begin{array}{l} \hat{\beta}_{k|k} = \hat{\beta}_{k|k-1} + C_{k|k-1} Z_k^T H_k^{-1} v_k \\ C_{k|k} = C_{k|k-1} - C_{k|k-1} Z_k^T H_k^{-1} Z_k C_{k|k-1} \\ \end{array}\end{split}\]

where

\[v_k = y_k - Z_k \hat{\beta}_{k|k-1}\]

and where

\[H_k = R_k + Z_k C_{k|k-1} Z_k^T\]

Here, \(v_k\) (stored in v) is the one-step-ahead prediction error, and \(\sigma^2 H_k\) is the variance-covariance matrix for \(v_k\). \(H_k\) is stored in covv. The “start-up values” needed on the first invocation of kalman are

\[\hat{\beta}_{1|0} = \mu_1\]

and \(C_{1|0} = Q_1\), input via b and covb, respectively. Computations for the k-th invocation are completed by kalman computing the one-step-ahead estimate

\[\hat{\beta}_{k+1|k}\]

along with \(C_{k+1|k}\) given by the prediction equations:

\[\begin{split}\begin{array}{l} \hat{\beta}_{k+1|k} = T_{k+1} \hat{\beta}_{k|k} \\ C_{k+1|k} = T_{k+1} C_{k|k} T_{k+1}^T + Q_{k+1} \\ \end{array}\end{split}\]
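
The following plain NumPy sketch carries one stage through the update equations and then the prediction equations. It is a direct transcription of the formulas above (assuming \(H_k\) is nonsingular), not the algorithm used inside kalman, which avoids some of these explicit computations.

import numpy as np

def one_stage(beta, C, y, Z, R, T, Q):
    # Update equations: filtered estimate beta_{k|k} and C_{k|k}
    v = y - Z @ beta                     # one-step-ahead prediction error v_k
    H = R + Z @ C @ Z.T                  # sigma^2 * H_k = Var(v_k)
    K = C @ Z.T @ np.linalg.inv(H)       # gain; assumes H_k is nonsingular
    beta_f = beta + K @ v
    C_f = C - K @ Z @ C
    # Prediction equations: one-step-ahead estimate beta_{k+1|k} and C_{k+1|k}
    beta_p = T @ beta_f
    C_p = T @ C_f @ T.T + Q
    return beta_p, C_p, v, H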

If both the filtered estimates and one-step-ahead estimates are needed by the user at each time point, kalman can be invoked twice for each time point—first without t and q to produce

\[\hat{\beta}_{k|k}\]

and \(C_{k|k}\), and second without update to produce

\[\hat{\beta}_{k+1|k}\]

and \(C_{k+1|k}\). (Without t and q, the prediction equations are skipped. Without update, the update equations are skipped.)

Often, an estimate of the state vector more than one step ahead is desired, i.e., an estimate of

\[\hat{\beta}_{k|j}\]

is needed where k > j + 1. At time j, kalman is invoked with update to compute

\[\hat{\beta}_{j+1|j}\]

Subsequent invocations of kalman without update can compute

\[\hat{\beta}_{j+2|j}, \hat{\beta}_{j+3|j}, \ldots, \hat{\beta}_{k|j}\]
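
In pyimsl terms, this is one call with update at time j (which leaves \(\hat{\beta}_{j+1|j}\) and \(C_{j+1|j}\) in b and covb) followed by repeated calls without it. A hedged sketch, assuming b, covb, n, ss, alndet, t, and q have been set up as described above:

# Illustrative stage indices: estimate beta_{k|j} with k > j + 1.
j, k = 5, 9
# A call with update at time j has already been made, so b and covb
# hold beta_{j+1|j} and C_{j+1|j}.  Each further call without update
# applies only the prediction equations, advancing one more step.
for step in range(k - j - 1):
    kalman(b, covb, n, ss, alndet, t=t, q=q)
# b now holds beta_{k|j}; covb holds C_{k|j}.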

Computations for

\[\hat{\beta}_{k|j}\]

and \(C_{k|j}\) assume the variance-covariance matrices of the errors in the observation equation and state equation are known up to an unknown positive scalar multiplier, \(\sigma^2\). The maximum likelihood estimate of \(\sigma^2\) based on the observations \(y_1,y_2,\ldots,y_m\), is given by

\[\hat{\sigma}^2 = SS/N\]

where

\[N = \sum_{k=1}^{m} n_k \text{ and } SS = \sum_{k=1}^{m} v_k^T H_k^{-1} v_k\]

N and SS are the input/output arguments n and ss.
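
Once all m calls have been made, the maximum likelihood estimate of \(\sigma^2\) can therefore be read directly from these accumulated arguments:

sigma2_hat = ss[0] / n[0]   # maximum likelihood estimate of sigma^2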

If \(\sigma^2\) is known, the \(R_k\)s and \(Q_k\)s can be input as the variance-covariance matrices exactly. The earlier discussion is then simplified by letting \(\sigma^2=1\).

In practice, the matrices \(T_k\), \(Q_k\), and \(R_k\) are generally not completely known. They may be known functions of an unknown parameter vector θ. In this case, kalman can be used in conjunction with an optimization program (see function minUnconMultivar, PyIMSL Math Library, Chapter 8, Optimization) to obtain a maximum likelihood estimate of θ. The natural logarithm of the likelihood function for \(y_1,y_2, \ldots,y_m\) differs by no more than an additive constant from

\[\begin{split}\begin{array}{l} L\left(\theta, \sigma^2; y_1, y_2, \ldots, y_m\right) = -\tfrac{1}{2} N \ln \sigma^2 \\ \quad -\tfrac{1}{2} \displaystyle\sum_{k=1}^{m} \ln\left[\det\left(H_k\right)\right] - \tfrac{1}{2} \sigma^{-2} \sum_{k=1}^{m} v_k^T H_k^{-1} v_k \end{array}\end{split}\]

(Harvey 1981, page 14, equation 2.21).

Here,

\[\sum_{k=1}^{m} \ln \left[\det \left(H_k\right)\right]\]

(stored in alndet) is the natural logarithm of the determinant of V where \(\sigma^2 V\) is the variance-covariance matrix of the observations.

Minimization of \(-2L(\theta,\sigma^2 ; y_1,y_2,\ldots,y_m)\) over all θ and \(\sigma^2\) produces maximum likelihood estimates. Equivalently, minimization of \(-2L_c(\theta; y_1,y_2,\ldots,y_m)\) where

\[L_c\left(\theta; y_1, y_2, \ldots, y_m\right) = -\frac{1}{2} N \ln \left(\frac{SS}{N}\right) - \frac{1}{2} \sum_{k=1}^{m} \ln\left[\det \left(H_k\right)\right]\]

produces maximum likelihood estimates

\[\hat{\theta} \text{ and } \hat{\sigma}^2 = SS/N\]

The minimization of \(-2L_c(\theta; y_1,y_2,\ldots,y_m)\) instead of \(-2L(\theta,\sigma^2 ; y_1,y_2,\ldots,y_m)\) reduces the dimension of the minimization problem by one. The two optimization problems are equivalent since

\[\hat{\sigma}^2(\theta) = SS(\theta)/N\]

minimizes \(-2L(\theta,\sigma^2; y_1,y_2,\ldots,y_m)\) for all θ; consequently,

\[\hat{\sigma}^2(\theta)\]

can be substituted for \(\sigma^2\) in \(L(\theta,\sigma^2 ; y_1,y_2,\ldots, y_m)\) to give a function that differs by no more than an additive constant from \(L_c(\theta; y_1,y_2,\ldots,y_m)\).
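
In terms of kalman's accumulated outputs, \(-2L_c\) is, up to an additive constant, n × ln(ss/n) + alndet, which is exactly the objective that Example 2 minimizes (log here is numpy's, as in that example):

# -2 * Lc, up to an additive constant, from the arguments
# accumulated over the m calls to kalman:
neg2_Lc = n[0] * log(ss[0] / n[0]) + alndet[0]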

The earlier discussion assumed \(H_k\) to be nonsingular. If \(H_k\) is singular, a modification for singular distributions described by Rao (1973, pages 527–528) is used. The necessary changes in the preceding discussion are as follows:

  • Replace \(H_k^{-1}\) by a generalized inverse.
  • Replace \(\det(H_k)\) by the product of the nonzero eigenvalues of \(H_k\).
  • Replace N by \(\textstyle\sum_{k=1}^{m} \mathrm{rank}\left(H_k\right)\).
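
These replacements can be written down directly; the sketch below is only a NumPy illustration of Rao's modification (with an illustrative tolerance), not the computation kalman performs internally.

import numpy as np

def singular_h_pieces(H, tol=1.0e-10):
    # Generalized inverse in place of H^{-1}
    h_ginv = np.linalg.pinv(H)
    # Product of the nonzero eigenvalues in place of det(H);
    # H is symmetric nonnegative definite, so eigvalsh applies
    eig = np.linalg.eigvalsh(H)
    log_pseudo_det = np.sum(np.log(eig[eig > tol]))
    # rank(H) in place of the contribution n_k to N
    rank = np.linalg.matrix_rank(H, tol=tol)
    return h_ginv, log_pseudo_det, rank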

Maximum likelihood estimation of parameters in the Kalman filter is discussed by Sallas and Harville (1988) and Harvey (1981, pages 111–113).

Example 1

Function kalman is used to compute the filtered estimates and one-step-ahead estimates for a scalar problem discussed by Harvey (1981, pages 116‑117). The observation equation and state equation are given by

\[\begin{split}\begin{array}{l} y_k = b_k + e_k \\ b_{k+1} = b_k + w_{k+1}, \quad k = 1, 2, 3, 4 \\ \end{array}\end{split}\]

where the \(e_k\)s are identically and independently distributed normal with mean 0 and variance \(\sigma^2\), the \(w_k\)s are identically and independently distributed normal with mean 0 and variance \(4\sigma^2\), and \(b_1\) is distributed normal with mean 4 and variance \(16\sigma^2\). Two invocations of kalman are needed for each time point in order to compute the filtered estimate and the one-step-ahead estimate. The first invocation omits the optional arguments t and q, so the prediction equations are skipped; the second invocation omits update, so the update equations are skipped.

This example also computes the one-step-ahead prediction errors. Harvey (1981, page 117) contains a misprint for the value \(v_4\) that he gives as 1.197. The correct value of \(v_4=1.003\) is computed by kalman.

from __future__ import print_function
from numpy import *
from pyimsl.stat.kalman import kalman

nb = 1
nobs = 4
ny = 1
ydata = (4.4, 4.0, 3.5, 4.6)
z = empty((ny, nb))
r = empty((ny, ny))
q = empty((nb, nb))
t = empty((nb, nb))
b = empty(nb)
y = empty(ny)
covb = empty((nb, nb))
covv = empty(0)
v = []

z[0, 0] = 1.0
r[0, 0] = 1.0
q[0, 0] = 4.0
t[0, 0] = 1.0
b[0] = 4.0
covb[0, 0] = 16.0

# Initialize arguments for the initial call to kalman.
n = [0]
ss = [0.0]
alndet = [0.0]
update = {'y': y, 'z': z, 'r': r}
print("k/j      b       covb n     ss      alndet     v       covv")

for i in range(0, nobs):
    # Update
    y[0] = ydata[i]

    kalman(b, covb, n, ss, alndet,
           update=update,
           v=v,
           covv=covv)

    print("%d/%d %8.3f %8.3f %d %8.3f %8.3f %8.3f %8.3f" %
          (i, i, b[0], covb[0, 0], n[0], ss[0], alndet[0], v[0], covv[0, 0]))

    # Prediction
    kalman(b, covb, n, ss, alndet,
           t=t,
           q=q)

    print("%d/%d %8.3f %8.3f %d %8.3f %8.3f %8.3f %8.3f" %
          (i + 1, i, b[0], covb[0, 0], n[0], ss[0], alndet[0], v[0], covv[0, 0]))

Output

k/j      b       covb n     ss      alndet     v       covv
0/0    4.376    0.941 1    0.009    2.833    0.400   17.000
1/0    4.376    4.941 1    0.009    2.833    0.400   17.000
1/1    4.063    0.832 2    0.033    4.615   -0.376    5.941
2/1    4.063    4.832 2    0.033    4.615   -0.376    5.941
2/2    3.597    0.829 3    0.088    6.378   -0.563    5.832
3/2    3.597    4.829 3    0.088    6.378   -0.563    5.832
3/3    4.428    0.828 4    0.260    8.141    1.003    5.829
4/3    4.428    4.828 4    0.260    8.141    1.003    5.829

Example 2

Function kalman is used with function minUnconMultivar (see PyIMSL Math Library, Chapter 8, Optimization) to find a maximum likelihood estimate of the parameter θ in an MA(1) time series represented by \(y_k =\varepsilon_k-\theta\varepsilon _{k-1}\). Function randomArma (see Chapter 12, Random Number Generation) is used to generate 200 random observations from an MA(1) time series with θ = 0.5 and \(\sigma^2=1\).

The MA(1) time series is cast as a state‑space model of the following form (see Harvey 1981, pages 103–104, 112):

\[\begin{split}\begin{array}{l} y_k = \begin{pmatrix} 1 & 0 \end{pmatrix} b_k \\ \\ b_k = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} b_{k-1} + w_k \\ \end{array}\end{split}\]

where the two-dimensional \(w_k\)s are independently distributed bivariate normal with mean 0 and variance \(\sigma^2 Q_k\) and

\[\begin{split}\begin{array}{l} Q_1 = \begin{pmatrix} 1+\theta^2 & -\theta \\ -\theta & \theta^2 \\ \end{pmatrix} \\ \\ Q_k = \begin{pmatrix} 1 & -\theta \\ -\theta & \theta^2 \\ \end{pmatrix} \phantom{.....} k=2,3, \ldots 200 \end{array}\end{split}\]

The warning error that is printed as part of the output is not serious and indicates that minUnconMultivar (see Math Library, Chapter 8, Optimization) is generally used for multi-parameter minimization.

from __future__ import print_function
import threading
from numpy import *
from pyimsl.stat.kalman import kalman
from pyimsl.stat.randomArma import randomArma
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.math.minUnconMultivar import minUnconMultivar

nb = 2
ny = 1
nobs = 200
localdata = threading.local()


def fcn(ntheta, theta):
    t = array([[0.0, 1.0], [0.0, 0.0]])
    z = array([[1.0, 0.0]])
    q = empty((nb, nb), dtype=double)
    covb = empty((nb, nb), dtype=double)
    r = empty((ny, ny), dtype=double)
    y = empty((ny), dtype=double)
    b = empty((nb), dtype=double)

    if abs(theta[0]) > 1.0:
        res = 1.0e10
    else:
        q[0][0] = 1.0
        q[0][1] = -theta[0]
        q[1][0] = -theta[0]
        q[1][1] = theta[0] * theta[0]
        r[0][0] = 0.0
        b[0] = 0.0
        b[1] = 0.0
        covb[0][0] = 1.0 + theta[0] * theta[0]
        covb[0][1] = -theta[0]
        covb[1][0] = -theta[0]
        covb[1][1] = theta[0] * theta[0]

        n = [0]
        ss = [0.0]
        alndet = [0.0]
        ydata = localdata.ydata
        update = {'y': y, 'z': z, 'r': r}
        for i in range(0, nobs):
            y[0] = ydata[i]
            update["y"] = y

            kalman(b, covb, n, ss, alndet,
                   update=update,
                   q=q,
                   t=t)

        res = n[0] * log(ss[0] / n[0]) + alndet[0]
    return res


randomSeedSet(123457)
pma = [0.5]
lagma = [1]
ydata = randomArma(nobs, None, pma, acceptRejectMethod=True,
                   nonzeroMalags=lagma)
localdata.ydata = ydata
theta = minUnconMultivar(fcn, 1)
print("* * * Final Estimate for THETA * * *\n")
print("Maximum likelihood estimate, THETA = %f\n" % theta[0])

Output

***
*** Warning (immediate) error issued from IMSL function u5inf :
*** This routine may be inefficient for a problem of size "n" = 1.
***
* * * Final Estimate for THETA * * *

Maximum likelihood estimate, THETA = 0.452940