pooledCovariances

Computes a pooled variance-covariance matrix from the observations.

Synopsis

pooledCovariances (nRows, nVariables, x, nGroups)

Required Arguments

int nRows (Input)
Number of rows (observations) in the input matrix x.
int nVariables (Input)
Number of variables to be used in computing the covariance matrix.
float x (Input)
An nRows × (nVariables + 1) matrix containing the data. The first nVariables columns correspond to the variables, and the last column (column nVariables, counting from zero) must contain the group numbers.
int nGroups (Input)
Number of groups in the data.

Return Value

Matrix of size nVariables by nVariables containing the matrix of covariances.

Optional Arguments

xIndices, int igrp, int ind[], int ifrq, int iwt (Input)

Each of the four arguments contains indices indicating column numbers of x in which particular types of data are stored.

Parameter igrp contains the index for the column of x in which the group numbers are stored.

Parameter ind contains the indices of the variables to be used in the analysis.

Parameters ifrq and iwt contain the column numbers of x in which the frequencies and weights, respectively, are stored. Set ifrq = −1 if there will be no column for frequencies. Set iwt = −1 if there will be no column for weights. Weights are rounded to the nearest integer. Negative weights are not allowed.

Defaults: igrp = nVariables, ind[ ] = 0, 1, …, nVariables − 1, ifrq = −1, and iwt = −1

ido, int (Input)

Processing option.

ido Action
0 This is the only invocation; all the data are input at once. (Default)
1 This is the first invocation with this data; additional calls will be made. Initialization and updating for the nRows observations of x will be performed.
2 This is an intermediate invocation; updating for the nRows observations of x will be performed.
3 This is the final invocation; all statistics are updated for the nRows observations, and the covariance matrix is computed.

Default: ido = 0

rowsAdd (Input)

or

rowsDelete (Input)
By default (or if rowsAdd is specified), the observations in x are added into the analysis. If rowsDelete is specified, the observations are deleted from the analysis. If ido = 0, these optional arguments are ignored (data is always added if there is only one invocation).
groupCounts (Output)
An integer array of length nGroups containing the number of observations in each group. Array groupCounts is updated when ido is equal to 0, 1, or 2.
sumWeights (Output)
An array of length nGroups containing the sum of the weights times the frequencies in the groups.
means (Output)
An array of size nGroups × nVariables. The i-th row of means contains the group i variable means.
u (Output)
An array of size nVariables × nVariables containing the lower triangular matrix U, the triangular factor for the pooled sample cross-products matrix. U is computed from the pooled sample covariance matrix, S (see the Description section below), as \(S=U^T U\).
nRowsMissing (Output)
Number of rows of data encountered in calls to pooledCovariances containing missing values (NaN) for any of the variables used.

Description

Function pooledCovariances computes the pooled variance-covariance matrix from a matrix of observations. The within-groups means are also computed. Listwise deletion of missing values is assumed so that all observations used are complete; in any row of x, if any element of the observation is missing, the row is not used. Function pooledCovariances should be used whenever the user suspects that the data has been sampled from populations with different means but identical variance-covariance matrices. If these assumptions cannot be made, a different variance-covariance matrix should be estimated within each group.
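The listwise deletion described above can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation: the helper name drop_incomplete_rows and the sample data are hypothetical, and the sketch assumes that only the columns holding the analysis variables are checked for NaN.

```python
import numpy as np

def drop_incomplete_rows(x, columns):
    """Return the rows of x with no NaN in `columns`, plus the count of dropped rows."""
    x = np.asarray(x, dtype=float)
    complete = ~np.isnan(x[:, columns]).any(axis=1)
    return x[complete], int((~complete).sum())

# Hypothetical data: two variables plus a group column; one value is missing.
x = [[2.2, 5.6, 1],
     [3.4, np.nan, 1],
     [1.2, 7.8, 1],
     [3.2, 2.1, 2]]

kept, n_missing = drop_incomplete_rows(x, columns=[0, 1])
print(kept.shape[0], n_missing)
```

The dropped-row count plays the role that nRowsMissing plays in the optional arguments: it reports how many rows were excluded, while the retained rows are all complete.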

By default, all observations are processed in one call to pooledCovariances. The computations are the same as if pooledCovariances were consecutively called with ido equal to 1, 2, and 3. For brevity, the following discusses the computations with ido > 0.

When ido = 1, variables are initialized, workspace is allocated, and input variables are checked for errors.

If nRows ≠ 0 (for any value of ido), the group observation totals, \(T_i\), for \(i=1,\ldots,g\), where g is the number of groups, are updated for the nRows observations in x. The group totals are computed as:

\[T_i = \sum_{j} w_{ij} f_{ij} x_{ij}\]

where \(w_{ij}\) is the observation weight, \(x_{ij}\) is the j-th observation in the i-th group, and \(f_{ij}\) is the observation frequency.
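As an illustration of the group totals formula, the following NumPy sketch accumulates \(T_i\) over successive blocks of rows. The function update_group_totals and its sign argument are hypothetical names introduced here; the sketch assumes 1-based group numbers and unit weights and frequencies when none are supplied, and it is not the library's internal routine.

```python
import numpy as np

def update_group_totals(totals, x, groups, weights=None, freqs=None, sign=1.0):
    """Add (sign=+1) or remove (sign=-1) the rows of x to/from the running totals T_i."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    f = np.ones(n) if freqs is None else np.asarray(freqs, dtype=float)
    for row, g, wf in zip(x, groups, w * f):
        totals[int(g) - 1] += sign * wf * row  # group numbers are 1-based
    return totals

x = np.array([[2.2, 5.6], [3.4, 2.3], [1.2, 7.8],
              [3.2, 2.1], [4.1, 1.6], [3.7, 2.2]])
groups = [1, 1, 1, 2, 2, 2]

# One pass over all six rows.
one_shot = update_group_totals(np.zeros((2, 2)), x, groups)

# The same rows supplied in two blocks give the same totals.
totals = np.zeros((2, 2))
update_group_totals(totals, x[:4], groups[:4])
update_group_totals(totals, x[4:], groups[4:])

# Subtracting a row's contribution removes it from the totals again.
removed = update_group_totals(totals.copy(), x[5:], groups[5:], sign=-1.0)
print(one_shot, totals, removed, sep="\n")
```

Because the totals are plain sums, the order and blocking of the rows do not matter, which is why supplying the data in several blocks can match a single call. The sign argument mirrors the idea behind rowsAdd and rowsDelete.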

Modified Givens rotations are used in computing the Cholesky decomposition of the pooled sums of squares and crossproducts matrix (Golub and Van Loan 1983).
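The sketch below shows what a Cholesky factor of the pooled sums of squares and crossproducts matrix looks like. It does not reproduce the modified Givens rotation updates; as a stand-in, it forms the matrix explicitly and calls numpy.linalg.cholesky. The variable names are illustrative, and the scaling and storage layout of the factor returned by pooledCovariances are not asserted here.

```python
import numpy as np

x = np.array([[2.2, 5.6], [3.4, 2.3], [1.2, 7.8],
              [3.2, 2.1], [4.1, 1.6], [3.7, 2.2]])
groups = np.array([1, 1, 1, 2, 2, 2])

# Pooled within-group sums of squares and crossproducts (unit weights).
sscp = np.zeros((2, 2))
for g in np.unique(groups):
    centered = x[groups == g] - x[groups == g].mean(axis=0)
    sscp += centered.T @ centered

lower = np.linalg.cholesky(sscp)  # lower triangular, lower @ lower.T == sscp
upper = lower.T                   # upper triangular, upper.T @ upper == sscp

print(np.allclose(upper.T @ upper, sscp))
```

Rotation-based updating reaches such a triangular factor one observation at a time without forming the crossproducts matrix first; the explicit factorization here is only a check on what the factor represents.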

The group means and the pooled sample covariance matrix S are computed from the intermediate results when ido = 3. These quantities are defined by

\[\overline{x}_{i\cdot} = \frac{T_i}{\sum\limits_j w_{ij} f_{ij}}\]
\[S = \frac{1}{\sum\limits_{ij} f_{ij} - g} \sum_{i,j} w_{ij} f_{ij} \left(x_{ij} - \overline{x}_{i\cdot}\right) \left(x_{ij} - \overline{x}_{i\cdot}\right)^T\]
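The two definitions above can be evaluated directly in NumPy. The following is a minimal sketch under stated assumptions, not the library's algorithm: pooled_covariance is a hypothetical helper, group numbers are taken to be 1-based, every group is assumed to be non-empty, and weights and frequencies default to 1.

```python
import numpy as np

def pooled_covariance(x, groups, n_groups, weights=None, freqs=None):
    """Return (group means, pooled covariance S) straight from the definitions."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups, dtype=int)
    n, p = x.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    f = np.ones(n) if freqs is None else np.asarray(freqs, dtype=float)
    wf = w * f
    means = np.zeros((n_groups, p))
    sscp = np.zeros((p, p))
    for i in range(n_groups):
        in_group = groups == i + 1
        # Group mean: T_i divided by the sum of w * f within the group.
        means[i] = (wf[in_group, None] * x[in_group]).sum(axis=0) / wf[in_group].sum()
        centered = x[in_group] - means[i]
        # Weighted within-group sums of squares and crossproducts.
        sscp += (wf[in_group, None] * centered).T @ centered
    # Divisor: sum of the frequencies minus the number of groups.
    return means, sscp / (f.sum() - n_groups)

x = [[2.2, 5.6], [3.4, 2.3], [1.2, 7.8],
     [3.2, 2.1], [4.1, 1.6], [3.7, 2.2]]
groups = [1, 1, 1, 2, 2, 2]

means, cov = pooled_covariance(x, groups, n_groups=2)
print(np.round(cov, 3))
```

With the six observations used in Example 1, this direct evaluation agrees with the covariance matrix reported there (0.708, -1.575, 3.883).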

Examples

Example 1

The following example computes a pooled variance-covariance matrix. The last column of the data set is the group indicator.

from numpy import *
from pyimsl.stat.pooledCovariances import pooledCovariances
from pyimsl.stat.writeMatrix import writeMatrix

nvar = 2
nobs = 6
n_groups = 2
x = [[2.2, 5.6, 1],
     [3.4, 2.3, 1],
     [1.2, 7.8, 1],
     [3.2, 2.1, 2],
     [4.1, 1.6, 2],
     [3.7, 2.2, 2]]

cov = pooledCovariances(nobs, nvar, x, n_groups)

writeMatrix("Pooled Covariance Matrix", cov)

Output

 
 Pooled Covariance Matrix
             1            2
1        0.708       -1.575
2       -1.575        3.883

Example 2

The following example computes a pooled variance-covariance matrix for the Fisher iris data. To illustrate the use of the ido argument, multiple calls to pooledCovariances are made.

The first column of data is the group indicator, requiring either a permutation of the matrix or the use of the xIndices optional keyword. This example chooses the keyword method.

from numpy import empty
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.pooledCovariances import pooledCovariances
from pyimsl.stat.writeMatrix import writeMatrix

nobs = 150
nvar = 4
n_groups = 3
igrp = 0
ind = (1, 2, 3, 4)
ifrq = -1
iwt = -1
means = empty(0)  # pass as empty numpy array

# Retrieve the Fisher iris data set
x = dataSets(3)

# Initialize
xIndices = {}
xIndices['igrp'] = igrp
xIndices['ind'] = ind
xIndices['ifrq'] = ifrq
xIndices['iwt'] = iwt
cov = pooledCovariances(0, nvar, x, n_groups,
                        ido=1,
                        xIndices=xIndices)

# Add 10 rows at a time
for i in range(0, 15):
    cov = pooledCovariances(10, nvar, x[i * 10:], n_groups,
                            ido=2,
                            xIndices=xIndices)

# Calculate cov
cov = pooledCovariances(0, nvar, x, n_groups,
                        ido=3,
                        xIndices=xIndices,
                        means=means)

writeMatrix("Pooled Covariance Matrix", cov)
writeMatrix("Means", means)

Output

 
              Pooled Covariance Matrix
             1            2            3            4
1       0.2650       0.0927       0.1675       0.0384
2       0.0927       0.1154       0.0552       0.0327
3       0.1675       0.0552       0.1852       0.0427
4       0.0384       0.0327       0.0427       0.0419
 
                        Means
             1            2            3            4
1        5.006        3.428        1.462        0.246
2        5.936        2.770        4.260        1.326
3        6.588        2.974        5.552        2.026

Warning Errors

IMSLS_OBSERVATION_IGNORED In call #, row # of the matrix “x” has group number = #. The group number must be between 1 and #, the number of groups. This observation will be ignored.

Fatal Errors

IMSLS_BAD_IDO_4 “ido” = #. Initial allocations must be performed by making a call to pooledCovariances with “ido” = 1.
IMSLS_BAD_IDO_5 “ido” = #. A new analysis may not begin until the previous analysis is terminated by a call to pooledCovariances with “ido” equal to 3.