pooledCovariances¶
Computes a pooled variance-covariance matrix from the observations.
Synopsis¶
pooledCovariances (nRows, nVariables, x, nGroups)
Required Arguments¶
- int nRows (Input) - Number of rows (observations) in the input matrix x.
- int nVariables (Input) - Number of variables to be used in computing the covariance matrix.
- float x (Input) - An nRows × (nVariables + 1) matrix containing the data. The first nVariables columns correspond to the variables, and the last column (column nVariables) must contain the group numbers.
- int nGroups (Input) - Number of groups in the data.
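The layout of x described above can be illustrated with a small NumPy sketch. The data values and the names `data` and `groups` are made up for illustration; only the column layout (variables first, group numbers in the last column) comes from the argument description.

```python
import numpy as np

# Hypothetical data: 4 observations of 2 variables, with 1-based group labels.
data = np.array([[2.2, 5.6],
                 [3.4, 2.3],
                 [3.2, 2.1],
                 [4.1, 1.6]])
groups = np.array([1, 1, 2, 2])

# Append the group numbers as the last column (column index nVariables).
x = np.column_stack([data, groups])

n_rows, n_variables = data.shape
n_groups = int(groups.max())

print(x.shape)  # (4, 3), i.e. nRows x (nVariables + 1)
```

With this layout the default column indices apply, so no xIndices argument should be needed.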
Return Value¶
Matrix of size nVariables by nVariables containing the matrix of
covariances.
Optional Arguments¶
- xIndices, int igrp, int ind[], int ifrq, int iwt (Input) - Each of the four arguments contains indices indicating column numbers of x in which particular types of data are stored.
  Parameter igrp contains the index of the column of x in which the group numbers are stored.
  Parameter ind contains the indices of the variables to be used in the analysis.
  Parameters ifrq and iwt contain the column numbers of x in which the frequencies and weights, respectively, are stored. Set ifrq = −1 if there is no column for frequencies. Set iwt = −1 if there is no column for weights. Weights are rounded to the nearest integer. Negative weights are not allowed.
  Defaults: igrp = nVariables, ind[ ] = 0, 1, …, nVariables − 1, ifrq = −1, and iwt = −1
- ido, int (Input) - Processing option.
  ido  Action
  0    This is the only invocation; all the data are input at once. (Default)
  1    This is the first invocation with this data; additional calls will be made. Initialization and updating for the nRows observations of x will be performed.
  2    This is an intermediate invocation; updating for the nRows observations of x will be performed.
  3    All statistics are updated for the nRows observations. The covariance matrix is computed.
  Default: ido = 0
- rowsAdd (Input) or rowsDelete (Input) - By default (or if rowsAdd is specified), the observations in x are added into the analysis. If rowsDelete is specified, the observations are deleted from the analysis. If ido = 0, these optional arguments are ignored (data are always added if there is only one invocation).
- groupCounts (Output) - An integer array of length nGroups containing the number of observations in each group. Array groupCounts is updated when ido is equal to 0, 1, or 2.
- sumWeights (Output) - An array of length nGroups containing the sum of the weights times the frequencies in the groups.
- means (Output) - An array of size nGroups × nVariables. The i-th row of means contains the group i variable means.
- u (Output) - An array of size nVariables × nVariables containing the lower triangular matrix U for the pooled sample cross-products matrix. U is computed from the pooled sample covariance matrix, S (see the Description section below), as \(S=U^T U\).
- nRowsMissing (Output) - Number of rows of data encountered in calls to pooledCovariances containing missing values (NaN) for any of the variables used.
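The relation \(S=U^T U\) quoted for the u argument can be checked numerically. The following is only a NumPy sketch of that factorization, using the rounded covariance matrix printed in Example 1 below; whether the array returned in u has exactly this orientation and scaling is an assumption not confirmed by this page.

```python
import numpy as np

# Pooled covariance matrix from Example 1 (rounded values from its Output).
S = np.array([[0.708, -1.575],
              [-1.575, 3.883]])

# numpy returns the lower triangular factor L with S = L @ L.T.
L = np.linalg.cholesky(S)

# Taking U = L.T gives the form used in the text, S = U.T @ U.
U = L.T

print(np.allclose(U.T @ U, S))  # True
```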
Description¶
Function pooledCovariances computes the pooled variance-covariance
matrix from a matrix of observations. The within-groups means are also
computed. Listwise deletion of missing values is assumed so that all
observations used are complete; in any row of x, if any element of the
observation is missing, the row is not used. Function pooledCovariances
should be used whenever the user suspects that the data has been sampled
from populations with different means but identical variance-covariance
matrices. If these assumptions cannot be made, a different
variance-covariance matrix should be estimated within each group.
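As an illustration of what "pooled" means here, the following NumPy sketch computes an unweighted pooled covariance matrix with listwise deletion. It is not the library's implementation; the function name `pooled_cov` is hypothetical, and it assumes the group numbers sit in the last column and that the divisor is the number of rows used minus the number of groups.

```python
import numpy as np

def pooled_cov(x, n_groups):
    """Unweighted pooled covariance; last column of x holds 1-based group numbers."""
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x).any(axis=1)]            # listwise deletion of incomplete rows
    data, groups = x[:, :-1], x[:, -1].astype(int)
    p = data.shape[1]
    sscp = np.zeros((p, p))                    # pooled within-group cross-products
    n_used = 0
    for g in range(1, n_groups + 1):
        rows = data[groups == g]
        if rows.shape[0] == 0:
            continue
        centered = rows - rows.mean(axis=0)    # center on the group's own mean
        sscp += centered.T @ centered
        n_used += rows.shape[0]
    return sscp / (n_used - n_groups)

# Data of Example 1 plus one incomplete row, which is dropped.
x = [[2.2, 5.6, 1], [3.4, 2.3, 1], [1.2, 7.8, 1],
     [3.2, 2.1, 2], [4.1, 1.6, 2], [3.7, 2.2, 2],
     [np.nan, 1.0, 2]]
cov = pooled_cov(x, 2)
print(np.round(cov, 3))  # agrees with the Output of Example 1 to three decimals
```

Because each group is centered on its own mean, differences between the group means do not inflate the estimate, which is why the method suits populations with different means but a common covariance matrix.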
By default, all observations are processed in one call to
pooledCovariances. The computations are the same as if
pooledCovariances were consecutively called with ido equal to 1, 2,
and 3. For brevity, the following discusses the computations with ido >
0.
When ido = 1, variables are initialized, workspace is allocated, and input
variables are checked for errors.
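The equivalence between a single call and the ido = 1, 2, 3 sequence can be pictured with a small accumulator. The class below is a hypothetical NumPy sketch, not the library's code: it keeps raw sums and cross-products instead of a Cholesky factor, handles only unweighted data, and assumes every group ends up non-empty.

```python
import numpy as np

class PooledCovAccumulator:
    """Hypothetical helper mirroring the ido = 1, 2, 3 call sequence."""

    def __init__(self, n_variables, n_groups):       # ido = 1: initialize
        self.n = np.zeros(n_groups)                   # rows seen per group
        self.totals = np.zeros((n_groups, n_variables))
        self.sscp = np.zeros((n_variables, n_variables))  # raw cross-products

    def update(self, x, delete=False):                # ido = 2: add or delete rows
        x = np.asarray(x, dtype=float)
        sign = -1.0 if delete else 1.0
        data, groups = x[:, :-1], x[:, -1].astype(int)
        for g in range(1, self.n.size + 1):
            rows = data[groups == g]
            self.n[g - 1] += sign * rows.shape[0]
            self.totals[g - 1] += sign * rows.sum(axis=0)
            self.sscp += sign * (rows.T @ rows)

    def finish(self):                                 # ido = 3: final statistics
        means = self.totals / self.n[:, None]
        within = self.sscp - means.T @ self.totals    # remove group-mean part
        return means, within / (self.n.sum() - self.n.size)

x_all = np.array([[2.2, 5.6, 1], [3.4, 2.3, 1], [1.2, 7.8, 1],
                  [3.2, 2.1, 2], [4.1, 1.6, 2], [3.7, 2.2, 2]])
acc = PooledCovAccumulator(n_variables=2, n_groups=2)
for start in range(0, 6, 2):                          # two rows per call
    acc.update(x_all[start:start + 2])
means, S = acc.finish()
print(np.round(S, 3))
```

Subtracting raw cross-products can lose precision when the means are large relative to the spread, which is one reason a rotation-based update of a triangular factor is preferable in practice.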
If nRows ≠ 0 (for any value of ido), the group observation totals,
\(T_i\), for \(i=1,\ldots,g\), where g is the number of groups, are
updated for the nRows observations in x. The group totals are
computed as:
\[T_i = \sum_j w_{ij} f_{ij} x_{ij}\]
where \(w_{ij}\) is the observation weight, \(x_{ij}\) is the j-th observation in the i-th group, and \(f_{ij}\) is the observation frequency.
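A short NumPy sketch of the group totals follows. It assumes that each total is the sum of the observations in the group, each multiplied by its weight and its frequency, and that all group numbers lie between 1 and the number of groups; the function name and the sample values are hypothetical.

```python
import numpy as np

def group_totals(data, groups, n_groups, w=None, f=None):
    """Sum of w * f * x over the observations of each group (1-based groups)."""
    data = np.asarray(data, dtype=float)
    groups = np.asarray(groups, dtype=int)
    w = np.ones(len(data)) if w is None else np.asarray(w, dtype=float)
    f = np.ones(len(data)) if f is None else np.asarray(f, dtype=float)
    totals = np.zeros((n_groups, data.shape[1]))
    # Unbuffered add: rows sharing a group index accumulate into the same total.
    np.add.at(totals, groups - 1, (w * f)[:, None] * data)
    return totals

data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
groups = [1, 1, 2]
f = [2, 1, 1]            # the first observation occurs twice
T = group_totals(data, groups, 2, f=f)
print(T)                 # rows: [5. 8.] and [5. 6.]
```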
Modified Givens rotations are used to compute the Cholesky decomposition of the pooled sums of squares and crossproducts matrix (Golub and Van Loan 1983).
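To show how rotations can build a Cholesky factor one observation at a time, here is a sketch that uses standard Givens rotations, not the modified (square-root-free) variant the text refers to. The function name is hypothetical and the code is illustrative only; it updates an upper triangular R so that \(R^T R\) gains the term \(x x^T\).

```python
import numpy as np

def givens_row_update(R, x):
    """Return upper triangular R_new with R_new.T @ R_new = R.T @ R + outer(x, x)."""
    R = np.array(R, dtype=float)
    x = np.array(x, dtype=float)
    for k in range(x.size):
        r = np.hypot(R[k, k], x[k])
        if r == 0.0:
            continue                      # nothing to eliminate in this column
        c, s = R[k, k] / r, x[k] / r      # rotation that zeroes x[k]
        Rk, xk = R[k, k:].copy(), x[k:].copy()
        R[k, k:] = c * Rk + s * xk
        x[k:] = -s * Rk + c * xk
    return R

# Feed the rows of a small matrix one by one, starting from R = 0.
X = np.array([[2.2, 5.6], [3.4, 2.3], [1.2, 7.8], [3.2, 2.1]])
R = np.zeros((2, 2))
for row in X:
    R = givens_row_update(R, row)

print(np.allclose(R.T @ R, X.T @ X))  # True
```

Working on the triangular factor avoids forming the cross-products matrix explicitly, which is the usual numerical motivation for this kind of update.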
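The group means and the pooled sample covariance matrix S are computed
from the intermediate results when ido = 3. These quantities are defined
by
\[\bar{x}_{i\cdot} = \frac{T_i}{\sum_j w_{ij} f_{ij}}\]
\[S = \frac{1}{\left(\sum_{i,j} f_{ij}\right) - g} \sum_{i,j} w_{ij} f_{ij} \left(x_{ij} - \bar{x}_{i\cdot}\right)\left(x_{ij} - \bar{x}_{i\cdot}\right)^T\]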
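A direct NumPy sketch of these two quantities is given here under stated assumptions: each group mean is the weighted total divided by the sum of weight times frequency, and S divides the weighted within-group cross-products by the sum of the frequencies minus the number of groups. For unit weights and frequencies this reproduces Example 1; the function name `pooled_stats` is hypothetical, and every group is assumed to be non-empty.

```python
import numpy as np

def pooled_stats(data, groups, n_groups, w=None, f=None):
    """Group means and pooled covariance for 1-based group numbers."""
    data = np.asarray(data, dtype=float)
    groups = np.asarray(groups, dtype=int)
    n, p = data.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    f = np.ones(n) if f is None else np.asarray(f, dtype=float)
    wf = w * f
    means = np.zeros((n_groups, p))
    sscp = np.zeros((p, p))
    for g in range(1, n_groups + 1):
        mask = groups == g
        wf_g = wf[mask][:, None]
        means[g - 1] = (wf_g * data[mask]).sum(axis=0) / wf[mask].sum()
        d = data[mask] - means[g - 1]          # deviations from the group mean
        sscp += (wf_g * d).T @ d
    used = (groups >= 1) & (groups <= n_groups)
    return means, sscp / (f[used].sum() - n_groups)

data = [[2.2, 5.6], [3.4, 2.3], [1.2, 7.8],
        [3.2, 2.1], [4.1, 1.6], [3.7, 2.2]]
groups = [1, 1, 1, 2, 2, 2]
means, S = pooled_stats(data, groups, 2)
print(np.round(S, 3))  # agrees with the Output of Example 1 to three decimals
```

Under these assumptions, giving an observation a frequency of 2 has the same effect as listing that row twice.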
Examples¶
Example 1¶
The following example computes a pooled variance-covariance matrix. The last column of the data set is the group indicator.
from pyimsl.stat.pooledCovariances import pooledCovariances
from pyimsl.stat.writeMatrix import writeMatrix

nvar = 2
nobs = 6
n_groups = 2

# Two variables per row; the last column holds the group number.
x = [[2.2, 5.6, 1],
     [3.4, 2.3, 1],
     [1.2, 7.8, 1],
     [3.2, 2.1, 2],
     [4.1, 1.6, 2],
     [3.7, 2.2, 2]]

# All data are supplied in one call (default ido = 0).
cov = pooledCovariances(nobs, nvar, x, n_groups)
writeMatrix("Pooled Covariance Matrix", cov)
Output¶
Pooled Covariance Matrix
            1           2
1       0.708      -1.575
2      -1.575       3.883
Example 2¶
The following example computes a pooled variance-covariance matrix for the
Fisher iris data. To illustrate the use of the ido argument, multiple
calls to pooledCovariances are made.
The first column of data is the group indicator, requiring either a
permutation of the matrix or the use of the xIndices optional keyword.
This example chooses the keyword method.
from numpy import empty
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.pooledCovariances import pooledCovariances
from pyimsl.stat.writeMatrix import writeMatrix

nobs = 150
nvar = 4
n_groups = 3
igrp = 0
ind = (1, 2, 3, 4)
ifrq = -1
iwt = -1
means = empty(0)  # pass as empty numpy array

# Retrieve the Fisher iris data set
x = dataSets(3)

# Initialize
xIndices = {}
xIndices['igrp'] = igrp
xIndices['ind'] = ind
xIndices['ifrq'] = ifrq
xIndices['iwt'] = iwt
cov = pooledCovariances(0, nvar, x, n_groups,
                        ido=1,
                        xIndices=xIndices)

# Add 10 rows at a time
for i in range(0, 15):
    cov = pooledCovariances(10, nvar, x[i * 10:], n_groups,
                            ido=2,
                            xIndices=xIndices)

# Calculate cov
cov = pooledCovariances(0, nvar, x, n_groups,
                        ido=3,
                        xIndices=xIndices,
                        means=means)
writeMatrix("Pooled Covariance Matrix", cov)
writeMatrix("Means", means)
Output¶
Pooled Covariance Matrix
            1           2           3           4
1      0.2650      0.0927      0.1675      0.0384
2      0.0927      0.1154      0.0552      0.0327
3      0.1675      0.0552      0.1852      0.0427
4      0.0384      0.0327      0.0427      0.0419

Means
            1           2           3           4
1       5.006       3.428       1.462       0.246
2       5.936       2.770       4.260       1.326
3       6.588       2.974       5.552       2.026
Warning Errors¶
- IMSLS_OBSERVATION_IGNORED - In call #, row # of the matrix “x” has group number = #. The group number must be between 1 and #, the number of groups. This observation will be ignored.
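Rows that would trigger this warning can be located before the call. The helper below is a hypothetical NumPy sketch, not part of the library; it assumes the group numbers are stored in column `igrp` of x (the last column by default here) and treats anything outside 1 to nGroups, including NaN, as invalid.

```python
import numpy as np

def split_by_group_validity(x, n_groups, igrp=-1):
    """Return the rows with a valid group number and the indices of the others."""
    x = np.asarray(x, dtype=float)
    g = x[:, igrp]
    valid = (g >= 1) & (g <= n_groups)     # NaN compares False, so it is invalid
    return x[valid], np.flatnonzero(~valid)

x = [[2.2, 5.6, 1],
     [3.4, 2.3, 3],    # group 3 does not exist when n_groups = 2
     [3.2, 2.1, 2],
     [4.1, 1.6, 0]]    # group numbers start at 1
kept, bad_rows = split_by_group_validity(x, n_groups=2)
print(bad_rows)        # [1 3]
```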
Fatal Errors¶
- IMSLS_BAD_IDO_4 - “ido” = #. Initial allocations must be performed by making a call to pooledCovariances with “ido” = 1.
- IMSLS_BAD_IDO_5 - “ido” = #. A new analysis may not begin until the previous analysis is terminated by a call to pooledCovariances with “ido” equal to 3.