pooledCovariances
Computes a pooled variance-covariance matrix from the observations.
Synopsis
pooledCovariances (nRows, nVariables, x, nGroups)
Required Arguments
- int nRows (Input)
  Number of rows (observations) in the input matrix x.
- int nVariables (Input)
  Number of variables to be used in computing the covariance matrix.
- float x (Input)
  An nRows × (nVariables + 1) matrix containing the data. The first nVariables columns correspond to the variables, and the last column (column nVariables) must contain the group numbers.
- int nGroups (Input)
  Number of groups in the data.
Return Value
Matrix of size nVariables by nVariables containing the matrix of covariances.
Optional Arguments
- xIndices, int igrp, int ind[], int ifrq, int iwt (Input)
  Each of the four arguments contains indices indicating the column numbers of x in which particular types of data are stored.
  Parameter igrp contains the index of the column of x in which the group numbers are stored.
  Parameter ind contains the indices of the variables to be used in the analysis.
  Parameters ifrq and iwt contain the column numbers of x in which the frequencies and weights, respectively, are stored. Set ifrq = −1 if there will be no column for frequencies. Set iwt = −1 if there will be no column for weights. Weights are rounded to the nearest integer. Negative weights are not allowed.
  Defaults: igrp = nVariables, ind[] = 0, 1, …, nVariables − 1, ifrq = −1, and iwt = −1
- ido, int (Input)
  Processing option.
  ido | Action
  0 | This is the only invocation; all the data are input at once. (Default)
  1 | This is the first invocation with this data; additional calls will be made. Initialization and updating for the nRows observations of x will be performed.
  2 | This is an intermediate invocation; updating for the nRows observations of x will be performed.
  3 | All statistics are updated for the nRows observations. The covariance matrix is computed.
  Default: ido = 0
- rowsAdd (Input) or rowsDelete (Input)
  By default (or if rowsAdd is specified), the observations in x are added into the analysis. If rowsDelete is specified, the observations are deleted from the analysis. If ido = 0, these optional arguments are ignored (data is always added if there is only one invocation).
- groupCounts (Output)
  An integer array of length nGroups containing the number of observations in each group. Array groupCounts is updated when ido is equal to 0, 1, or 2.
- sumWeights (Output)
  An array of length nGroups containing the sum of the weights times the frequencies in the groups.
- means (Output)
  An array of size nGroups × nVariables. The i-th row of means contains the group i variable means.
- u (Output)
  An array of size nVariables × nVariables containing the lower triangular matrix U, the lower triangular factor for the pooled sample cross-products matrix. U is computed from the pooled sample covariance matrix, S (see the Description section below), as \(S=U^T U\).
- nRowsMissing (Output)
  Number of rows of data encountered in calls to pooledCovariances containing missing values (NaN) for any of the variables used.
Description
Function pooledCovariances computes the pooled variance-covariance matrix from a matrix of observations. The within-groups means are also computed. Listwise deletion of missing values is assumed so that all observations used are complete; in any row of x, if any element of the observation is missing, the row is not used. Function pooledCovariances should be used whenever the user suspects that the data have been sampled from populations with different means but identical variance-covariance matrices. If these assumptions cannot be made, a different variance-covariance matrix should be estimated within each group.
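The listwise deletion rule described above can be illustrated with a small NumPy sketch. It is not how pooledCovariances is implemented; the helper name drop_incomplete_rows and the sample data are hypothetical, and the sketch only shows which rows would be excluded and how a count like nRowsMissing could be obtained.

```python
import numpy as np


def drop_incomplete_rows(x, columns):
    """Hypothetical helper (not part of PyIMSL).

    Returns the rows of x that have no NaN in the given columns,
    together with the number of rows that were dropped.
    """
    x = np.asarray(x, dtype=float)
    complete = ~np.isnan(x[:, columns]).any(axis=1)
    return x[complete], int((~complete).sum())


# Made-up data: two variables followed by a group number.
data = np.array([[2.2, 5.6, 1],
                 [np.nan, 2.3, 1],
                 [1.2, 7.8, 1],
                 [3.2, np.nan, 2]])

kept, n_missing = drop_incomplete_rows(data, [0, 1])
print(kept)
print("rows with missing values:", n_missing)
```

Only the columns that take part in the analysis are checked, which mirrors the wording "for any of the variables used".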
By default, all observations are processed in one call to pooledCovariances. The computations are the same as if pooledCovariances were called consecutively with ido equal to 1, 2, and 3. For brevity, the following discusses the computations with ido > 0.
When ido = 1, variables are initialized, workspace is allocated, and input variables are checked for errors.
If nRows ≠ 0 (for any value of ido), the group observation totals, \(T_i\), for \(i=1,\ldots,g\), where g is the number of groups, are updated for the nRows observations in x. The group totals are computed as:
\[T_i = \sum_{j} w_{ij} f_{ij} x_{ij}\]
where \(w_{ij}\) is the observation weight, \(x_{ij}\) is the j-th observation in the i-th group, and \(f_{ij}\) is the observation frequency.
Modified Givens rotations are used in computing the Cholesky decomposition of the pooled sums of squares and crossproducts matrix (Golub and Van Loan 1983).
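The idea of updating intermediate results batch by batch and computing the covariance only at the end can be sketched in plain NumPy. This is an illustration only: the class PooledAccumulator is hypothetical, it assumes unit weights and frequencies and no empty groups, and it accumulates raw cross-products rather than using the Givens-based Cholesky update described above, so it is less numerically robust than the library routine.

```python
import numpy as np


class PooledAccumulator:
    """Hypothetical helper (not part of PyIMSL): batch-wise pooled covariance."""

    def __init__(self, n_variables, n_groups):
        # Corresponds loosely to the initialization done when ido = 1.
        self.counts = np.zeros(n_groups)
        self.totals = np.zeros((n_groups, n_variables))
        self.cross = np.zeros((n_variables, n_variables))

    def update(self, values, groups):
        # Corresponds loosely to an updating call (ido = 1 or 2).
        for row, g in zip(np.asarray(values, dtype=float), groups):
            i = int(g) - 1              # group numbers run from 1 to n_groups
            self.counts[i] += 1
            self.totals[i] += row
            self.cross += np.outer(row, row)

    def finalize(self):
        # Corresponds loosely to the final call (ido = 3).
        means = self.totals / self.counts[:, None]
        # Sum over groups of n_i * mean_i mean_i^T.
        mean_part = np.einsum("i,ij,ik->jk", self.counts, means, means)
        within = self.cross - mean_part
        return within / (self.counts.sum() - len(self.counts)), means


# Data from Example 1, fed two rows at a time.
x = np.array([[2.2, 5.6, 1], [3.4, 2.3, 1], [1.2, 7.8, 1],
              [3.2, 2.1, 2], [4.1, 1.6, 2], [3.7, 2.2, 2]])
acc = PooledAccumulator(n_variables=2, n_groups=2)
for start in range(0, len(x), 2):
    batch = x[start:start + 2]
    acc.update(batch[:, :2], batch[:, 2])
cov, means = acc.finalize()
print(np.round(cov, 3))
```

Feeding the rows in any batch size gives the same result, which is the property the ido mechanism relies on.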
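The group means and the pooled sample covariance matrix S are computed from the intermediate results when ido = 3. These quantities are defined by
\[\bar{x}_i = \frac{T_i}{\sum_{j} w_{ij} f_{ij}}, \quad i = 1, \ldots, g\]
\[S = \frac{1}{\sum_{i}\sum_{j} f_{ij} - g} \sum_{i}\sum_{j} w_{ij} f_{ij} \left(x_{ij} - \bar{x}_i\right)\left(x_{ij} - \bar{x}_i\right)^T\]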
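As a cross-check, the group means and pooled covariance can be computed directly in NumPy. The sketch below assumes that each group mean is the weighted total divided by the sum of weight times frequency, and that the pooled cross-products are divided by the total frequency minus the number of groups; with unit weights and frequencies this reproduces the Example 1 output. The function pooled_covariance and its parameters are hypothetical and are not the PyIMSL API.

```python
import numpy as np


def pooled_covariance(values, groups, n_groups, weights=None, freqs=None):
    """Hypothetical reference implementation (not part of PyIMSL)."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups).astype(int)
    n, p = values.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    f = np.ones(n) if freqs is None else np.asarray(freqs, dtype=float)
    wf = w * f

    means = np.zeros((n_groups, p))
    sscp = np.zeros((p, p))
    for i in range(1, n_groups + 1):
        mask = groups == i
        # Weighted group total divided by the sum of weight * frequency.
        means[i - 1] = (wf[mask, None] * values[mask]).sum(axis=0) / wf[mask].sum()
        dev = values[mask] - means[i - 1]
        # Weighted within-group sums of squares and cross-products.
        sscp += (wf[mask, None] * dev).T @ dev
    # Assumed divisor: total frequency minus the number of groups.
    return sscp / (f.sum() - n_groups), means


# Data from Example 1.
x = np.array([[2.2, 5.6, 1], [3.4, 2.3, 1], [1.2, 7.8, 1],
              [3.2, 2.1, 2], [4.1, 1.6, 2], [3.7, 2.2, 2]])
cov, means = pooled_covariance(x[:, :2], x[:, 2], n_groups=2)
print(np.round(cov, 3))

# A frequency of 2 on the first row behaves like entering that row twice.
cov_freq, _ = pooled_covariance(x[:, :2], x[:, 2], 2, freqs=[2, 1, 1, 1, 1, 1])
dup = np.vstack([x[:1], x])
cov_dup, _ = pooled_covariance(dup[:, :2], dup[:, 2], 2)
print(np.allclose(cov_freq, cov_dup))
```

The second check shows the intended role of a frequency column under these assumptions: it is shorthand for repeated observations.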
Examples
Example 1
The following example computes a pooled variance-covariance matrix. The last column of the data set is the group indicator.
from numpy import *
from pyimsl.stat.pooledCovariances import pooledCovariances
from pyimsl.stat.writeMatrix import writeMatrix

nvar = 2
nobs = 6
n_groups = 2

# Two variables followed by the group number in the last column
x = [[2.2, 5.6, 1],
     [3.4, 2.3, 1],
     [1.2, 7.8, 1],
     [3.2, 2.1, 2],
     [4.1, 1.6, 2],
     [3.7, 2.2, 2]]

cov = pooledCovariances(nobs, nvar, x, n_groups)
writeMatrix("Pooled Covariance Matrix", cov)
Output
Pooled Covariance Matrix
         1       2
1    0.708  -1.575
2   -1.575   3.883
Example 2
The following example computes a pooled variance-covariance matrix for the Fisher iris data. To illustrate the use of the ido argument, multiple calls to pooledCovariances are made.
The first column of data is the group indicator, requiring either a permutation of the matrix or the use of the xIndices optional keyword. This example chooses the keyword method.
from numpy import empty
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.pooledCovariances import pooledCovariances
from pyimsl.stat.writeMatrix import writeMatrix

nobs = 150
nvar = 4
n_groups = 3
igrp = 0             # group numbers are in the first column
ind = (1, 2, 3, 4)   # columns holding the four variables
ifrq = -1            # no frequency column
iwt = -1             # no weight column
means = empty(0)     # pass as empty numpy array

# Retrieve the Fisher iris data set
x = dataSets(3)

# Initialize
xIndices = {}
xIndices['igrp'] = igrp
xIndices['ind'] = ind
xIndices['ifrq'] = ifrq
xIndices['iwt'] = iwt
cov = pooledCovariances(0, nvar, x, n_groups,
                        ido=1,
                        xIndices=xIndices)

# Add 10 rows at a time
for i in range(0, 15):
    cov = pooledCovariances(10, nvar, x[i * 10:], n_groups,
                            ido=2,
                            xIndices=xIndices)

# Calculate cov
cov = pooledCovariances(0, nvar, x, n_groups,
                        ido=3,
                        xIndices=xIndices,
                        means=means)

writeMatrix("Pooled Covariance Matrix", cov)
writeMatrix("Means", means)
Output
Pooled Covariance Matrix
        1       2       3       4
1  0.2650  0.0927  0.1675  0.0384
2  0.0927  0.1154  0.0552  0.0327
3  0.1675  0.0552  0.1852  0.0427
4  0.0384  0.0327  0.0427  0.0419
Means
       1      2      3      4
1  5.006  3.428  1.462  0.246
2  5.936  2.770  4.260  1.326
3  6.588  2.974  5.552  2.026
Warning Errors
IMSLS_OBSERVATION_IGNORED | In call #, row # of the matrix “x” has group number = #. The group number must be between 1 and #, the number of groups. This observation will be ignored.
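Rows that would trigger this warning can be identified before calling the function. The following is a minimal sketch, assuming the group numbers are whole numbers from 1 to nGroups stored in one column of x; the helper valid_group_rows and the sample data are hypothetical and not part of PyIMSL.

```python
import numpy as np


def valid_group_rows(x, group_column, n_groups):
    """Hypothetical pre-check: True for rows whose group number is in 1..n_groups."""
    groups = np.asarray(x, dtype=float)[:, group_column]
    in_range = (groups >= 1) & (groups <= n_groups)
    whole = groups == np.round(groups)
    return in_range & whole


# Made-up data: the last column is the group number.
data = np.array([[2.2, 5.6, 1],
                 [3.4, 2.3, 0],   # group 0 is out of range
                 [1.2, 7.8, 2],
                 [3.2, 2.1, 3]])  # group 3 exceeds n_groups = 2

ok = valid_group_rows(data, group_column=2, n_groups=2)
print("rows that would be ignored:", np.flatnonzero(~ok))
```

Checking up front makes it easier to decide whether such rows reflect a data-entry problem or an incorrect nGroups value.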
Fatal Errors
IMSLS_BAD_IDO_4 | “ido” = #. Initial allocations must be performed by making a call to pooledCovariances with “ido” = 1.
IMSLS_BAD_IDO_5 | “ido” = #. A new analysis may not begin until the previous analysis is terminated by a call to pooledCovariances with “ido” equal to 3.