COVPL
Computes a pooled variance-covariance matrix from the observations.
Required Arguments
NROW — The absolute value of NROW is the number of rows of X that contain an observation. (Input)
NROW may be positive, zero, or negative. Negative NROW means that the ‑NROW rows of data are to be deleted from (most aspects of) the analysis. This should be done only if IDO is 2 or 3 and the wrap-up computations for COV have not been performed. When a negative value is input for NROW, it is assumed that each of the ‑NROW rows of X has been input (with positive NROW) in previous invocations of COVPL. Negative values of NROW should be used with care, since a constant variable in the remaining data may not be recognized as such.
NVAR — Number of variables to be used in computing the covariance matrix. (Input)
The weight, frequency or group variables, if used, are not counted in NVAR.
X — ∣NROW∣ by NVAR + m matrix containing the data. (Input)
The number of columns of X that are used is NVAR + m, where m is 0, 1, 2, or 3 depending upon whether any columns in X contain frequencies, weights or group numbers.
NGROUP — Number of groups in the data. (Input)
COV — NVAR by NVAR matrix of covariances. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
Optional Arguments
IDO — Processing option. (Input)
Default: IDO = 0.
IDO | Action |
--- | --- |
0 | This is the only invocation of COVPL and all the data are input at once. |
1 | This is the first invocation of COVPL with this data, and additional calls will be made. Initialization of program variables and updating for the NROW observations are performed. |
2 | This is an intermediate invocation of COVPL, and updating for the NROW observations is performed. |
3 | All statistics are updated for the NROW observations. The covariance matrix is computed. |
NCOL — Number of columns in matrix X. (Input)
Default: NCOL = size (X,2).
LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).
IND — Vector of length NVAR containing the column numbers in X to be used in computing the covariance matrices. (Input)
By default: IND(I) = I.
IFRQ — Frequency option. (Input)
IFRQ = 0 means that all frequencies are 1.0. Positive IFRQ indicates that column number IFRQ of X contains the frequencies. All frequencies should be integer values. The NINT (nearest integer) function is used to obtain integer frequencies if this is not the case.
Default: IFRQ = 0.
IWT — Weighting option. (Input)
IWT = 0 means that all weights are 1.0. Positive IWT means that column IWT of X contains the weights. Negative weights are not allowed.
Default: IWT = 0.
IGRP — Column of X giving the group numbers. (Input)
If IGRP = 0, one group is assumed. If IGRP > 0, then column number IGRP of X contains the group number for the observation. Group numbers must be numbered 1, 2, …, NGROUP. The NINT function is used to get integer values for the group numbers.
Default: IGRP = NVAR + 1.
NI — Vector of length NGROUP containing the numbers of observations in the groups. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
The i-th element of NI contains the number of observations in group i.
SWT — Vector of length NGROUP containing the sum of the weights times the frequencies in the groups. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
XMEAN — NGROUP by NVAR matrix. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
The i-th row of XMEAN contains the group i variable means.
LDXMEA — Leading dimension of XMEAN exactly as specified in the dimension statement in the calling program. (Input)
Default: LDXMEA = size (XMEAN,1).
LDCOV — Leading dimension of COV exactly as specified in the dimension statement of the calling program. (Input)
Default: LDCOV = size (COV,1).
NRMISS — Number of rows of data encountered in calls to COVPL containing missing values (NaN) for any of the variables used. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
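The row-deletion behavior described for negative NROW can be sketched as follows. This is an illustrative sketch only: it reuses the iris data set from the Example section, the row index K = 10 is an arbitrary, hypothetical choice, and no expected output is shown because the result was not run here.

```fortran
      USE GDATA_INT
      USE COVPL_INT
      IMPLICIT NONE
      INTEGER, PARAMETER :: NVAR=4, NGROUP=3, NCOL=5, LDX=150, IGRP=1
      INTEGER :: IND(NVAR), NI(NGROUP), NOBS, NV, NRMISS, K
      REAL    :: COV(NVAR,NVAR), SWT(NGROUP), X(LDX,NCOL), XMEAN(NGROUP,NVAR)
      DATA IND/2, 3, 4, 5/
!                                 Load the iris data (group number in column 1)
      CALL GDATA (3, X, NOBS, NV)
!                                 IDO = 1: initialize and accumulate all rows
      CALL COVPL (NOBS, NVAR, X, NGROUP, COV, IDO=1, IND=IND, IGRP=IGRP, &
                  NI=NI, SWT=SWT, XMEAN=XMEAN, NRMISS=NRMISS)
!                                 IDO = 2, NROW = -1: delete one row that was
!                                 entered earlier (K is a hypothetical choice)
      K = 10
      CALL COVPL (-1, NVAR, X(K:, 1:NCOL), NGROUP, COV, IDO=2, IND=IND, &
                  IGRP=IGRP, NI=NI, SWT=SWT, XMEAN=XMEAN, NRMISS=NRMISS)
!                                 IDO = 3, NROW = 0: wrap-up computations only
      CALL COVPL (0, NVAR, X, NGROUP, COV, IDO=3, IND=IND, IGRP=IGRP, &
                  NI=NI, SWT=SWT, XMEAN=XMEAN, NRMISS=NRMISS)
      END
```

The deletion call is made before the IDO = 3 call because, as noted under NROW, rows should be deleted only while the wrap-up computations for COV have not yet been performed.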
FORTRAN 90 Interface
Generic: CALL COVPL (NROW, NVAR, X, NGROUP, COV [, …])
Specific: The specific interface names are S_COVPL and D_COVPL.
FORTRAN 77 Interface
Single: CALL COVPL (IDO, NROW, NVAR, NCOL, X, LDX, IND, IFRQ, IWT, NGROUP, IGRP, NI, SWT, XMEAN, LDXMEA, COV, LDCOV, NRMISS)
Double: The double precision name is DCOVPL.
Description
Routine COVPL computes the pooled variance-covariance matrix from a matrix of observations. The within-groups means are also computed. Listwise deletion of missing values is assumed so that all observations used are “complete”; in any row of X, if an element in the “list” IND, IGRP, IFRQ or IWT is missing, then the row is not used. Routine COVPL should be used whenever one suspects that the data has been sampled from populations with different means but identical variance-covariance matrices. If these assumptions cannot be made, a different variance-covariance matrix should be estimated within each group.
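As a cross-check on what is being estimated, the following self-contained sketch computes a pooled covariance matrix directly with a two-pass method. It is not the algorithm used by COVPL (which updates a Cholesky factor with Givens rotations); it assumes unit weights and frequencies, no missing values, valid group labels, and more observations than groups. The module and routine names are hypothetical.

```fortran
module pooled_cov_sketch
  implicit none
contains
  ! Two-pass pooled covariance, unit weights and frequencies assumed.
  !   x(n,p)     data, one observation per row
  !   grp(n)     group label of each row, in 1..g
  !   xmean(g,p) group means (output)
  !   ni(g)      group counts (output)
  !   s(p,p)     pooled covariance, divisor n - g (output)
  subroutine pooled_cov(x, grp, g, xmean, ni, s)
    real,    intent(in)  :: x(:,:)
    integer, intent(in)  :: grp(:), g
    real,    intent(out) :: xmean(g, size(x,2)), s(size(x,2), size(x,2))
    integer, intent(out) :: ni(g)
    integer :: i, j, k, n, p
    real    :: d(size(x,2))

    n = size(x,1)
    p = size(x,2)
    ni = 0
    xmean = 0.0
    s = 0.0
    ! Pass 1: accumulate group totals, then convert them to means.
    do i = 1, n
      ni(grp(i)) = ni(grp(i)) + 1
      xmean(grp(i),:) = xmean(grp(i),:) + x(i,:)
    end do
    do k = 1, g
      if (ni(k) > 0) xmean(k,:) = xmean(k,:) / real(ni(k))
    end do
    ! Pass 2: sum cross-products of deviations from each row's group mean.
    do i = 1, n
      d = x(i,:) - xmean(grp(i),:)
      do j = 1, p
        s(:,j) = s(:,j) + d * d(j)
      end do
    end do
    ! Pool over groups: divide by total count minus number of groups.
    s = s / real(n - g)
  end subroutine pooled_cov
end module pooled_cov_sketch
```

For a hand-checkable case, take four observations (1,2), (3,2), (10,5), (14,9) with group labels 1, 1, 2, 2. The group means are (2,2) and (12,7), the pooled cross-products are 10, 8, 8 and 8, and dividing by n − g = 2 gives S with rows (5,4) and (4,4).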
When IDO = 0, the same computations occur as if COVPL were consecutively called with IDO equal to 1, 2, and 3. For brevity, the following discusses the computations with IDO > 0.
When IDO = 1, variables are initialized, workspace is allocated, and input variables are checked for errors.
If NROW ≠ 0 (for any value of IDO), the group observation totals, T_i, for i = 1, …, g, where g is the number of groups, are updated for the ∣NROW∣ observations in X. The group totals are computed as:
$$T_i = \sum_{j=1}^{n_i} \omega_{ij} f_{ij} x_{ij}$$
where n_i is the number of observations in the i-th group, ω_ij is the observation weight, x_ij is the j-th observation in the i-th group, and f_ij is the observation frequency.
Modified Givens rotations (see routines SROTM and SROTMG in the IMSL MATH/LIBRARY) are used in computing the Cholesky decomposition of the pooled sums of squares and crossproducts matrix. The interested reader is referred to Golub and Van Loan (1983) for details.
The group means and the pooled sample covariance matrix S are computed from the intermediate results when IDO = 3. These quantities are defined by
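In the notation used for the group totals, these definitions can be sketched as below. The divisor (total frequency minus the number of groups g) is an assumption rather than something stated in the surrounding text, although it agrees with the usual pooled estimate when all weights and frequencies are 1. Here n_i denotes the number of observations in group i.

```latex
\bar{x}_{i} = \frac{T_i}{\sum_{j=1}^{n_i} \omega_{ij} f_{ij}},
\qquad
S = \frac{\sum_{i=1}^{g} \sum_{j=1}^{n_i} \omega_{ij} f_{ij}
          \left(x_{ij} - \bar{x}_{i}\right)\left(x_{ij} - \bar{x}_{i}\right)^{T}}
         {\sum_{i=1}^{g} \sum_{j=1}^{n_i} f_{ij} - g}
```

With unit weights and frequencies this reduces to the within-group sums of squares and cross-products divided by n − g, where n is the total number of observations.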
Occasionally, the Cholesky factorization of the pooled sample cross-products matrix, S = U^T U with U lower triangular, may be desired. U may be computed from the output array COV and the workspace array D returned in calls to C2VPL. The Cholesky factor U can be computed prior to calling C2VPL with IDO = 3 by multiplying the elements in the i-th row of COV by $\sqrt{d_i}$, where d_i is the i-th element of D.
If subsequent calls to C2VPL are to be made, COV must not be modified in computing U.
Comments
1. Workspace may be explicitly provided, if desired, by use of C2VPL/DC2VPL. The reference is:
CALL C2VPL (IDO, NROW, NVAR, NCOL, X, LDX, IND, IFRQ, IWT, NGROUP, IGRP, NI, SWT, XMEAN, LDXMEA, COV, LDCOV, NRMISS, D, OB, XVAL, DIF)
The additional arguments are as follows:
D — Real work vector of length NVAR.
OB — Real work vector of length NVAR.
XVAL — Real work vector of length NVAR * NGROUP.
DIF — Real work vector of length NVAR.
2. Informational error
Type | Code | Description |
--- | --- | --- |
3 | 1 | The group number is not between 1 and NGROUP. The observation is ignored. |
Example
The following example computes a pooled variance-covariance matrix for the Fisher iris data (see routine GDATA, Chapter 19, “Utilities”). The first column in this data set is the group indicator. To illustrate the use of the IDO argument, multiple calls to COVPL are made.
!                                 Specifications
      USE GDATA_INT
      USE COVPL_INT
      USE UMACH_INT
      USE WRRRN_INT

      IMPLICIT NONE
      INTEGER    IFRQ, IGRP, IWT, LDCOV, LDX, LDXMEA, NCOL, NGROUP, &
                 NROW, NVAR
      PARAMETER  (IFRQ=0, IGRP=1, IWT=0, LDX=150, NCOL=5, NGROUP=3, &
                 NROW=1, NVAR=4, LDCOV=NVAR, LDXMEA=NGROUP)
!
      INTEGER    I, IDO, IND(4), NI(NGROUP), NOBS, NOUT, NRMISS, NV
      REAL       COV(LDCOV,LDCOV), SWT(NGROUP), X(LDX,5), XMEAN(LDXMEA,NVAR)
!                                 Columns 2-5 of X hold the four variables;
!                                 column 1 (IGRP) holds the group number
      DATA IND/2, 3, 4, 5/
!                                 Load the Fisher iris data
      CALL GDATA (3, X, NOBS, NV)
!                                 Initialize; no observations yet (NROW = 0)
      IDO = 1
      CALL COVPL (0, NVAR, X, NGROUP, COV, IDO=IDO, IND=IND, IGRP=IGRP, &
                  NI=NI, SWT=SWT, XMEAN=XMEAN, NRMISS=NRMISS)
!                                 Add the observations, one row per call
      IDO = 2
      DO I = 1, NOBS
         CALL COVPL (NROW, NVAR, X(I:, 1:NCOL), NGROUP, COV, IDO=IDO, &
                     IND=IND, IGRP=IGRP, NI=NI, SWT=SWT, XMEAN=XMEAN, &
                     NRMISS=NRMISS)
      END DO
!                                 Summarize the statistics
      IDO = 3
      CALL COVPL (0, NVAR, X, NGROUP, COV, IDO=IDO, IND=IND, IGRP=IGRP, &
                  NI=NI, SWT=SWT, XMEAN=XMEAN, NRMISS=NRMISS)
!                                 Print the results
      CALL UMACH (2, NOUT)
      WRITE (NOUT,*) ' NRMISS = ', NRMISS
      CALL WRRRN ('XMEAN', XMEAN)
      CALL WRRRN ('COV', COV)
      END
Output
NRMISS = 0
XMEAN
1 2 3 4
1 5.006 3.428 1.462 0.246
2 5.936 2.770 4.260 1.326
3 6.588 2.974 5.552 2.026
COV
1 2 3 4
1 0.2650 0.0927 0.1675 0.0384
2 0.0927 0.1154 0.0552 0.0327
3 0.1675 0.0552 0.1852 0.0427
4 0.0384 0.0327 0.0427 0.0419