CORVC
Computes the variance-covariance or correlation matrix.
Required Arguments
NVAR — Number of variables. (Input)
The weight or frequency variables, if used, are not counted in NVAR.
X — |NROW| by NVAR + m matrix containing the data, where m is 0, 1, or 2 depending on whether any column(s) of X correspond to weights and/or frequencies. (Input)
COV — NVAR by NVAR matrix containing either the correlation matrix (possibly with the standard deviations on the diagonal), the variance-covariance matrix, or the corrected sums of squares and crossproducts matrix, as controlled by the COV option, ICOPT. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
The elements of COV correspond to the columns of X, except for the columns of X containing weights or frequencies (see XMEAN).
Optional Arguments
IDO — Processing option. (Input)
Default: IDO = 0.
IDO | Action
0 | This is the only invocation of CORVC for this data set, and all the data are input at once.
1 | This is the first invocation, and additional calls to CORVC will be made. Initialization and updating for the NROW observations are performed. The means (in XMEAN) are output correctly, but the quantities output in COV are intermediate results.
2 | This is an intermediate invocation of CORVC, and updating for the NROW observations is performed.
3 | This is the final invocation of this routine. If NROW is not zero, updating is performed. The wrap-up computations for COV are performed.
It is possible to call CORVC twice in succession with IDO = 3 in order to first compute covariances (ICOPT = 1) and then compute correlations (ICOPT = 2 or 3). This ability is most important when pairwise deletion of missing values is used (MOPT = 3). The workspace arrays (or the workspace) must not be altered in between calls.
NROW — The absolute value of NROW is the number of rows of data currently input in X. (Input)
Default: NROW = size (X,1).
NROW may be positive, zero, or negative. Negative NROW means that the -NROW rows of data are to be deleted from (most aspects of) the analysis. This should be done only if IDO is 2 or 3 and the wrap-up computations for COV have not been performed. When a negative value is input for NROW, it is assumed that each of the -NROW rows of X has been input (with positive NROW) in previous invocations of CORVC. Use negative values of NROW with care, since a constant variable in the remaining data may not be recognized as such.
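The following Python sketch illustrates why deletion needs care. It is not CORVC's implementation: the helper names add_value and remove_value are hypothetical, and it tracks a single unweighted variable with a running mean and a running corrected sum of squares. Removing a value reverses the update arithmetically, so after deleting the only non-constant value the sum of squares is close to zero but need not be exactly zero, which is how a constant variable can go undetected.

```python
def add_value(n, mean, ss, x):
    """Fold x into a running count, mean, and corrected sum of squares."""
    n_new = n + 1
    d = x - mean                      # deviation from the old mean
    mean_new = mean + d / n_new
    ss_new = ss + d * d * (n_new - 1) / n_new
    return n_new, mean_new, ss_new


def remove_value(n, mean, ss, x):
    """Undo add_value for a value x that was added earlier (any order)."""
    n_old = n - 1
    if n_old == 0:
        return 0, 0.0, 0.0
    mean_old = mean - (x - mean) / n_old
    d = x - mean_old                  # same deviation add_value used
    ss_old = ss - d * d * n_old / n
    return n_old, mean_old, ss_old


n, mean, ss = 0, 0.0, 0.0
for x in [0.1, 0.1, 0.1, 0.7]:
    n, mean, ss = add_value(n, mean, ss, x)

# Delete the last value; the remaining data are constant, but ss is only
# guaranteed to be near zero, not exactly 0.0, because of rounding.
n, mean, ss = remove_value(n, mean, ss, 0.7)
```

A test such as ss == 0.0 is therefore unreliable after a deletion, whereas a tolerance-based check still works.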
LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).
IFRQ — Frequency option. (Input)
IFRQ = 0 means that all frequencies are 1.0. For positive IFRQ, column IFRQ of X contains the frequencies.
Default: IFRQ = 0.
IWT — Weighting option. (Input)
IWT = 0 means that all weights are 1.0. For positive IWT, column IWT of X contains the weights. Observations with zero weight are counted as observations in the frequencies, but do not contribute to the means, variances, covariances, or correlations. Observations with negative weights are missing.
Default: IWT = 0.
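The distinction between frequencies and weights can be illustrated with a small Python sketch. It is illustrative only, not CORVC's code: the function name weighted_summary is hypothetical, and the mean is assumed to be the usual weighted mean, the sum of f*w*x divided by the sum of f*w.

```python
import math


def weighted_summary(values, freqs, weights):
    """Return (nobs, mean) for one variable with frequencies and weights.

    Rules mirrored from the IWT description:
      - a negative weight marks the observation as missing (skipped);
      - a zero weight still adds its frequency to the observation count,
        but contributes nothing to the mean.
    """
    nobs = 0.0      # total of the frequencies
    sum_fw = 0.0    # sum of frequency * weight
    sum_fwx = 0.0   # sum of frequency * weight * value
    for x, f, w in zip(values, freqs, weights):
        if w < 0.0 or math.isnan(x):
            continue
        nobs += f
        sum_fw += f * w
        sum_fwx += f * w * x
    mean = sum_fwx / sum_fw if sum_fw > 0.0 else math.nan
    return nobs, mean


values = [1.0, 3.0, 100.0, 50.0]
freqs = [2.0, 1.0, 1.0, 1.0]
weights = [1.0, 2.0, 0.0, -1.0]
nobs, mean = weighted_summary(values, freqs, weights)
```

Here the zero-weight value 100.0 is counted as an observation but does not move the mean, and the negative-weight value 50.0 is ignored altogether.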
MOPT — Missing value option. (Input)
NaN (not a number) is interpreted as the missing value code, and any value in X equal to NaN is excluded from the computations (see routine AMACH/DMACH in the Reference Material). If MOPT is positive, various pairwise exclusion methods are used.
Default: MOPT = 0.
MOPT | Action
0 | The exclusion is listwise. (The entire row of X is excluded if any of the values of the row is equal to the missing value code.)
1 | Raw crossproducts are computed from all valid pairs, and means and variances are computed from all valid data on the individual variables. Corrected crossproducts, covariances, and correlations are computed using these quantities.
2 | Raw crossproducts, means, and variances are computed as in the case of MOPT = 1. However, corrected crossproducts and covariances are computed only from the valid pairs of data. Correlations are computed using these covariances and the variances from all valid data.
3 | Raw crossproducts, means, variances, and covariances are computed as in the case of MOPT = 2. Correlations are computed using these covariances, but the variances used are computed only from the valid pairs of data.
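The difference between listwise and pairwise exclusion can be seen in a short Python sketch. It is an illustration under simplifying assumptions (unit weights and frequencies, hypothetical helper names cov_pair and listwise_rows), not a reproduction of any particular MOPT setting: the pairwise covariance here uses only the rows where both variables are present, which is closest in spirit to the "valid pairs" covariances described for MOPT = 2.

```python
import math


def cov_pair(xs, ys):
    """Sample covariance over the rows where both values are not NaN."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    if n < 2:
        return math.nan
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    return sum((x - mx) * (y - my) for x, y in pairs) / (n - 1)


def listwise_rows(rows):
    """Keep only the rows that contain no NaN in any column."""
    return [r for r in rows if not any(math.isnan(v) for v in r)]


nan = math.nan
rows = [
    [1.0, 2.0, 1.0],
    [2.0, 4.0, nan],   # third variable missing in this row
    [3.0, 5.0, 2.0],
    [4.0, 9.0, 5.0],
]

cols = list(zip(*rows))
kept_cols = list(zip(*listwise_rows(rows)))

# Variables 1 and 2 are complete, so pairwise exclusion uses all 4 rows,
# while listwise exclusion drops row 2 because of variable 3.
cov12_pairwise = cov_pair(cols[0], cols[1])
cov12_listwise = cov_pair(kept_cols[0], kept_cols[1])
```

The two estimates differ even though neither variable 1 nor variable 2 has a missing value, because listwise exclusion discards a whole row when any column is missing.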
ICOPT — COV option. (Input)
Default: ICOPT = 0.
ICOPT | Action
0 | COV contains the variance-covariance matrix.
1 | COV contains the corrected sums of squares and crossproducts matrix.
2 | COV contains the correlation matrix.
3 | COV contains the correlation matrix, except for the diagonal elements, which are the standard deviations.
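The four ICOPT outputs are related by simple rescaling, which the following Python sketch makes concrete. It is an illustration rather than the routine's code: the function name cov_from_sscp is hypothetical, and the divisor NOBS - 1 assumes unit weights and frequencies with no missing values. Correlations involving a constant variable are returned as NaN, in line with the informational errors listed in the Comments.

```python
import math


def cov_from_sscp(sscp, nobs, icopt):
    """Convert a corrected SSCP matrix into the form selected by icopt."""
    p = len(sscp)
    if icopt == 1:                       # corrected sums of squares/crossproducts
        return [row[:] for row in sscp]
    cov = [[v / (nobs - 1) for v in row] for row in sscp]
    if icopt == 0:                       # variance-covariance matrix
        return cov
    sd = [math.sqrt(cov[k][k]) for k in range(p)]
    out = [[math.nan] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            if sd[j] > 0.0 and sd[k] > 0.0:
                out[j][k] = cov[j][k] / (sd[j] * sd[k])
    if icopt == 3:                       # standard deviations on the diagonal
        for k in range(p):
            out[k][k] = sd[k]
    return out


sscp = [[2.0, 7.0], [7.0, 26.0]]         # from 3 observations of 2 variables
cov = cov_from_sscp(sscp, 3, 0)
corr = cov_from_sscp(sscp, 3, 2)
corr_sd = cov_from_sscp(sscp, 3, 3)
```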
XMEAN — Vector of length NVAR containing the variable means. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
The elements of XMEAN correspond to the columns of X, except that if weights and/or frequencies are used, the elements of XMEAN beyond the IWT or IFRQ element are shifted relative to the columns of X.
LDCOV — Leading dimension of COV exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCOV = size (COV,1).
INCD — Incidence matrix. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
If MOPT is zero, INCD is 1 by 1, and contains the number of valid observations. If MOPT is positive, INCD is NVAR by NVAR and contains the numbers of pairs of valid observations that are used in calculating the crossproducts for COV.
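As a rough illustration of what the pairwise form of INCD holds, the Python sketch below counts, for every pair of variables, the rows in which both values are present. The function name incidence is hypothetical, and unit frequencies are assumed; how CORVC treats frequencies in these counts is not covered here.

```python
import math


def incidence(rows):
    """Return a p-by-p matrix of counts of rows valid for each variable pair."""
    p = len(rows[0])
    incd = [[0] * p for _ in range(p)]
    for r in rows:
        ok = [not math.isnan(v) for v in r]
        for j in range(p):
            for k in range(p):
                if ok[j] and ok[k]:
                    incd[j][k] += 1
    return incd


nan = math.nan
rows = [
    [1.0, 2.0, 1.0],
    [2.0, 4.0, nan],
    [3.0, 5.0, 2.0],
    [4.0, 9.0, 5.0],
]
incd = incidence(rows)

# Under listwise exclusion a single count suffices: rows with no NaN at all.
n_listwise = sum(1 for r in rows if not any(math.isnan(v) for v in r))
```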
LDINCD — Leading dimension of INCD exactly as specified in the dimension statement in the calling program. (Input)
Default: LDINCD = size(INCD,1).
NOBS — Total number of observations (that is, the total of the frequencies). (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
If MOPT = 0, observations with missing values are not included in NOBS. For other values of MOPT, all observations are included except for observations with missing values for the weight or the frequency.
NMISS — Total number of observations that contain any missing values. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
SUMWT — Sum of the weights of all observations that are processed. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
If MOPT = 0, observations with missing values are not included in SUMWT. For other values of MOPT, all observations are included except for observations with missing values for the weight or the frequency.
FORTRAN 90 Interface
Generic: CALL CORVC (NVAR, X, COV [, …])
Specific: The specific interface names are S_CORVC and D_CORVC.
FORTRAN 77 Interface
Single: CALL CORVC (IDO, NROW, NVAR, X, LDX, IFRQ, IWT, MOPT, ICOPT, XMEAN, COV, LDCOV, INCD, LDINCD, NOBS, NMISS, SUMWT)
Double: The double precision name is DCORVC.
Description
Routine CORVC computes estimates of correlations, covariances, or sums of squares and crossproducts for a data matrix X. Weights and frequencies are allowed but not required. Also allowed are listwise or pairwise deletion of missing values. Routine CORVC is an “IDO routine,” so it may be called with all of the data in one invocation, or it may be called in several invocations with some (or none) of the data input during each call. By setting NROW to a negative integer, observations that have previously been added to the covariance/correlation statistics may be deleted from the statistics. Exercise care with this option, however, since the program may not be able to detect constant variables when negative NROW is used.
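The means, (corrected) sums of squares, and (corrected) sums of crossproducts are computed using the method of provisional means. Let $\bar{x}_{ki}$ denote the mean based upon i observations for the k-th variable, $f_i$ denote the frequency of the i-th observation, $w_i$ denote the weight of the i-th observation, and let $c_{jki}$ denote the sum of crossproducts (or sum of squares if j = k) based upon i observations. Then, the method of provisional means finds new means and sums of crossproducts as follows:
The means and crossproducts are initialized as:
$$\bar{x}_{k0} = 0.0, \quad k = 1, \ldots, p$$
$$c_{jk0} = 0.0, \quad j, k = 1, \ldots, p$$
where p denotes the number of variables. Letting $x_{k,i+1}$ denote the k-th variable on observation i + 1, each new observation leads to the following updates for $\bar{x}_{ki}$ and $c_{jki}$ using update constant $r_{i+1}$:
$$r_{i+1} = \frac{f_{i+1} w_{i+1}}{\sum_{l=1}^{i+1} f_l w_l}$$
$$\bar{x}_{k,i+1} = \bar{x}_{ki} + \left(x_{k,i+1} - \bar{x}_{ki}\right) r_{i+1}$$
$$c_{jk,i+1} = c_{jki} + w_{i+1} f_{i+1} \left(x_{j,i+1} - \bar{x}_{ji}\right)\left(x_{k,i+1} - \bar{x}_{ki}\right)\left(1 - r_{i+1}\right)$$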
If there is no weight variable, weights of 1.0 are used. If there is no frequency column, frequencies of 1.0 are used. Means and variances are computed based upon all of the valid data for each variable or, if required, based upon all of the valid data for each pair of variables.
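A minimal Python sketch of a provisional-means accumulation is given here for orientation. It is not CORVC's source: the function names provisional_update and accumulate are hypothetical, missing values are not handled, and the weighted update used (an update constant equal to the new observation's f*w divided by the running total of f*w) is a standard form assumed for illustration.

```python
def provisional_update(mean, c, total, x, f=1.0, w=1.0):
    """Fold one observation x (a list of p values) into running statistics.

    mean  : list of p running means (updated in place)
    c     : p-by-p running corrected crossproducts (updated in place)
    total : running sum of f*w before this observation
    Returns the running sum of f*w after this observation.
    """
    fw = f * w
    total_new = total + fw
    if total_new == 0.0:
        return total_new
    r = fw / total_new                       # update constant
    d = [xk - mk for xk, mk in zip(x, mean)]  # deviations from the old means
    p = len(x)
    for j in range(p):
        for k in range(p):
            c[j][k] += fw * d[j] * d[k] * (1.0 - r)
    for k in range(p):
        mean[k] += d[k] * r
    return total_new


def accumulate(rows):
    """Process rows one at a time, starting from zero means and crossproducts."""
    p = len(rows[0])
    mean = [0.0] * p
    c = [[0.0] * p for _ in range(p)]
    total = 0.0
    for x in rows:
        total = provisional_update(mean, c, total, x)
    return mean, c, total


rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 9.0]]
mean, c, total = accumulate(rows)
```

Because each observation is folded in once and never revisited, the same function can be called across several batches of rows, which is the pattern the IDO option supports.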
Comments
1. Workspace may be explicitly provided, if desired, by use of C2RVC/DC2RVC. The reference is:
CALL C2RVC (IDO, NROW, NVAR, X, LDX, IFRQ, IWT, MOPT, ICOPT, XMEAN, COV, LDCOV, INCD, LDINCD, NOBS, NMISS, SUMWT, WK)
The additional argument is:
WK — Workspace of the length specified in the table below. WK should not be changed between calls to C2RVC.
The workspace may contain statistics of interest. Let
m = NVAR
k = m(m + 1)/2
Statistics in the workspace that belong to symmetric matrices are stored in symmetric storage mode, i.e., only the lower triangular elements are stored. The workspace utilization is:
MOPT | IWT | Start | Length | Contents
All | All | 1 | m | Indicators of constant data
All | All | m + 1 | m | First nonmissing data
0 | All | 2m + 1 | m | Deviation from temporary mean
0 | Positive | 3m + 1 | 1 | Sum of weights
1, 2 | All | 2m + 1 | m² | Pairwise means
1, 2 | Positive | 2m + m² + 1 | k | Pairwise sums of weights
3 | All | 2m + 1 | m² | Pairwise means
3 | 0 | 2m + m² + 1 | m² | Pairwise sums of products
3 | Positive | 2m + m² + 1 | k | Pairwise sums of weights
3 | Positive | 2m + k + m² + 1 | m² | Pairwise sums of products
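The length k = m(m + 1)/2 used above is the number of elements in the lower triangle of an m by m symmetric matrix. The Python sketch below shows one way such packed storage can be indexed. The row-by-row ordering and the helper name packed_index are assumptions made for illustration; the ordering actually used inside the workspace is not stated here.

```python
def packed_length(m):
    """Number of stored elements for an m-by-m symmetric matrix."""
    return m * (m + 1) // 2


def packed_index(i, j):
    """0-based offset of element (i, j), with 1-based i and j, assuming the
    lower triangle is packed row by row: (1,1), (2,1), (2,2), (3,1), ..."""
    if j > i:
        i, j = j, i          # symmetric: (i, j) and (j, i) share one slot
    return i * (i - 1) // 2 + (j - 1)


m = 3
offsets = [[packed_index(i, j) for j in range(1, m + 1)]
           for i in range(1, m + 1)]
```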
2. Informational errors
Type | Code | Description
3 | 12 | The sum of the weights is zero. The means, variances, and covariances are set to NaN.
3 | 13 | The sum of the weights is zero. The means and correlations are set to NaN.
3 | 14 | Correlations are requested but the observations on a variable are constant. The pertinent correlations are set to NaN.
3 | 15 | Variances and covariances are requested but fewer than two valid observations are present for some variables. The corresponding variances or covariances are set to NaN.
3 | 16 | Pairwise correlations are requested but the observations on a variable are constant. The pertinent correlations are set to NaN.
3 | 17 | Correlations are requested but fewer than two valid observations are present for some variables. The corresponding variances or covariances are set to NaN.
4 | 10 | More observations have been deleted than were originally entered.
4 | 11 | More observations have been deleted from COV(i, j) than were originally entered. INCD(i, j) is less than zero.
4 | 18 | Different observations have been deleted from COV(i, j) than were originally entered. COV(i, j) is less than zero.
Usage Notes
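In CORVC, each observation $x_{ki}$ with weight $w_i$ is assumed to have mean $\mu_k$ and variance $\sigma_k^2 / w_i$.
With these assumptions, CORVC uses the following definition of a sample mean:
$$\bar{x}_k = \frac{\sum_{i=1}^{n_r} f_i w_i x_{ki}}{\sum_{i=1}^{n_r} f_i w_i}$$
where $n_r$ is the number of cases. The following formula defines the sample covariance, $s_{jk}$, between variables j and k:
$$s_{jk} = \frac{\sum_{i=1}^{n_r} f_i w_i \left(x_{ji} - \bar{x}_j\right)\left(x_{ki} - \bar{x}_k\right)}{\sum_{i=1}^{n_r} f_i - 1}$$
The sample correlation between variables j and k, $r_{jk}$, is defined as:
$$r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\, s_{kk}}}$$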
Examples
Example 1
The first example illustrates the use of CORVC when inputting all of the data at once. The first 50 observations in the Fisher iris data (see routine GDATA, Chapter 19, “Utilities”) are used. Note that in this example the first variable is constant over the first 50 observations.
USE GDATA_INT
USE UMACH_INT
USE CORVC_INT
USE WRRRN_INT
USE WRIRN_INT
IMPLICIT NONE
INTEGER LDCOV, LDINCD, LDX, NVAR
PARAMETER (LDCOV=5, LDINCD=1, LDX=150, NVAR=5)
!
INTEGER INCD(LDINCD,1), NMISS, NOBS, NOUT, NROW, NV
REAL COV(LDCOV,NVAR), SUMWT, X(LDX,NVAR), XMEAN(NVAR)
!
CALL GDATA (3, X, NOBS, NV)
!
CALL UMACH (2, NOUT)
NROW = 50
!
CALL CORVC (NVAR, X, COV, NROW=NROW, XMEAN=XMEAN, INCD=INCD, &
NOBS=NOBS, NMISS=NMISS, SUMWT=SUMWT)
!
CALL WRRRN ('XMEAN', XMEAN, 1, NVAR, 1, 0)
CALL WRRRN ('COV', COV)
CALL WRIRN ('INCD', INCD)
WRITE (NOUT,*) ' NOBS = ', NOBS, ' NMISS = ', NMISS, ' SUMWT = ', &
SUMWT
END
Output
XMEAN
1 2 3 4 5
1.000 5.006 3.428 1.462 0.246
COV
1 2 3 4 5
1 0.0000 0.0000 0.0000 0.0000 0.0000
2 0.0000 0.1242 0.0992 0.0164 0.0103
3 0.0000 0.0992 0.1437 0.0117 0.0093
4 0.0000 0.0164 0.0117 0.0302 0.0061
5 0.0000 0.0103 0.0093 0.0061 0.0111
INCD
50
NOBS = 50 NMISS = 0 SUMWT = 50.0000
Example 2
In the second example, the IDO option is used. After the initialization step in which IDO = 1, the first 53 observations in the Fisher iris data are input, one observation at a time. The last three observations input are then deleted from the covariances, again one at a time, by setting NROW = -1. Finally, the wrap-up step is accomplished by calling CORVC with IDO = 3. The output is identical to the output above.
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER LDCOV, LDINCD, LDX, LDY, NVAR
PARAMETER (LDCOV=5, LDINCD=1, LDX=150, LDY=1, NVAR=5)
!
INTEGER I, IDO, INCD(LDINCD,1), NMISS, NOBS, NOUT, NROW, NV
REAL COV(LDCOV,NVAR), SUMWT, X(LDX,NVAR), XMEAN(NVAR), &
Y(LDY,NVAR)
!
CALL GDATA (3, X, NOBS, NV)
!
CALL UMACH (2, NOUT)
!
!
IDO = 1
NROW = 0
! Initialization
CALL CORVC (NVAR, Y, COV, IDO=IDO, NROW=NROW, XMEAN=XMEAN, &
INCD=INCD, NOBS=NOBS, NMISS=NMISS, SUMWT=SUMWT)
!
IDO = 2
NROW = 1
! Add the observations
DO 10 I=1, 53
!                                 Copy row I of X into Y (stride LDX)
   CALL SCOPY (NVAR, X(I:,1), LDX, Y(1:,1), 1)
   CALL CORVC (NVAR, Y, COV, IDO=IDO, NROW=NROW, XMEAN=XMEAN, &
               INCD=INCD, NOBS=NOBS, NMISS=NMISS, SUMWT=SUMWT)
10 CONTINUE
! Delete the last 3 added
NROW = -1
DO 20 I=51, 53
!                                 Re-input each row to be deleted
   CALL SCOPY (NVAR, X(I:,1), LDX, Y(1:,1), 1)
   CALL CORVC (NVAR, Y, COV, IDO=IDO, NROW=NROW, XMEAN=XMEAN, &
               INCD=INCD, NOBS=NOBS, NMISS=NMISS, SUMWT=SUMWT)
20 CONTINUE
! Wrap-up
IDO = 3
NROW = 0
CALL CORVC (NVAR, Y, COV, IDO=IDO, NROW=NROW, XMEAN=XMEAN, INCD=INCD,&
NOBS=NOBS, NMISS=NMISS, SUMWT=SUMWT)
CALL WRRRN ('XMEAN', XMEAN, 1, NVAR, 1)
CALL WRRRN ('COV', COV)
CALL WRIRN ('INCD', INCD)
WRITE (NOUT,*) ' NOBS = ', NOBS, ' NMISS = ', NMISS, ' SUMWT = ', &
SUMWT
END
Output
XMEAN
1 2 3 4 5
1.000 5.006 3.428 1.462 0.246
COV
1 2 3 4 5
1 0.0000 0.0000 0.0000 0.0000 0.0000
2 0.0000 0.1242 0.0992 0.0164 0.0103
3 0.0000 0.0992 0.1437 0.0117 0.0093
4 0.0000 0.0164 0.0117 0.0302 0.0061
5 0.0000 0.0103 0.0093 0.0061 0.0111
INCD
50
NOBS = 50 NMISS = 0 SUMWT = 50.0000