covariances


Computes the sample variance-covariance or correlation matrix.

Synopsis

covariances (x)

Required Arguments

float x[[]] (Input)
Array of size nRows × nVariables containing the data.

Return Value

If no optional arguments are used, covariances returns an nVariables × nVariables array containing the sample variance-covariance matrix of the observations. The rows and columns of this array correspond to the columns of x.

Optional Arguments

missingValueMethod (Input)
Method used to exclude missing values in x from the computations, where NaN is interpreted as the missing value code. See function machine (Chapter 15, Utilities). The methods are as follows:
missingValueMethod Action
0 The exclusion is listwise. (The entire row of x is excluded if any of the values of the row is equal to the missing value code.)
1 Raw crossproducts are computed from all valid pairs, and means and variances are computed from all valid data on the individual variables. Corrected crossproducts, covariances, and correlations are computed using these quantities.
2 Raw crossproducts, means, and variances are computed as in the case of missingValueMethod = 1. However, corrected crossproducts and covariances are computed only from the valid pairs of data. Correlations are computed using these covariances and the variances from all valid data.
3 Raw crossproducts, means, variances, and covariances are computed as in the case of missingValueMethod = 2. Correlations are computed using these covariances, but the variances used are computed from the valid pairs of data.
incidenceMatrix (Output)
An array containing the incidence matrix. If missingValueMethod is 0, incidenceMatrix is 1 × 1 and contains the number of valid observations; otherwise, incidenceMatrix is nVariables × nVariables and contains the number of pairs of valid observations used in calculating the crossproducts for covariance.
nObservations (Output)
Sum of the frequencies. If missingValueMethod is 0, observations with missing values are not included in nObservations; otherwise, all observations are included except for observations with missing values for the weight or the frequency.

varianceCovarianceMatrix

or

correctedSscpMatrix

or

correlationMatrix

or

stdevCorrelationMatrix
Exactly one of these options can be used to specify the type of matrix to be computed.
Keyword Type of Matrix
varianceCovarianceMatrix variance-covariance matrix (default)
correctedSscpMatrix corrected sums of squares and crossproducts matrix
correlationMatrix correlation matrix
stdevCorrelationMatrix correlation matrix except for the diagonal elements which are the standard deviations
means (Output)
The array containing the means of the variables in x. The components of the array correspond to the columns of x.
frequencies, float[] (Input)

Array of length nRows containing the frequency for each observation (row of x).

Default: frequencies [ ] = 1

weights, float[] (Input)

Array of length nRows containing the weight for each observation (row of x).

Default: weights [ ] = 1

sumWeights (Output)
Sum of the weights of all observations. If missingValueMethod is equal to 0, observations with missing values are not included in sumWeights. Otherwise, all observations are included except for observations with missing values for the weight or the frequency.
nRowsMissing (Output)
Total number of observations that contain any missing values (NaN).
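The NumPy sketch below, assuming unit frequencies and weights and treating NaN as the missing value code, shows which rows listwise exclusion (missingValueMethod = 0) keeps, how pairwise counts of valid observations (the quantity incidenceMatrix holds for missingValueMethod 1, 2, or 3) can be formed, and how the nRowsMissing count arises. The helper names listwise_rows, pairwise_counts, and rows_missing are hypothetical and are not part of PyIMSL.

```python
import numpy as np

def listwise_rows(x):
    """Rows of x that survive listwise exclusion (missingValueMethod = 0)."""
    x = np.asarray(x, dtype=float)
    return x[~np.isnan(x).any(axis=1)]

def pairwise_counts(x):
    """Entry (j, k) counts the rows in which both variable j and variable k are valid."""
    valid = (~np.isnan(np.asarray(x, dtype=float))).astype(int)
    return valid.T @ valid

def rows_missing(x):
    """Number of rows that contain at least one NaN."""
    return int(np.isnan(np.asarray(x, dtype=float)).any(axis=1).sum())

nan = np.nan
x = [[1.0, 2.0, nan],
     [2.0, nan, 1.0],
     [3.0, 4.0, 5.0],
     [4.0, 5.0, 6.0]]

print(len(listwise_rows(x)))   # 2 complete rows remain
print(pairwise_counts(x))      # rows: [4 3 3], [3 3 2], [3 2 3]
print(rows_missing(x))         # 2
```

Under listwise exclusion the 1 × 1 incidence matrix would correspond to len(listwise_rows(x)); with non-unit frequencies the library's counts may be weighted differently, which this sketch does not attempt to model.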

Description

Function covariances computes estimates of correlations, covariances, or sums of squares and crossproducts for a data matrix x. Weights and frequencies are allowed but not required.

The means, (corrected) sums of squares, and (corrected) sums of crossproducts are computed using the method of provisional means. Let \(\overline{x}_{ki}\) denote the mean based on i observations for the k-th variable, \(f_i\) denote the frequency of the i-th observation, \(w_i\) denote the weight of the i-th observation, and \(c_{jki}\) denote the sum of crossproducts (or sum of squares if j = k) based on i observations. The method of provisional means then updates the means and sums of crossproducts as follows.

The means and crossproducts are initialized as follows:

\[\overline{x}_{k0} = 0.0 \text{ for } k = 1, \ldots, p\]
\[c_{jk0} = 0.0 \text{ for } j, k = 1, \ldots, p\]

where p denotes the number of variables. Letting \(x_{k,i+1}\) denote the value of the k-th variable for observation i + 1, each new observation leads to the following updates for \(\overline{x}_{ki}\) and \(c_{jki}\) using the update constant \(r_{i+1}\):

\[r_{i+1} = \frac{f_{i+1} w_{i+1}}{\sum\limits_{l=1}^{i+1} f_l w_l}\]
\[\overline{x}_{k,i+1} = \overline{x}_{ki} + \left(x_{k,i+1} - \overline{x}_{ki}\right) r_{i+1}\]
\[c_{jk,i+1} = c_{jki} + f_{i+1} w_{i+1} \left(x_{j,i+1} - \overline{x}_{ji}\right) \left(x_{k,i+1} - \overline{x}_{ki}\right)\left(1 - r_{i+1}\right)\]

The default value for weights and frequencies is 1. Means and variances are computed from the valid data for each variable or, depending on missingValueMethod, from the valid data for each pair of variables.
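The update equations above can be sketched in NumPy as follows. This is an illustrative one-pass implementation written from those equations, not the library's source; the function name provisional_means is hypothetical, and the sketch assumes there are no missing values. Note that the crossproduct update uses the deviations from the previous means, so c is updated before the means.

```python
import numpy as np

def provisional_means(x, frequencies=None, weights=None):
    """Return (means, corrected crossproducts) using the provisional-means updates."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    f = np.ones(n) if frequencies is None else np.asarray(frequencies, dtype=float)
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    mean = np.zeros(p)        # xbar_{k0} = 0.0
    c = np.zeros((p, p))      # c_{jk0} = 0.0
    total = 0.0               # running sum of f_l * w_l
    for i in range(n):
        fw = f[i] * w[i]
        if fw == 0.0:
            continue          # a zero-weight observation leaves means and c unchanged
        total += fw
        r = fw / total        # update constant r_{i+1}
        d = x[i] - mean       # deviations from the previous means
        c += fw * np.outer(d, d) * (1.0 - r)
        mean += d * r
    return mean, c

x = [[1.0, 2.0], [2.0, 1.0], [4.0, 3.0], [3.0, 5.0]]
mean, c = provisional_means(x)
cov = c / (len(x) - 1)        # unit frequencies: divide by (sum of f_i) - 1
print(np.allclose(mean, [2.5, 2.75]))              # True
print(np.allclose(cov, np.cov(x, rowvar=False)))   # True
```

With unit weights and frequencies the result agrees with NumPy's two-pass np.cov; the one-pass form is useful because each observation is visited only once.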

Usage Notes

Function covariances defines a sample mean by

\[\overline{x}_k = \frac {\sum\limits_{i=1}^{n} f_i w_i x_{ki}} {\sum\limits_{i=1}^{n} f_i w_i}\]

where n is the number of observations.

The following formula defines the sample covariance, \(s_{jk}\), between variables j and k:

\[s_{jk} = \frac {\sum\limits_{i=1}^{n} f_i w_i \left(x_{ji} - \overline{x}_j\right) \left(x_{ki} - \overline{x}_k\right)} {\left(\sum\limits_{i=1}^{n} f_i\right) - 1}\]

The sample correlation between variables j and k, \(r_{jk}\), is defined as follows:

\[r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}s_{kk}}}\]
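A direct (two-pass) evaluation of these formulas, producing each of the four matrix types selected by varianceCovarianceMatrix, correctedSscpMatrix, correlationMatrix, and stdevCorrelationMatrix, might look like the sketch below. It assumes no missing values, and the function name covariance_matrices and its dictionary keys are hypothetical. As in the formulas above, the covariance denominator uses the frequencies only, which differs from how np.cov treats its aweights argument. A constant variable gives a zero standard deviation, so its correlations come out as NaN, mirroring the documented IMSLS_CONSTANT_VARIABLE behavior.

```python
import numpy as np

def covariance_matrices(x, frequencies=None, weights=None):
    """Evaluate the mean, SSCP, covariance, and correlation formulas directly."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    f = np.ones(n) if frequencies is None else np.asarray(frequencies, dtype=float)
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    fw = f * w
    mean = fw @ x / fw.sum()                 # weighted sample means
    d = x - mean
    sscp = (fw[:, None] * d).T @ d           # corrected sums of squares and crossproducts
    cov = sscp / (f.sum() - 1.0)             # s_jk: divide by (sum of f_i) - 1
    sd = np.sqrt(np.diag(cov))
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = cov / np.outer(sd, sd)        # r_jk = s_jk / sqrt(s_jj * s_kk)
    stdev_corr = corr.copy()
    np.fill_diagonal(stdev_corr, sd)         # standard deviations on the diagonal
    return {"means": mean, "correctedSscp": sscp, "varianceCovariance": cov,
            "correlation": corr, "stdevCorrelation": stdev_corr}

x = [[1.0, 2.0, 0.5], [2.0, 1.0, 0.7], [4.0, 3.0, 0.1], [3.0, 5.0, 0.9]]
res = covariance_matrices(x)
print(np.allclose(res["correlation"], np.corrcoef(x, rowvar=False)))  # True
```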

Examples

Example 1

This example illustrates the use of covariances for the first 50 observations in the Fisher iris data (Fisher 1936). Note that the first variable is constant over the first 50 observations.

from pyimsl.stat.covariances import covariances
from pyimsl.stat.writeMatrix import writeMatrix

x = [[1.0, 5.1, 3.5, 1.4, .2], [1.0, 4.9, 3.0, 1.4, .2],
     [1.0, 4.7, 3.2, 1.3, .2], [1.0, 4.6, 3.1, 1.5, .2],
     [1.0, 5.0, 3.6, 1.4, .2], [1.0, 5.4, 3.9, 1.7, .4],
     [1.0, 4.6, 3.4, 1.4, .3], [1.0, 5.0, 3.4, 1.5, .2],
     [1.0, 4.4, 2.9, 1.4, .2], [1.0, 4.9, 3.1, 1.5, .1],
     [1.0, 5.4, 3.7, 1.5, .2], [1.0, 4.8, 3.4, 1.6, .2],
     [1.0, 4.8, 3.0, 1.4, .1], [1.0, 4.3, 3.0, 1.1, .1],
     [1.0, 5.8, 4.0, 1.2, .2], [1.0, 5.7, 4.4, 1.5, .4],
     [1.0, 5.4, 3.9, 1.3, .4], [1.0, 5.1, 3.5, 1.4, .3],
     [1.0, 5.7, 3.8, 1.7, .3], [1.0, 5.1, 3.8, 1.5, .3],
     [1.0, 5.4, 3.4, 1.7, .2], [1.0, 5.1, 3.7, 1.5, .4],
     [1.0, 4.6, 3.6, 1.0, .2], [1.0, 5.1, 3.3, 1.7, .5],
     [1.0, 4.8, 3.4, 1.9, .2], [1.0, 5.0, 3.0, 1.6, .2],
     [1.0, 5.0, 3.4, 1.6, .4], [1.0, 5.2, 3.5, 1.5, .2],
     [1.0, 5.2, 3.4, 1.4, .2], [1.0, 4.7, 3.2, 1.6, .2],
     [1.0, 4.8, 3.1, 1.6, .2], [1.0, 5.4, 3.4, 1.5, .4],
     [1.0, 5.2, 4.1, 1.5, .1], [1.0, 5.5, 4.2, 1.4, .2],
     [1.0, 4.9, 3.1, 1.5, .2], [1.0, 5.0, 3.2, 1.2, .2],
     [1.0, 5.5, 3.5, 1.3, .2], [1.0, 4.9, 3.6, 1.4, .1],
     [1.0, 4.4, 3.0, 1.3, .2], [1.0, 5.1, 3.4, 1.5, .2],
     [1.0, 5.0, 3.5, 1.3, .3], [1.0, 4.5, 2.3, 1.3, .3],
     [1.0, 4.4, 3.2, 1.3, .2], [1.0, 5.0, 3.5, 1.6, .6],
     [1.0, 5.1, 3.8, 1.9, .4], [1.0, 4.8, 3.0, 1.4, .3],
     [1.0, 5.1, 3.8, 1.6, .2], [1.0, 4.6, 3.2, 1.4, .2],
     [1.0, 5.3, 3.7, 1.5, .2], [1.0, 5.0, 3.3, 1.4, .2]]

# Compute the default variance-covariance matrix
covs = covariances(x)

# Print the upper triangle (the matrix is symmetric)
writeMatrix("The default case: variances/covariances",
            covs, printUpper=True)

Output

 
              The default case: variances/covariances
             1            2            3            4            5
1       0.0000       0.0000       0.0000       0.0000       0.0000
2                    0.1242       0.0992       0.0164       0.0103
3                                 0.1437       0.0117       0.0093
4                                              0.0302       0.0061
5                                                           0.0111

Example 2

This example, which uses the first 50 observations in the Fisher iris data, illustrates the use of optional arguments.

from pyimsl.stat.covariances import covariances
from pyimsl.stat.writeMatrix import writeMatrix

# Empty list that covariances fills with the column means
means = []
x = [[1.0, 5.1, 3.5, 1.4, .2], [1.0, 4.9, 3.0, 1.4, .2],
     [1.0, 4.7, 3.2, 1.3, .2], [1.0, 4.6, 3.1, 1.5, .2],
     [1.0, 5.0, 3.6, 1.4, .2], [1.0, 5.4, 3.9, 1.7, .4],
     [1.0, 4.6, 3.4, 1.4, .3], [1.0, 5.0, 3.4, 1.5, .2],
     [1.0, 4.4, 2.9, 1.4, .2], [1.0, 4.9, 3.1, 1.5, .1],
     [1.0, 5.4, 3.7, 1.5, .2], [1.0, 4.8, 3.4, 1.6, .2],
     [1.0, 4.8, 3.0, 1.4, .1], [1.0, 4.3, 3.0, 1.1, .1],
     [1.0, 5.8, 4.0, 1.2, .2], [1.0, 5.7, 4.4, 1.5, .4],
     [1.0, 5.4, 3.9, 1.3, .4], [1.0, 5.1, 3.5, 1.4, .3],
     [1.0, 5.7, 3.8, 1.7, .3], [1.0, 5.1, 3.8, 1.5, .3],
     [1.0, 5.4, 3.4, 1.7, .2], [1.0, 5.1, 3.7, 1.5, .4],
     [1.0, 4.6, 3.6, 1.0, .2], [1.0, 5.1, 3.3, 1.7, .5],
     [1.0, 4.8, 3.4, 1.9, .2], [1.0, 5.0, 3.0, 1.6, .2],
     [1.0, 5.0, 3.4, 1.6, .4], [1.0, 5.2, 3.5, 1.5, .2],
     [1.0, 5.2, 3.4, 1.4, .2], [1.0, 4.7, 3.2, 1.6, .2],
     [1.0, 4.8, 3.1, 1.6, .2], [1.0, 5.4, 3.4, 1.5, .4],
     [1.0, 5.2, 4.1, 1.5, .1], [1.0, 5.5, 4.2, 1.4, .2],
     [1.0, 4.9, 3.1, 1.5, .2], [1.0, 5.0, 3.2, 1.2, .2],
     [1.0, 5.5, 3.5, 1.3, .2], [1.0, 4.9, 3.6, 1.4, .1],
     [1.0, 4.4, 3.0, 1.3, .2], [1.0, 5.1, 3.4, 1.5, .2],
     [1.0, 5.0, 3.5, 1.3, .3], [1.0, 4.5, 2.3, 1.3, .3],
     [1.0, 4.4, 3.2, 1.3, .2], [1.0, 5.0, 3.5, 1.6, .6],
     [1.0, 5.1, 3.8, 1.9, .4], [1.0, 4.8, 3.0, 1.4, .3],
     [1.0, 5.1, 3.8, 1.6, .2], [1.0, 4.6, 3.2, 1.4, .2],
     [1.0, 5.3, 3.7, 1.5, .2], [1.0, 5.0, 3.3, 1.4, .2]]

# Perform analysis
correlations = covariances(x,
                           stdevCorrelationMatrix=True,
                           means=means)

# Print results, skipping the first variable, which is constant
writeMatrix("Means\n", means[1:])
title = "Correlations with Standard Deviations on the Diagonal\n"
tmpCorrelations = correlations[1:, 1:]
writeMatrix(title, tmpCorrelations, printUpper=True)

Output

***
*** Warning error issued from IMSL function covariances:
*** Correlations are requested but the observations on variable 1 are constant.  The pertinent correlation coefficients are set to NaN (not a number).
***
 
                      Means

          1            2            3            4
      5.006        3.428        1.462        0.246
 
Correlations with Standard Deviations on the Diagonal

              1            2            3            4
 1       0.3525       0.7425       0.2672       0.2781
 2                    0.3791       0.1777       0.2328
 3                                 0.1737       0.3316
 4                                              0.1054

Warning Errors

IMSLS_CONSTANT_VARIABLE Correlations are requested, but the observations on one or more variables are constant. The corresponding correlations are set to NaN.
IMSLS_INSUFFICIENT_DATA Variances and covariances are requested, but fewer than two valid observations are present for a variable. The pertinent statistics are set to NaN.
IMSLS_ZERO_SUM_OF_WEIGHTS_2 The sum of the weights is zero. The means, variances, and covariances are set to NaN.
IMSLS_ZERO_SUM_OF_WEIGHTS_3 The sum of the weights is zero. The means and correlations are set to NaN.
IMSLS_TOO_FEW_VALID_OBS_CORREL Correlations are requested, but fewer than two valid observations are present for a variable. The pertinent correlation coefficients are set to NaN.
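The warning in Example 2 arises because the first variable is constant. The hypothetical helper flag_columns below (not part of PyIMSL) is a rough pre-check for two of the conditions listed above: columns that are constant and columns with fewer than two valid observations. It judges each column only by its own non-NaN values and returns 0-based column indices; the library's actual checks (for example under listwise exclusion, or with zero weights) may differ.

```python
import numpy as np

def flag_columns(x):
    """Hypothetical check: 0-based columns that are constant or have < 2 valid values."""
    x = np.asarray(x, dtype=float)
    constant, too_few = [], []
    for k in range(x.shape[1]):
        col = x[~np.isnan(x[:, k]), k]   # valid (non-NaN) values of variable k
        if col.size < 2:
            too_few.append(k)
        elif col.max() == col.min():
            constant.append(k)
    return constant, too_few

nan = np.nan
x = [[1.0, 5.1, nan],
     [1.0, 4.9, nan],
     [1.0, 4.7, 0.2]]
print(flag_columns(x))   # ([0], [2])
```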