covariances

Computes the sample variance-covariance or correlation matrix.

Synopsis

covariances (x)

Required Arguments

float x[[]] (Input)
Array of size nObservations × nVariables containing the matrix of data.

Return Value

If no optional arguments are used, covariances returns an nVariables × nVariables matrix containing the sample variance-covariance matrix of the observations. The rows and columns of this matrix correspond to the columns of x.

Optional Arguments

varianceCovarianceMatrix, or

correctedSscpMatrix, or

correlationMatrix, or

stdevCorrelationMatrix
At most one of these options can be used to specify the type of matrix to be computed. If none is given, the variance-covariance matrix is computed.
Keyword                     Type of Matrix
varianceCovarianceMatrix    variance-covariance matrix (default)
correctedSscpMatrix         corrected sums of squares and crossproducts matrix
correlationMatrix           correlation matrix
stdevCorrelationMatrix      correlation matrix except for the diagonal elements, which are the standard deviations
means (Output)
The array containing the means of the variables in x. The components of the array correspond to the columns of x.
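
The four matrix types are closely related. The following is a minimal sketch in plain NumPy, not the pyimsl implementation, that builds each of them from the same data using the n - 1 divisor given under Usage Notes. The data values and all variable names are made up for illustration.

```python
import numpy as np

# Hypothetical 4 x 2 data matrix: rows are observations, columns are variables.
x = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 5.0],
              [4.0, 4.0]])
n = x.shape[0]

means = x.mean(axis=0)                 # column means
centered = x - means                   # deviations from the means

sscp = centered.T @ centered           # corrected sums of squares and crossproducts
cov = sscp / (n - 1)                   # variance-covariance matrix
stdev = np.sqrt(np.diag(cov))          # standard deviation of each variable
corr = cov / np.outer(stdev, stdev)    # correlation matrix

stdev_corr = corr.copy()               # correlations off the diagonal,
np.fill_diagonal(stdev_corr, stdev)    # standard deviations on the diagonal
```

In this sketch the corrected SSCP matrix is the only quantity that depends on the raw data; the other three follow from it by scaling.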

Description

The function covariances computes estimates of correlations, covariances, or sums of squares and crossproducts for a data matrix x. The means, (corrected) sums of squares, and (corrected) sums of crossproducts are computed using the method of provisional means. Let \(\overline{x}_{ki}\) denote the mean based on i observations for the k-th variable, and let \(c_{jki}\) denote the sum of crossproducts (or sum of squares if j = k) based on i observations. The method of provisional means then updates the means and sums of crossproducts one observation at a time.

The means and crossproducts are initialized as:

\[\begin{split}\begin{array}{l} \overline{x}_{k0} = 0.0, \quad k=1, \ldots, p \\ c_{jk0} = 0.0, \quad j,k=1, \ldots, p \\ \end{array}\end{split}\]

where p denotes the number of variables. Letting \(x_{k,i+1}\) denote the k-th variable on observation i + 1, each new observation leads to the following updates for \(\overline{x}_{ki}\) and \(c_{jki}\) using update constant \(r_{i+1}\):

\[\begin{split}\begin{array}{l} r_{i+1} = \frac{1}{i+1} \\ \overline{x}_{k,i+1} = \overline{x}_{ki} + \left(x_{k,i+1} - \overline{x}_{ki}\right)r_{i+1} \\ c_{jk,i+1} = c_{jki} + \left(x_{j,i+1} - \overline{x}_{ji}\right)\left(x_{k,i+1} - \overline{x}_{ki}\right)\left(1 - r_{i+1}\right) \end{array}\end{split}\]
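
The update formulas above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the library's internal code. The function name provisional_means is hypothetical, and the sample rows are the first four iris observations from the examples with the constant column and the last column omitted.

```python
import numpy as np

def provisional_means(x):
    """Return (means, sscp) from a single pass over the rows of x."""
    x = np.asarray(x, dtype=float)
    n_obs, p = x.shape
    means = np.zeros(p)          # xbar_{k0} = 0
    sscp = np.zeros((p, p))      # c_{jk0} = 0
    for i in range(n_obs):
        r = 1.0 / (i + 1)                            # r_{i+1}
        delta = x[i] - means                         # x_{k,i+1} - xbar_{ki}
        sscp += np.outer(delta, delta) * (1.0 - r)   # update c_{jk,i+1}
        means += delta * r                           # update xbar_{k,i+1}
    return means, sscp

data = np.array([[5.1, 3.5, 1.4],
                 [4.9, 3.0, 1.4],
                 [4.7, 3.2, 1.3],
                 [4.6, 3.1, 1.5]])
means, sscp = provisional_means(data)
cov = sscp / (len(data) - 1)     # sample covariance from the corrected SSCP
```

Note that the crossproduct update uses the means from before the current observation is absorbed, so the sums of crossproducts must be updated before the means. The results should agree with a two-pass computation up to rounding.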

Usage Notes

The function covariances uses the following definition of a sample mean:

\[\overline{x}_k = \frac{\sum_{i=1}^{n} x_{ki}}{n}\]

where n is the number of observations. The following formula defines the sample covariance, \(s_{j k}\), between variables j and k:

\[s_{jk} = \frac {\sum_{i=1}^{n} \left(x_{ji}-\overline{x}_j\right)\left(x_{ki}-\overline{x}_k\right)} {n-1}\]

The sample correlation between variables j and k, \(r_{jk}\), is defined as follows:

\[r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}s_{kk}}}\]
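
As a small worked example of these definitions, take n = 4 observations of two variables with the made-up values \(x_j = (1, 2, 3, 4)\) and \(x_k = (2, 1, 5, 4)\). The sample means are

\[\overline{x}_j = \frac{1+2+3+4}{4} = 2.5, \quad \overline{x}_k = \frac{2+1+5+4}{4} = 3\]

The sample covariance and the two sample variances are

\[s_{jk} = \frac{(-1.5)(-1) + (-0.5)(-2) + (0.5)(2) + (1.5)(1)}{3} = \frac{5}{3}\]

\[s_{jj} = \frac{2.25 + 0.25 + 0.25 + 2.25}{3} = \frac{5}{3}, \quad s_{kk} = \frac{1 + 4 + 4 + 1}{3} = \frac{10}{3}\]

so the sample correlation is

\[r_{jk} = \frac{5/3}{\sqrt{(5/3)(10/3)}} = \frac{1}{\sqrt{2}} \approx 0.7071\]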

Examples

Example 1

The first example illustrates the use of covariances for the first 50 observations in the Fisher iris data (Fisher 1936). Note in this example that the first variable is constant over the first 50 observations.

from pyimsl.math.covariances import covariances
from pyimsl.math.writeMatrix import writeMatrix

# First 50 observations of the Fisher iris data (the first column is constant)
x = [[1.0, 5.1, 3.5, 1.4, .2], [1.0, 4.9, 3.0, 1.4, .2],
     [1.0, 4.7, 3.2, 1.3, .2], [1.0, 4.6, 3.1, 1.5, .2],
     [1.0, 5.0, 3.6, 1.4, .2], [1.0, 5.4, 3.9, 1.7, .4],
     [1.0, 4.6, 3.4, 1.4, .3], [1.0, 5.0, 3.4, 1.5, .2],
     [1.0, 4.4, 2.9, 1.4, .2], [1.0, 4.9, 3.1, 1.5, .1],
     [1.0, 5.4, 3.7, 1.5, .2], [1.0, 4.8, 3.4, 1.6, .2],
     [1.0, 4.8, 3.0, 1.4, .1], [1.0, 4.3, 3.0, 1.1, .1],
     [1.0, 5.8, 4.0, 1.2, .2], [1.0, 5.7, 4.4, 1.5, .4],
     [1.0, 5.4, 3.9, 1.3, .4], [1.0, 5.1, 3.5, 1.4, .3],
     [1.0, 5.7, 3.8, 1.7, .3], [1.0, 5.1, 3.8, 1.5, .3],
     [1.0, 5.4, 3.4, 1.7, .2], [1.0, 5.1, 3.7, 1.5, .4],
     [1.0, 4.6, 3.6, 1.0, .2], [1.0, 5.1, 3.3, 1.7, .5],
     [1.0, 4.8, 3.4, 1.9, .2], [1.0, 5.0, 3.0, 1.6, .2],
     [1.0, 5.0, 3.4, 1.6, .4], [1.0, 5.2, 3.5, 1.5, .2],
     [1.0, 5.2, 3.4, 1.4, .2], [1.0, 4.7, 3.2, 1.6, .2],
     [1.0, 4.8, 3.1, 1.6, .2], [1.0, 5.4, 3.4, 1.5, .4],
     [1.0, 5.2, 4.1, 1.5, .1], [1.0, 5.5, 4.2, 1.4, .2],
     [1.0, 4.9, 3.1, 1.5, .2], [1.0, 5.0, 3.2, 1.2, .2],
     [1.0, 5.5, 3.5, 1.3, .2], [1.0, 4.9, 3.6, 1.4, .1],
     [1.0, 4.4, 3.0, 1.3, .2], [1.0, 5.1, 3.4, 1.5, .2],
     [1.0, 5.0, 3.5, 1.3, .3], [1.0, 4.5, 2.3, 1.3, .3],
     [1.0, 4.4, 3.2, 1.3, .2], [1.0, 5.0, 3.5, 1.6, .6],
     [1.0, 5.1, 3.8, 1.9, .4], [1.0, 4.8, 3.0, 1.4, .3],
     [1.0, 5.1, 3.8, 1.6, .2], [1.0, 4.6, 3.2, 1.4, .2],
     [1.0, 5.3, 3.7, 1.5, .2], [1.0, 5.0, 3.3, 1.4, .2]]

# Compute the default variance-covariance matrix
covariance_matrix = covariances(x)
title = "The default case: variances/covariances"
writeMatrix(title, covariance_matrix, printUpper=True)

Output

 
              The default case: variances/covariances
             1            2            3            4            5
1       0.0000       0.0000       0.0000       0.0000       0.0000
2                    0.1242       0.0992       0.0164       0.0103
3                                 0.1437       0.0117       0.0093
4                                              0.0302       0.0061
5                                                           0.0111

Example 2

This example illustrates the use of some optional arguments in covariances. Once again, the first 50 observations in the Fisher iris data are used.

from pyimsl.math.covariances import covariances
from pyimsl.math.writeMatrix import writeMatrix

n_variables = 5
means = []  # output argument: receives the means of the columns of x
x = [[1.0, 5.1, 3.5, 1.4, .2], [1.0, 4.9, 3.0, 1.4, .2],
     [1.0, 4.7, 3.2, 1.3, .2], [1.0, 4.6, 3.1, 1.5, .2],
     [1.0, 5.0, 3.6, 1.4, .2], [1.0, 5.4, 3.9, 1.7, .4],
     [1.0, 4.6, 3.4, 1.4, .3], [1.0, 5.0, 3.4, 1.5, .2],
     [1.0, 4.4, 2.9, 1.4, .2], [1.0, 4.9, 3.1, 1.5, .1],
     [1.0, 5.4, 3.7, 1.5, .2], [1.0, 4.8, 3.4, 1.6, .2],
     [1.0, 4.8, 3.0, 1.4, .1], [1.0, 4.3, 3.0, 1.1, .1],
     [1.0, 5.8, 4.0, 1.2, .2], [1.0, 5.7, 4.4, 1.5, .4],
     [1.0, 5.4, 3.9, 1.3, .4], [1.0, 5.1, 3.5, 1.4, .3],
     [1.0, 5.7, 3.8, 1.7, .3], [1.0, 5.1, 3.8, 1.5, .3],
     [1.0, 5.4, 3.4, 1.7, .2], [1.0, 5.1, 3.7, 1.5, .4],
     [1.0, 4.6, 3.6, 1.0, .2], [1.0, 5.1, 3.3, 1.7, .5],
     [1.0, 4.8, 3.4, 1.9, .2], [1.0, 5.0, 3.0, 1.6, .2],
     [1.0, 5.0, 3.4, 1.6, .4], [1.0, 5.2, 3.5, 1.5, .2],
     [1.0, 5.2, 3.4, 1.4, .2], [1.0, 4.7, 3.2, 1.6, .2],
     [1.0, 4.8, 3.1, 1.6, .2], [1.0, 5.4, 3.4, 1.5, .4],
     [1.0, 5.2, 4.1, 1.5, .1], [1.0, 5.5, 4.2, 1.4, .2],
     [1.0, 4.9, 3.1, 1.5, .2], [1.0, 5.0, 3.2, 1.2, .2],
     [1.0, 5.5, 3.5, 1.3, .2], [1.0, 4.9, 3.6, 1.4, .1],
     [1.0, 4.4, 3.0, 1.3, .2], [1.0, 5.1, 3.4, 1.5, .2],
     [1.0, 5.0, 3.5, 1.3, .3], [1.0, 4.5, 2.3, 1.3, .3],
     [1.0, 4.4, 3.2, 1.3, .2], [1.0, 5.0, 3.5, 1.6, .6],
     [1.0, 5.1, 3.8, 1.9, .4], [1.0, 4.8, 3.0, 1.4, .3],
     [1.0, 5.1, 3.8, 1.6, .2], [1.0, 4.6, 3.2, 1.4, .2],
     [1.0, 5.3, 3.7, 1.5, .2], [1.0, 5.0, 3.3, 1.4, .2]]

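# Request correlations with standard deviations on the diagonal,
# and also retrieve the column means
correlations = covariances(x,
                           stdevCorrelationMatrix=True,
                           xColDim=n_variables,
                           means=means)
# Skip the constant first variable, whose correlations are NaN
writeMatrix('Means', means[1:])
title = "Correlations with Standard Deviations on the Diagonal"
writeMatrix(title, correlations[1:, 1:], printUpper=True)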

Output

***
*** Warning error issued from IMSL function covariances:
*** Correlations are requested but the observations on variable 1 are constant.  The pertinent correlation coefficients are set to NaN (not a number).
***
 
                       Means
          1            2            3            4
      5.006        3.428        1.462        0.246
 
Correlations with Standard Deviations on the Diagonal
             1            2            3            4
1       0.3525       0.7425       0.2672       0.2781
2                    0.3791       0.1777       0.2328
3                                 0.1737       0.3316
4                                              0.1054

Warning Errors

IMSL_CONSTANT_VARIABLE Correlations are requested, but the observations on one or more variables are constant. The corresponding correlations are set to NaN.
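
The reason for the NaN values can be seen from the correlation formula in Usage Notes: a constant variable has zero variance, so its correlations have the form 0/0. The following NumPy sketch reproduces that behavior. It is an illustration under that assumption and does not call pyimsl. The data are the first four iris rows from the examples, restricted to the first three columns.

```python
import numpy as np

# The first column is constant, as in the examples above.
x = np.array([[1.0, 5.1, 3.5],
              [1.0, 4.9, 3.0],
              [1.0, 4.7, 3.2],
              [1.0, 4.6, 3.1]])

cov = np.cov(x, rowvar=False)       # sample covariance with the n - 1 divisor
stdev = np.sqrt(np.diag(cov))       # stdev[0] is 0 for the constant column

# 0/0 yields NaN; silence NumPy's floating-point warnings for the division.
with np.errstate(divide="ignore", invalid="ignore"):
    corr = cov / np.outer(stdev, stdev)

constant_columns = np.flatnonzero(stdev == 0.0)   # indices of constant variables
```

Here the first row and first column of corr are NaN, while the remaining entries are ordinary correlations. Checking for zero standard deviations before dividing, as constant_columns does, is one way to detect the condition in advance.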