covariances¶
Computes the sample variance-covariance or correlation matrix.
Synopsis¶
covariances (x)
Required Arguments¶
float x[[]] (Input)
- Array of size nObservations × nVariables containing the matrix of data.
Return Value¶
If no optional arguments are used, covariances returns an nVariables × nVariables matrix containing the sample variance-covariance matrix of the observations. The rows and columns of this matrix correspond to the columns of x.
Optional Arguments¶
varianceCovarianceMatrix, or correctedSscpMatrix, or correlationMatrix, or stdevCorrelationMatrix
- At most one of these options may be specified; it selects the type of matrix to be computed.
Keyword | Type of Matrix |
---|---|
varianceCovarianceMatrix | variance-covariance matrix (default) |
correctedSscpMatrix | corrected sums of squares and crossproducts matrix |
correlationMatrix | correlation matrix |
stdevCorrelationMatrix | correlation matrix, except that the diagonal elements are the standard deviations |
means (Output)
- Array containing the means of the variables in x. The components of the array correspond to the columns of x.
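The four matrix types are closely related, and a short sketch may make the table concrete. The helper below is hypothetical (it is not part of the library): it starts from a corrected sums of squares and crossproducts matrix for n observations and derives the other three forms in pure Python. Setting correlations that involve a zero-variance variable to NaN follows the warning documented for this function; how the library treats every entry in that case is an assumption here.

```python
import math


def matrices_from_sscp(c, n):
    """Derive covariance, correlation, and stdev/correlation matrices.

    c is a corrected SSCP matrix (list of lists) computed from n observations.
    This is an illustrative helper, not a library routine.
    """
    p = len(c)
    # Variance-covariance matrix: divide the corrected SSCP by n - 1.
    cov = [[c[j][k] / (n - 1) for k in range(p)] for j in range(p)]
    # Standard deviations are the square roots of the diagonal.
    sd = [math.sqrt(cov[k][k]) for k in range(p)]
    # Correlations; NaN when either variable has zero variance (assumption).
    corr = [
        [
            cov[j][k] / (sd[j] * sd[k]) if sd[j] > 0 and sd[k] > 0 else float("nan")
            for k in range(p)
        ]
        for j in range(p)
    ]
    # Same as corr, but with standard deviations on the diagonal.
    stdev_corr = [
        [sd[j] if j == k else corr[j][k] for k in range(p)] for j in range(p)
    ]
    return cov, corr, stdev_corr


# Corrected SSCP for the toy data (1, 2), (2, 4), (3, 9).
sscp = [[2.0, 7.0], [7.0, 26.0]]
cov, corr, stdev_corr = matrices_from_sscp(sscp, n=3)
print(cov)
print(corr)
print(stdev_corr)
```

Only the scaling differs between the forms, so any one of them can be recovered from the corrected SSCP matrix and the number of observations.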
Description¶
The function covariances computes estimates of correlations, covariances, or sums of squares and crossproducts for a data matrix x.
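The means, (corrected) sums of squares, and (corrected) sums of crossproducts are computed using the method of provisional means. Let \(\bar{x}_{ki}\) denote the mean based on i observations for the k-th variable, and let \(c_{jki}\) denote the sum of crossproducts (or sum of squares if j = k) based on i observations. Then, the method of provisional means finds new means and sums of crossproducts as follows:
The means and crossproducts are initialized as:
\[\bar{x}_{k0} = 0.0 \qquad k = 1, \ldots, p\]
\[c_{jk0} = 0.0 \qquad j, k = 1, \ldots, p\]
where p denotes the number of variables. Letting \(x_{k,i+1}\) denote the k-th variable on observation i + 1, each new observation leads to the following updates for \(\bar{x}_{ki}\) and \(c_{jki}\) using update constant \(r_{i+1}\):
\[r_{i+1} = \frac{1}{i+1}\]
\[\bar{x}_{k,i+1} = \bar{x}_{ki} + \left(x_{k,i+1} - \bar{x}_{ki}\right) r_{i+1}\]
\[c_{jk,i+1} = c_{jki} + \left(x_{j,i+1} - \bar{x}_{ji}\right)\left(x_{k,i+1} - \bar{x}_{ki}\right)\left(1 - r_{i+1}\right)\]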
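The method of provisional means can be sketched in a few lines of pure Python. This is an illustration of the one-pass technique for unweighted observations, taking the update constant to be 1/(i + 1); it is not the library's implementation, and the function name provisional_means is hypothetical.

```python
def provisional_means(x):
    """One-pass means and corrected SSCP matrix for the rows of x.

    Illustrative sketch only; assumes unweighted observations.
    """
    p = len(x[0])
    means = [0.0] * p                     # running means, initialized to 0
    c = [[0.0] * p for _ in range(p)]     # running corrected SSCP, initialized to 0
    for i, row in enumerate(x):           # i observations processed so far
        r = 1.0 / (i + 1)                 # update constant for observation i + 1
        # Deviations of the new observation from the current (old) means.
        dev = [row[k] - means[k] for k in range(p)]
        # Update crossproducts first, while dev still refers to the old means.
        for j in range(p):
            for k in range(p):
                c[j][k] += dev[j] * dev[k] * (1.0 - r)
        # Then move each mean toward the new observation.
        for k in range(p):
            means[k] += dev[k] * r
    return means, c


data = [[1.0, 2.0], [2.0, 4.0], [3.0, 9.0]]
means, sscp = provisional_means(data)
n = len(data)
# Dividing by n - 1 turns the corrected SSCP into sample covariances.
cov = [[sscp[j][k] / (n - 1) for k in range(2)] for j in range(2)]
print(means)
print(cov)
```

Because each observation is folded in once and the crossproducts are accumulated around the running means, the data need only a single pass and large raw sums of squares are never formed.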
Usage Notes¶
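The function covariances uses the following definition of a sample mean:
\[\bar{x}_k = \frac{1}{n} \sum_{i=1}^{n} x_{ki}\]
where n is the number of observations. The following formula defines the sample covariance, \(s_{jk}\), between variables j and k:
\[s_{jk} = \frac{\sum_{i=1}^{n} \left(x_{ji} - \bar{x}_j\right)\left(x_{ki} - \bar{x}_k\right)}{n - 1}\]
The sample correlation between variables j and k, \(r_{jk}\), is defined as follows:
\[r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\, s_{kk}}}\]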
Examples¶
Example 1¶
The first example illustrates the use of covariances for the first 50 observations in the Fisher iris data (Fisher 1936). Note that in this example the first variable is constant over the first 50 observations.
from numpy import *
from pyimsl.math.covariances import covariances
from pyimsl.math.writeMatrix import writeMatrix
n_variables = 5
means = []
x = [[1.0, 5.1, 3.5, 1.4, .2], [1.0, 4.9, 3.0, 1.4, .2],
[1.0, 4.7, 3.2, 1.3, .2], [1.0, 4.6, 3.1, 1.5, .2],
[1.0, 5.0, 3.6, 1.4, .2], [1.0, 5.4, 3.9, 1.7, .4],
[1.0, 4.6, 3.4, 1.4, .3], [1.0, 5.0, 3.4, 1.5, .2],
[1.0, 4.4, 2.9, 1.4, .2], [1.0, 4.9, 3.1, 1.5, .1],
[1.0, 5.4, 3.7, 1.5, .2], [1.0, 4.8, 3.4, 1.6, .2],
[1.0, 4.8, 3.0, 1.4, .1], [1.0, 4.3, 3.0, 1.1, .1],
[1.0, 5.8, 4.0, 1.2, .2], [1.0, 5.7, 4.4, 1.5, .4],
[1.0, 5.4, 3.9, 1.3, .4], [1.0, 5.1, 3.5, 1.4, .3],
[1.0, 5.7, 3.8, 1.7, .3], [1.0, 5.1, 3.8, 1.5, .3],
[1.0, 5.4, 3.4, 1.7, .2], [1.0, 5.1, 3.7, 1.5, .4],
[1.0, 4.6, 3.6, 1.0, .2], [1.0, 5.1, 3.3, 1.7, .5],
[1.0, 4.8, 3.4, 1.9, .2], [1.0, 5.0, 3.0, 1.6, .2],
[1.0, 5.0, 3.4, 1.6, .4], [1.0, 5.2, 3.5, 1.5, .2],
[1.0, 5.2, 3.4, 1.4, .2], [1.0, 4.7, 3.2, 1.6, .2],
[1.0, 4.8, 3.1, 1.6, .2], [1.0, 5.4, 3.4, 1.5, .4],
[1.0, 5.2, 4.1, 1.5, .1], [1.0, 5.5, 4.2, 1.4, .2],
[1.0, 4.9, 3.1, 1.5, .2], [1.0, 5.0, 3.2, 1.2, .2],
[1.0, 5.5, 3.5, 1.3, .2], [1.0, 4.9, 3.6, 1.4, .1],
[1.0, 4.4, 3.0, 1.3, .2], [1.0, 5.1, 3.4, 1.5, .2],
[1.0, 5.0, 3.5, 1.3, .3], [1.0, 4.5, 2.3, 1.3, .3],
[1.0, 4.4, 3.2, 1.3, .2], [1.0, 5.0, 3.5, 1.6, .6],
[1.0, 5.1, 3.8, 1.9, .4], [1.0, 4.8, 3.0, 1.4, .3],
[1.0, 5.1, 3.8, 1.6, .2], [1.0, 4.6, 3.2, 1.4, .2],
[1.0, 5.3, 3.7, 1.5, .2], [1.0, 5.0, 3.3, 1.4, .2]]
# Perform analysis: with no optional arguments the result is the
# variance-covariance matrix
covariance_matrix = covariances(x)
title = "The default case: variances/covariances"
writeMatrix(title, covariance_matrix, printUpper=True)
Output¶
The default case: variances/covariances
1 2 3 4 5
1 0.0000 0.0000 0.0000 0.0000 0.0000
2 0.1242 0.0992 0.0164 0.0103
3 0.1437 0.0117 0.0093
4 0.0302 0.0061
5 0.0111
Example 2¶
This example illustrates the use of some optional arguments in covariances. Once again, the first 50 observations in the Fisher iris data are used.
from numpy import *
from pyimsl.math.covariances import covariances
from pyimsl.math.writeMatrix import writeMatrix
n_variables = 5
means = []
x = [[1.0, 5.1, 3.5, 1.4, .2], [1.0, 4.9, 3.0, 1.4, .2],
[1.0, 4.7, 3.2, 1.3, .2], [1.0, 4.6, 3.1, 1.5, .2],
[1.0, 5.0, 3.6, 1.4, .2], [1.0, 5.4, 3.9, 1.7, .4],
[1.0, 4.6, 3.4, 1.4, .3], [1.0, 5.0, 3.4, 1.5, .2],
[1.0, 4.4, 2.9, 1.4, .2], [1.0, 4.9, 3.1, 1.5, .1],
[1.0, 5.4, 3.7, 1.5, .2], [1.0, 4.8, 3.4, 1.6, .2],
[1.0, 4.8, 3.0, 1.4, .1], [1.0, 4.3, 3.0, 1.1, .1],
[1.0, 5.8, 4.0, 1.2, .2], [1.0, 5.7, 4.4, 1.5, .4],
[1.0, 5.4, 3.9, 1.3, .4], [1.0, 5.1, 3.5, 1.4, .3],
[1.0, 5.7, 3.8, 1.7, .3], [1.0, 5.1, 3.8, 1.5, .3],
[1.0, 5.4, 3.4, 1.7, .2], [1.0, 5.1, 3.7, 1.5, .4],
[1.0, 4.6, 3.6, 1.0, .2], [1.0, 5.1, 3.3, 1.7, .5],
[1.0, 4.8, 3.4, 1.9, .2], [1.0, 5.0, 3.0, 1.6, .2],
[1.0, 5.0, 3.4, 1.6, .4], [1.0, 5.2, 3.5, 1.5, .2],
[1.0, 5.2, 3.4, 1.4, .2], [1.0, 4.7, 3.2, 1.6, .2],
[1.0, 4.8, 3.1, 1.6, .2], [1.0, 5.4, 3.4, 1.5, .4],
[1.0, 5.2, 4.1, 1.5, .1], [1.0, 5.5, 4.2, 1.4, .2],
[1.0, 4.9, 3.1, 1.5, .2], [1.0, 5.0, 3.2, 1.2, .2],
[1.0, 5.5, 3.5, 1.3, .2], [1.0, 4.9, 3.6, 1.4, .1],
[1.0, 4.4, 3.0, 1.3, .2], [1.0, 5.1, 3.4, 1.5, .2],
[1.0, 5.0, 3.5, 1.3, .3], [1.0, 4.5, 2.3, 1.3, .3],
[1.0, 4.4, 3.2, 1.3, .2], [1.0, 5.0, 3.5, 1.6, .6],
[1.0, 5.1, 3.8, 1.9, .4], [1.0, 4.8, 3.0, 1.4, .3],
[1.0, 5.1, 3.8, 1.6, .2], [1.0, 4.6, 3.2, 1.4, .2],
[1.0, 5.3, 3.7, 1.5, .2], [1.0, 5.0, 3.3, 1.4, .2]]
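# Perform analysis: request correlations with standard deviations on the
# diagonal, and retrieve the column means through the means argument
correlations = covariances(x,
                           stdevCorrelationMatrix=True,
                           xColDim=n_variables,
                           means=means)
# Skip the first (constant) variable when printing
writeMatrix('Means', means[1:])
title = "Correlations with Standard Deviations on the Diagonal"
writeMatrix(title, correlations[1:, 1:], printUpper=True)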
Output¶
***
*** Warning error issued from IMSL function covariances:
*** Correlations are requested but the observations on variable 1 are constant. The pertinent correlation coefficients are set to NaN (not a number).
***
Means
1 2 3 4
5.006 3.428 1.462 0.246
Correlations with Standard Deviations on the Diagonal
1 2 3 4
1 0.3525 0.7425 0.2672 0.2781
2 0.3791 0.1777 0.2328
3 0.1737 0.3316
4 0.1054
Warning Errors¶
IMSL_CONSTANT_VARIABLE | Correlations are requested, but the observations on one or more variables are constant. The corresponding correlations are set to NaN. |
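One way to avoid this warning is to look for constant columns before requesting correlations. The sketch below is a hypothetical pre-check in pure Python, not a library feature: it lists the indices of columns whose values never change and builds a reduced data matrix without them. It uses exact equality, which is an assumption that may be too strict for data carrying rounding noise.

```python
def constant_columns(x):
    """Return the indices of columns of x whose values are all identical."""
    first = x[0]
    return [
        k for k in range(len(first))
        if all(row[k] == first[k] for row in x)
    ]


# First three observations and first three variables of the example data.
x = [[1.0, 5.1, 3.5],
     [1.0, 4.9, 3.0],
     [1.0, 4.7, 3.2]]

const = constant_columns(x)
keep = [k for k in range(len(x[0])) if k not in const]
# Drop the constant columns so every remaining variable has nonzero variance.
x_reduced = [[row[k] for k in keep] for row in x]

print(const)       # [0]
print(x_reduced)   # [[5.1, 3.5], [4.9, 3.0], [4.7, 3.2]]
```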