covariances¶
Computes the sample variance-covariance or correlation matrix.
Synopsis¶
covariances (x)
Required Arguments¶
float x[[]] (Input)
Array of size nRows × nVariables containing the data.
Return Value¶
If no optional arguments are used, covariances returns an nVariables × nVariables array containing the sample variance-covariance matrix of the observations. The rows and columns of this array correspond to the columns of x.
Optional Arguments¶
missingValueMethod (Input)
Method used to exclude missing values in x from the computations, where NaN is interpreted as the missing value code. See function machine (Chapter 15, Utilities). The methods are as follows:
missingValueMethod | Action |
---|---|
0 | The exclusion is listwise. (The entire row of x is excluded if any of the values of the row is equal to the missing value code.) |
1 | Raw crossproducts are computed from all valid pairs, and means and variances are computed from all valid data on the individual variables. Corrected crossproducts, covariances, and correlations are computed using these quantities. |
2 | Raw crossproducts, means, and variances are computed as in the case of missingValueMethod = 1. However, corrected crossproducts and covariances are computed only from the valid pairs of data. Correlations are computed using these covariances and the variances from all valid data. |
3 | Raw crossproducts, means, variances, and covariances are computed as in the case of missingValueMethod = 2. Correlations are computed using these covariances, but the variances used are computed from the valid pairs of data. |
incidenceMatrix (Output)
An array containing the incidence matrix. If missingValueMethod is 0, incidenceMatrix is 1 × 1 and contains the number of valid observations; otherwise, incidenceMatrix is nVariables × nVariables and contains the number of pairs of valid observations used in calculating the crossproducts for covariance.
nObservations (Output)
Sum of the frequencies. If missingValueMethod is 0, observations with missing values are not included in nObservations; otherwise, all observations are included except for observations with missing values for the weight or the frequency.
varianceCovarianceMatrix, or correctedSscpMatrix, or correlationMatrix, or stdevCorrelationMatrix
Exactly one of these options can be used to specify the type of matrix to be computed.
Keyword | Type of Matrix |
---|---|
varianceCovarianceMatrix | variance-covariance matrix (default) |
correctedSscpMatrix | corrected sums of squares and crossproducts matrix |
correlationMatrix | correlation matrix |
stdevCorrelationMatrix | correlation matrix, except that the diagonal elements are the standard deviations |
means (Output)
The array containing the means of the variables in x. The components of the array correspond to the columns of x.
frequencies, float[] (Input)
Array of length nObservations containing the frequency for each observation.
Default: frequencies[ ] = 1
weights, float[] (Input)
Array of length nObservations containing the weight for each observation.
Default: weights[ ] = 1
sumWeights (Output)
Sum of the weights of all observations. If missingValueMethod is equal to 0, observations with missing values are not included in sumWeights. Otherwise, all observations are included except for observations with missing values for the weight or the frequency.
nRowsMissing (Output)
Total number of observations that contain any missing values (NaN).
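The difference between listwise exclusion (missingValueMethod = 0) and the pairwise counting described for incidenceMatrix can be illustrated outside the library. The following is a minimal NumPy sketch, not the implementation used by covariances; the helper names listwise_rows and pairwise_counts and the small data set are hypothetical.

```python
import numpy as np

def listwise_rows(x):
    """Keep only the rows that contain no NaN (listwise exclusion)."""
    x = np.asarray(x, dtype=float)
    return x[~np.isnan(x).any(axis=1)]

def pairwise_counts(x):
    """For each pair of columns, count the rows where both values are valid."""
    valid = (~np.isnan(np.asarray(x, dtype=float))).astype(int)
    return valid.T @ valid

# Hypothetical data: NaN marks a missing value
x = [[1.0, 2.0, 3.0],
     [2.0, np.nan, 1.0],
     [3.0, 5.0, np.nan],
     [4.0, 6.0, 2.0]]

kept = listwise_rows(x)      # only the first and last rows are complete
counts = pairwise_counts(x)  # 3 x 3 matrix of valid-pair counts
print(kept)
print(counts)
```

With listwise exclusion a single count (here the number of rows in kept) describes every entry of the result, whereas pairwise handling needs a separate count for each pair of variables.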
Description¶
Function covariances computes estimates of correlations, covariances, or sums of squares and crossproducts for a data matrix x. Weights and frequencies are allowed but not required.
The means, (corrected) sums of squares, and (corrected) sums of crossproducts are computed using the method of provisional means. Let \(\bar{x}_{ki}\) denote the mean based on i observations for the k-th variable, \(f_i\) denote the frequency of the i-th observation, \(w_i\) denote the weight of the i-th observation, and \(c_{jki}\) denote the sum of crossproducts (or sum of squares if j = k) based on i observations. Then the method of provisional means finds new means and sums of crossproducts as shown below.
The means and crossproducts are initialized as follows:
\[\bar{x}_{k0} = 0.0 \quad \text{for } k = 1, \ldots, p\]
\[c_{jk0} = 0.0 \quad \text{for } j, k = 1, \ldots, p\]
where p denotes the number of variables. Letting \(x_{k,i+1}\) denote the k-th variable of observation i + 1, each new observation leads to the following updates for \(\bar{x}_{ki}\) and \(c_{jki}\) using the update constant \(r_{i+1}\):
\[r_{i+1} = \frac{f_{i+1} w_{i+1}}{\sum_{l=1}^{i+1} f_l w_l}\]
\[\bar{x}_{k,i+1} = \bar{x}_{ki} + \left(x_{k,i+1} - \bar{x}_{ki}\right) r_{i+1}\]
\[c_{jk,i+1} = c_{jki} + w_{i+1} f_{i+1} \left(x_{j,i+1} - \bar{x}_{ji}\right)\left(x_{k,i+1} - \bar{x}_{ki}\right)\left(1 - r_{i+1}\right)\]
The default value for weights and frequencies is 1. Means and variances are computed based on the valid data for each variable or, if required, based on all the valid data for each pair of variables.
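The one-pass update described above can be sketched in NumPy. This is an illustration of the standard weighted provisional means recurrence, written under the assumption that no values are missing; it is not the library's source code, and the function name provisional_means is hypothetical.

```python
import numpy as np

def provisional_means(x, frequencies=None, weights=None):
    """One-pass weighted means and corrected sums of crossproducts."""
    x = np.asarray(x, dtype=float)
    n_rows, p = x.shape
    f = np.ones(n_rows) if frequencies is None else np.asarray(frequencies, dtype=float)
    w = np.ones(n_rows) if weights is None else np.asarray(weights, dtype=float)

    mean = np.zeros(p)       # running means, initialized to zero
    sscp = np.zeros((p, p))  # running corrected crossproducts, initialized to zero
    total = 0.0              # running sum of f_i * w_i

    for row, fi, wi in zip(x, f, w):
        fw = fi * wi
        if fw == 0.0:
            continue         # an observation with zero total weight changes nothing
        total += fw
        r = fw / total       # update constant for this observation
        delta = row - mean   # deviation from the means before the update
        sscp += fw * (1.0 - r) * np.outer(delta, delta)
        mean += r * delta
    return mean, sscp

# Hypothetical data, with the default frequencies and weights of 1
x_demo = np.array([[1.0, 2.0], [2.0, 4.5], [3.0, 5.0], [4.0, 8.5]])
mean_demo, sscp_demo = provisional_means(x_demo)
cov_demo = sscp_demo / (x_demo.shape[0] - 1)
print(mean_demo)
print(cov_demo)
```

Because each observation is folded in as a correction to the current means, the data are read only once and the crossproducts are accumulated about the running means, which tends to be numerically steadier than subtracting large raw sums.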
Usage Notes¶
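Function covariances defines a sample mean by
\[\bar{x}_k = \frac{\sum_{i=1}^{n} f_i w_i x_{ki}}{\sum_{i=1}^{n} f_i w_i}\]
where n is the number of observations.
The following formula defines the sample covariance, \(s_{jk}\), between variables j and k:
\[s_{jk} = \frac{\sum_{i=1}^{n} f_i w_i \left(x_{ji} - \bar{x}_j\right)\left(x_{ki} - \bar{x}_k\right)}{\sum_{i=1}^{n} f_i - 1}\]
The sample correlation between variables j and k, \(r_{jk}\), is defined as follows:
\[r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj} s_{kk}}}\]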
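To make the relationship between the four matrix types concrete, the sketch below computes each of them with plain NumPy for the simplest case, in which every frequency and weight equals 1 and no values are missing. It is an independent illustration rather than a call into covariances, and the five-row data set and variable names are chosen only for this example.

```python
import numpy as np

# Hypothetical data: 5 observations on 3 variables, no missing values
x = np.array([[5.1, 3.5, 1.4],
              [4.9, 3.0, 1.4],
              [4.7, 3.2, 1.3],
              [4.6, 3.1, 1.5],
              [5.0, 3.6, 1.4]])
n = x.shape[0]

means = x.mean(axis=0)
centered = x - means

sscp = centered.T @ centered         # corrected sums of squares and crossproducts
cov = sscp / (n - 1)                 # sample variance-covariance matrix
stdev = np.sqrt(np.diag(cov))        # sample standard deviations
corr = cov / np.outer(stdev, stdev)  # sample correlation matrix

stdev_corr = corr.copy()             # correlations, standard deviations on the diagonal
np.fill_diagonal(stdev_corr, stdev)

print(cov)
print(stdev_corr)
```

In this unweighted case the covariance and correlation matrices agree with numpy.cov and numpy.corrcoef called with rowvar=False, which gives a convenient cross-check.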
Examples¶
Example 1¶
This example illustrates the use of covariances for the first 50 observations in the Fisher iris data (Fisher 1936). Note that the first variable is constant over the first 50 observations.
from numpy import *
from pyimsl.stat.covariances import covariances
from pyimsl.stat.writeMatrix import writeMatrix
x = [[1.0, 5.1, 3.5, 1.4, .2], [1.0, 4.9, 3.0, 1.4, .2],
[1.0, 4.7, 3.2, 1.3, .2], [1.0, 4.6, 3.1, 1.5, .2],
[1.0, 5.0, 3.6, 1.4, .2], [1.0, 5.4, 3.9, 1.7, .4],
[1.0, 4.6, 3.4, 1.4, .3], [1.0, 5.0, 3.4, 1.5, .2],
[1.0, 4.4, 2.9, 1.4, .2], [1.0, 4.9, 3.1, 1.5, .1],
[1.0, 5.4, 3.7, 1.5, .2], [1.0, 4.8, 3.4, 1.6, .2],
[1.0, 4.8, 3.0, 1.4, .1], [1.0, 4.3, 3.0, 1.1, .1],
[1.0, 5.8, 4.0, 1.2, .2], [1.0, 5.7, 4.4, 1.5, .4],
[1.0, 5.4, 3.9, 1.3, .4], [1.0, 5.1, 3.5, 1.4, .3],
[1.0, 5.7, 3.8, 1.7, .3], [1.0, 5.1, 3.8, 1.5, .3],
[1.0, 5.4, 3.4, 1.7, .2], [1.0, 5.1, 3.7, 1.5, .4],
[1.0, 4.6, 3.6, 1.0, .2], [1.0, 5.1, 3.3, 1.7, .5],
[1.0, 4.8, 3.4, 1.9, .2], [1.0, 5.0, 3.0, 1.6, .2],
[1.0, 5.0, 3.4, 1.6, .4], [1.0, 5.2, 3.5, 1.5, .2],
[1.0, 5.2, 3.4, 1.4, .2], [1.0, 4.7, 3.2, 1.6, .2],
[1.0, 4.8, 3.1, 1.6, .2], [1.0, 5.4, 3.4, 1.5, .4],
[1.0, 5.2, 4.1, 1.5, .1], [1.0, 5.5, 4.2, 1.4, .2],
[1.0, 4.9, 3.1, 1.5, .2], [1.0, 5.0, 3.2, 1.2, .2],
[1.0, 5.5, 3.5, 1.3, .2], [1.0, 4.9, 3.6, 1.4, .1],
[1.0, 4.4, 3.0, 1.3, .2], [1.0, 5.1, 3.4, 1.5, .2],
[1.0, 5.0, 3.5, 1.3, .3], [1.0, 4.5, 2.3, 1.3, .3],
[1.0, 4.4, 3.2, 1.3, .2], [1.0, 5.0, 3.5, 1.6, .6],
[1.0, 5.1, 3.8, 1.9, .4], [1.0, 4.8, 3.0, 1.4, .3],
[1.0, 5.1, 3.8, 1.6, .2], [1.0, 4.6, 3.2, 1.4, .2],
[1.0, 5.3, 3.7, 1.5, .2], [1.0, 5.0, 3.3, 1.4, .2]]
# Compute the default variance-covariance matrix
covariance_matrix = covariances(x)
# Print the upper triangle of the result
writeMatrix("The default case: variances/covariances",
            covariance_matrix, printUpper=True)
Output¶
The default case: variances/covariances
          1       2       3       4       5
1    0.0000  0.0000  0.0000  0.0000  0.0000
2            0.1242  0.0992  0.0164  0.0103
3                    0.1437  0.0117  0.0093
4                            0.0302  0.0061
5                                    0.0111
Example 2¶
This example, which uses the first 50 observations in the Fisher iris data, illustrates the use of optional arguments.
from numpy import *
from pyimsl.stat.covariances import covariances
from pyimsl.stat.writeMatrix import writeMatrix
n_variables = 5
means = []
x = [[1.0, 5.1, 3.5, 1.4, .2], [1.0, 4.9, 3.0, 1.4, .2],
[1.0, 4.7, 3.2, 1.3, .2], [1.0, 4.6, 3.1, 1.5, .2],
[1.0, 5.0, 3.6, 1.4, .2], [1.0, 5.4, 3.9, 1.7, .4],
[1.0, 4.6, 3.4, 1.4, .3], [1.0, 5.0, 3.4, 1.5, .2],
[1.0, 4.4, 2.9, 1.4, .2], [1.0, 4.9, 3.1, 1.5, .1],
[1.0, 5.4, 3.7, 1.5, .2], [1.0, 4.8, 3.4, 1.6, .2],
[1.0, 4.8, 3.0, 1.4, .1], [1.0, 4.3, 3.0, 1.1, .1],
[1.0, 5.8, 4.0, 1.2, .2], [1.0, 5.7, 4.4, 1.5, .4],
[1.0, 5.4, 3.9, 1.3, .4], [1.0, 5.1, 3.5, 1.4, .3],
[1.0, 5.7, 3.8, 1.7, .3], [1.0, 5.1, 3.8, 1.5, .3],
[1.0, 5.4, 3.4, 1.7, .2], [1.0, 5.1, 3.7, 1.5, .4],
[1.0, 4.6, 3.6, 1.0, .2], [1.0, 5.1, 3.3, 1.7, .5],
[1.0, 4.8, 3.4, 1.9, .2], [1.0, 5.0, 3.0, 1.6, .2],
[1.0, 5.0, 3.4, 1.6, .4], [1.0, 5.2, 3.5, 1.5, .2],
[1.0, 5.2, 3.4, 1.4, .2], [1.0, 4.7, 3.2, 1.6, .2],
[1.0, 4.8, 3.1, 1.6, .2], [1.0, 5.4, 3.4, 1.5, .4],
[1.0, 5.2, 4.1, 1.5, .1], [1.0, 5.5, 4.2, 1.4, .2],
[1.0, 4.9, 3.1, 1.5, .2], [1.0, 5.0, 3.2, 1.2, .2],
[1.0, 5.5, 3.5, 1.3, .2], [1.0, 4.9, 3.6, 1.4, .1],
[1.0, 4.4, 3.0, 1.3, .2], [1.0, 5.1, 3.4, 1.5, .2],
[1.0, 5.0, 3.5, 1.3, .3], [1.0, 4.5, 2.3, 1.3, .3],
[1.0, 4.4, 3.2, 1.3, .2], [1.0, 5.0, 3.5, 1.6, .6],
[1.0, 5.1, 3.8, 1.9, .4], [1.0, 4.8, 3.0, 1.4, .3],
[1.0, 5.1, 3.8, 1.6, .2], [1.0, 4.6, 3.2, 1.4, .2],
[1.0, 5.3, 3.7, 1.5, .2], [1.0, 5.0, 3.3, 1.4, .2]]
# Request correlations with standard deviations on the diagonal,
# and collect the column means in the list passed as means
correlations = covariances(x,
                           stdevCorrelationMatrix=True,
                           means=means)
# Print results, skipping the constant first variable
writeMatrix("Means\n", means[1:])
title = "Correlations with Standard Deviations on the Diagonal\n"
tmpCorrelations = correlations[1:, 1:]
writeMatrix(title, tmpCorrelations, printUpper=True)
Output¶
***
*** Warning error issued from IMSL function covariances:
*** Correlations are requested but the observations on variable 1 are constant. The pertinent correlation coefficients are set to NaN (not a number).
***
Means
      1        2        3        4
  5.006    3.428    1.462    0.246
Correlations with Standard Deviations on the Diagonal
          1       2       3       4
1    0.3525  0.7425  0.2672  0.2781
2            0.3791  0.1777  0.2328
3                    0.1737  0.3316
4                            0.1054
Warning Errors¶
IMSLS_CONSTANT_VARIABLE | Correlations are requested, but the observations on one or more variables are constant. The corresponding correlations are set to NaN. |
IMSLS_INSUFFICIENT_DATA | Variances and covariances are requested, but fewer than two valid observations are present for a variable. The pertinent statistics are set to NaN. |
IMSLS_ZERO_SUM_OF_WEIGHTS_2 | The sum of the weights is zero. The means, variances, and covariances are set to NaN. |
IMSLS_ZERO_SUM_OF_WEIGHTS_3 | The sum of the weights is zero. The means and correlations are set to NaN. |
IMSLS_TOO_FEW_VALID_OBS_CORREL | Correlations are requested, but fewer than two valid observations are present for a variable. The pertinent correlation coefficients are set to NaN. |
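Since these conditions leave NaN entries in the result, it can be useful to screen the data before requesting correlations. The sketch below is one possible pre-check in NumPy; the helper names constant_columns and sparse_columns are hypothetical and are not part of the library, and the check assumes every column has at least one valid value.

```python
import numpy as np

def constant_columns(x):
    """Indices of columns whose valid (non-NaN) values are all equal."""
    x = np.asarray(x, dtype=float)
    spread = np.nanmax(x, axis=0) - np.nanmin(x, axis=0)
    return np.flatnonzero(spread == 0.0)

def sparse_columns(x):
    """Indices of columns with fewer than two valid (non-NaN) values."""
    x = np.asarray(x, dtype=float)
    return np.flatnonzero((~np.isnan(x)).sum(axis=0) < 2)

# Hypothetical data: column 0 is constant, column 2 has a single valid value
x_check = [[1.0, 5.1, 3.5],
           [1.0, 4.9, np.nan],
           [1.0, 4.7, np.nan]]

const_idx = constant_columns(x_check)
sparse_idx = sparse_columns(x_check)
print(const_idx, sparse_idx)
```

Columns flagged by either helper can then be dropped or reported before the call, instead of being discovered afterwards through NaN entries in the output.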