covariances



Computes the sample variance-covariance or correlation matrix.

Synopsis

#include <imsl.h>

float *imsl_f_covariances (int n_observations, int n_variables, float x[], ..., 0)

The type double function is imsl_d_covariances.

Required Arguments

int n_observations (Input)
The number of observations.

int n_variables (Input)
The number of variables.

float x[] (Input)
Array of size n_observations × n_variables containing the matrix of data.

Return Value

If no optional arguments are used, imsl_f_covariances returns a pointer to an n_variables × n_variables matrix containing the sample variance-covariance matrix of the observations. The rows and columns of this matrix correspond to the columns of x.

Synopsis with Optional Arguments

#include <imsl.h>

float *imsl_f_covariances (int n_observations, int n_variables, float x[],
    IMSL_X_COL_DIM, int x_col_dim,
    IMSL_VARIANCE_COVARIANCE_MATRIX,
    IMSL_CORRECTED_SSCP_MATRIX,
    IMSL_CORRELATION_MATRIX,
    IMSL_STDEV_CORRELATION_MATRIX,
    IMSL_MEANS, float **p_means,
    IMSL_MEANS_USER, float means[],
    IMSL_COVARIANCE_COL_DIM, int covariance_col_dim,
    IMSL_RETURN_USER, float covariance[],
    0)

Optional Arguments

IMSL_X_COL_DIM, int x_col_dim (Input)
The column dimension of array x.
Default: x_col_dim = n_variables

IMSL_VARIANCE_COVARIANCE_MATRIX, or

IMSL_CORRECTED_SSCP_MATRIX, or

IMSL_CORRELATION_MATRIX, or

IMSL_STDEV_CORRELATION_MATRIX
At most one of these options can be specified. It selects the type of matrix to be computed; if none is given, the variance-covariance matrix is computed.

Keyword                            Type of Matrix
IMSL_VARIANCE_COVARIANCE_MATRIX    variance-covariance matrix (default)
IMSL_CORRECTED_SSCP_MATRIX         corrected sums of squares and crossproducts matrix
IMSL_CORRELATION_MATRIX            correlation matrix
IMSL_STDEV_CORRELATION_MATRIX      correlation matrix except for the diagonal elements, which are the standard deviations

IMSL_MEANS, float **p_means (Output)
The address of a pointer to the array containing the means of the variables in x. The components of the array correspond to the columns of x. On return, the pointer is initialized (through a memory allocation request to malloc), and the array is stored there. Typically, float *p_means is declared; &p_means is used as an argument to this function; and imsl_free(p_means) is used to free this array.

IMSL_MEANS_USER, float means[] (Output)
Calculate the n_variables means and store them in the memory provided by the user. The elements of means correspond to the columns of x.

IMSL_COVARIANCE_COL_DIM, int covariance_col_dim (Input)
The column dimension of array covariance, if IMSL_RETURN_USER is specified, or the column dimension of the return value otherwise.
Default: covariance_col_dim = n_variables

IMSL_RETURN_USER, float covariance[] (Output)
If specified, the output is stored in the array covariance of size n_variables × n_variables provided by the user.

Description

The function imsl_f_covariances computes estimates of correlations, covariances, or sums of squares and crossproducts for a data matrix x. The means, (corrected) sums of squares, and (corrected) sums of crossproducts are computed using the method of provisional means. Let $\bar{x}_{ki}$ denote the mean based on i observations for the k-th variable, and let $c_{jki}$ denote the sum of crossproducts (or sum of squares if j = k) based on i observations. The method of provisional means then updates the means and sums of crossproducts one observation at a time.

The means and crossproducts are initialized as:

$$\bar{x}_{k0} = 0.0, \quad k = 1, \ldots, p$$

$$c_{jk0} = 0.0, \quad j, k = 1, \ldots, p$$

where p denotes the number of variables. Letting $x_{k,i+1}$ denote the k-th variable on observation i + 1, each new observation leads to the following updates for $\bar{x}_{ki}$ and $c_{jki}$ using update constant $r_{i+1}$:

$$r_{i+1} = \frac{1}{i+1}$$

$$\bar{x}_{k,i+1} = \bar{x}_{ki} + (x_{k,i+1} - \bar{x}_{ki})\, r_{i+1}$$

$$c_{jk,i+1} = c_{jki} + (x_{j,i+1} - \bar{x}_{ji})(x_{k,i+1} - \bar{x}_{ki})(1 - r_{i+1})$$

Usage Notes

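The function imsl_f_covariances uses the following definition of a sample mean:

$$\bar{x}_k = \frac{1}{n} \sum_{i=1}^{n} x_{ki}$$

where n is the number of observations. The following formula defines the sample covariance, $s_{jk}$, between variables j and k:

$$s_{jk} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ji} - \bar{x}_j)(x_{ki} - \bar{x}_k)$$

The sample correlation between variables j and k, $r_{jk}$, is defined as follows:

$$r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\, s_{kk}}}$$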
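
As a small worked illustration of these definitions, take n = 3 observations on two variables, with values (1, 2, 3) for the first variable and (2, 4, 9) for the second. The numbers are made up for the example and use the divisor n - 1, as in the sample covariance.

```latex
\begin{aligned}
\bar{x}_1 &= \tfrac{1}{3}(1 + 2 + 3) = 2, \qquad
\bar{x}_2 = \tfrac{1}{3}(2 + 4 + 9) = 5 \\
s_{11} &= \tfrac{1}{2}\left[(-1)^2 + 0^2 + 1^2\right] = 1 \\
s_{22} &= \tfrac{1}{2}\left[(-3)^2 + (-1)^2 + 4^2\right] = 13 \\
s_{12} &= \tfrac{1}{2}\left[(-1)(-3) + (0)(-1) + (1)(4)\right] = 3.5 \\
r_{12} &= \frac{3.5}{\sqrt{1 \cdot 13}} \approx 0.9707
\end{aligned}
```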

Examples

 

Example 1

The first example illustrates the use of imsl_f_covariances for the first 50 observations in the Fisher iris data (Fisher 1936). Note in this example that the first variable is constant over the first 50 observations.

 

#include <imsl.h>

#define N_VARIABLES     5
#define N_OBSERVATIONS  50

int main()
{
    float *covariances;
    /* 50 observations on 5 variables, stored row by row;
       the first variable is the constant 1.0 */
    float x[] = {1.0, 5.1, 3.5, 1.4, .2, 1.0, 4.9, 3.0, 1.4, .2,
                 1.0, 4.7, 3.2, 1.3, .2, 1.0, 4.6, 3.1, 1.5, .2,
                 1.0, 5.0, 3.6, 1.4, .2, 1.0, 5.4, 3.9, 1.7, .4,
                 1.0, 4.6, 3.4, 1.4, .3, 1.0, 5.0, 3.4, 1.5, .2,
                 1.0, 4.4, 2.9, 1.4, .2, 1.0, 4.9, 3.1, 1.5, .1,
                 1.0, 5.4, 3.7, 1.5, .2, 1.0, 4.8, 3.4, 1.6, .2,
                 1.0, 4.8, 3.0, 1.4, .1, 1.0, 4.3, 3.0, 1.1, .1,
                 1.0, 5.8, 4.0, 1.2, .2, 1.0, 5.7, 4.4, 1.5, .4,
                 1.0, 5.4, 3.9, 1.3, .4, 1.0, 5.1, 3.5, 1.4, .3,
                 1.0, 5.7, 3.8, 1.7, .3, 1.0, 5.1, 3.8, 1.5, .3,
                 1.0, 5.4, 3.4, 1.7, .2, 1.0, 5.1, 3.7, 1.5, .4,
                 1.0, 4.6, 3.6, 1.0, .2, 1.0, 5.1, 3.3, 1.7, .5,
                 1.0, 4.8, 3.4, 1.9, .2, 1.0, 5.0, 3.0, 1.6, .2,
                 1.0, 5.0, 3.4, 1.6, .4, 1.0, 5.2, 3.5, 1.5, .2,
                 1.0, 5.2, 3.4, 1.4, .2, 1.0, 4.7, 3.2, 1.6, .2,
                 1.0, 4.8, 3.1, 1.6, .2, 1.0, 5.4, 3.4, 1.5, .4,
                 1.0, 5.2, 4.1, 1.5, .1, 1.0, 5.5, 4.2, 1.4, .2,
                 1.0, 4.9, 3.1, 1.5, .2, 1.0, 5.0, 3.2, 1.2, .2,
                 1.0, 5.5, 3.5, 1.3, .2, 1.0, 4.9, 3.6, 1.4, .1,
                 1.0, 4.4, 3.0, 1.3, .2, 1.0, 5.1, 3.4, 1.5, .2,
                 1.0, 5.0, 3.5, 1.3, .3, 1.0, 4.5, 2.3, 1.3, .3,
                 1.0, 4.4, 3.2, 1.3, .2, 1.0, 5.0, 3.5, 1.6, .6,
                 1.0, 5.1, 3.8, 1.9, .4, 1.0, 4.8, 3.0, 1.4, .3,
                 1.0, 5.1, 3.8, 1.6, .2, 1.0, 4.6, 3.2, 1.4, .2,
                 1.0, 5.3, 3.7, 1.5, .2, 1.0, 5.0, 3.3, 1.4, .2};

    /* No optional arguments: returns the variance-covariance matrix */
    covariances = imsl_f_covariances (N_OBSERVATIONS, N_VARIABLES, x, 0);

    /* The matrix is symmetric, so print only the upper triangle */
    imsl_f_write_matrix ("The default case: variances/covariances",
                         N_VARIABLES, N_VARIABLES, covariances,
                         IMSL_PRINT_UPPER,
                         0);
}

Output

 

The default case: variances/covariances
         1       2       3       4       5
1   0.0000  0.0000  0.0000  0.0000  0.0000
2           0.1242  0.0992  0.0164  0.0103
3                   0.1437  0.0117  0.0093
4                           0.0302  0.0061
5                                   0.0111

Example 2

This example illustrates the use of some optional arguments in imsl_f_covariances. Once again, the first 50 observations in the Fisher iris data are used.

 

#include <imsl.h>

#define N_VARIABLES     5
#define N_OBSERVATIONS  50

int main()
{
    char *title;
    float *means, *correlations;
    /* 50 observations on 5 variables, stored row by row;
       the first variable is the constant 1.0 */
    float x[] = {1.0, 5.1, 3.5, 1.4, .2, 1.0, 4.9, 3.0, 1.4, .2,
                 1.0, 4.7, 3.2, 1.3, .2, 1.0, 4.6, 3.1, 1.5, .2,
                 1.0, 5.0, 3.6, 1.4, .2, 1.0, 5.4, 3.9, 1.7, .4,
                 1.0, 4.6, 3.4, 1.4, .3, 1.0, 5.0, 3.4, 1.5, .2,
                 1.0, 4.4, 2.9, 1.4, .2, 1.0, 4.9, 3.1, 1.5, .1,
                 1.0, 5.4, 3.7, 1.5, .2, 1.0, 4.8, 3.4, 1.6, .2,
                 1.0, 4.8, 3.0, 1.4, .1, 1.0, 4.3, 3.0, 1.1, .1,
                 1.0, 5.8, 4.0, 1.2, .2, 1.0, 5.7, 4.4, 1.5, .4,
                 1.0, 5.4, 3.9, 1.3, .4, 1.0, 5.1, 3.5, 1.4, .3,
                 1.0, 5.7, 3.8, 1.7, .3, 1.0, 5.1, 3.8, 1.5, .3,
                 1.0, 5.4, 3.4, 1.7, .2, 1.0, 5.1, 3.7, 1.5, .4,
                 1.0, 4.6, 3.6, 1.0, .2, 1.0, 5.1, 3.3, 1.7, .5,
                 1.0, 4.8, 3.4, 1.9, .2, 1.0, 5.0, 3.0, 1.6, .2,
                 1.0, 5.0, 3.4, 1.6, .4, 1.0, 5.2, 3.5, 1.5, .2,
                 1.0, 5.2, 3.4, 1.4, .2, 1.0, 4.7, 3.2, 1.6, .2,
                 1.0, 4.8, 3.1, 1.6, .2, 1.0, 5.4, 3.4, 1.5, .4,
                 1.0, 5.2, 4.1, 1.5, .1, 1.0, 5.5, 4.2, 1.4, .2,
                 1.0, 4.9, 3.1, 1.5, .2, 1.0, 5.0, 3.2, 1.2, .2,
                 1.0, 5.5, 3.5, 1.3, .2, 1.0, 4.9, 3.6, 1.4, .1,
                 1.0, 4.4, 3.0, 1.3, .2, 1.0, 5.1, 3.4, 1.5, .2,
                 1.0, 5.0, 3.5, 1.3, .3, 1.0, 4.5, 2.3, 1.3, .3,
                 1.0, 4.4, 3.2, 1.3, .2, 1.0, 5.0, 3.5, 1.6, .6,
                 1.0, 5.1, 3.8, 1.9, .4, 1.0, 4.8, 3.0, 1.4, .3,
                 1.0, 5.1, 3.8, 1.6, .2, 1.0, 4.6, 3.2, 1.4, .2,
                 1.0, 5.3, 3.7, 1.5, .2, 1.0, 5.0, 3.3, 1.4, .2};

    /* Skip the constant first column: start at x+1, use 4 variables,
       and keep the column dimension of the stored array (5) */
    correlations = imsl_f_covariances (N_OBSERVATIONS,
                                       N_VARIABLES-1, x+1,
                                       IMSL_STDEV_CORRELATION_MATRIX,
                                       IMSL_X_COL_DIM, N_VARIABLES,
                                       IMSL_MEANS, &means,
                                       0);

    imsl_f_write_matrix ("Means\n", 1, N_VARIABLES-1, means, 0);

    title = "Correlations with Standard Deviations on the Diagonal\n";
    imsl_f_write_matrix (title, N_VARIABLES-1, N_VARIABLES-1,
                         correlations, IMSL_PRINT_UPPER,
                         0);
}

Output

 

Means

        1       2       3       4
    5.006   3.428   1.462   0.246

Correlations with Standard Deviations on the Diagonal

         1       2       3       4
1   0.3525  0.7425  0.2672  0.2781
2           0.3791  0.1777  0.2328
3                   0.1737  0.3316
4                           0.1054

Warning Errors

IMSL_CONSTANT_VARIABLE

Correlations are requested, but the observations on one or more variables are constant. The corresponding correlations are set to NaN.