kolmogorov_two

Performs a Kolmogorov-Smirnov two-sample test.

Synopsis

#include <imsls.h>

float *imsls_f_kolmogorov_two (int n_observations_x, float x[], int n_observations_y, float y[], ..., 0)

The type double function is imsls_d_kolmogorov_two.

Required Arguments

int n_observations_x (Input)
Number of observations in sample one.

float x[] (Input)
Array of size n_observations_x containing the observations from sample one.

int n_observations_y (Input)
Number of observations in sample two.

float y[] (Input)
Array of size n_observations_y containing the observations from sample two.

Return Value

Pointer to an array of length 3 containing Z, the one-sided p-value p1, and the two-sided p-value p2.

Synopsis with Optional Arguments

#include <imsls.h>

float *imsls_f_kolmogorov_two (int n_observations_x, float x[], int n_observations_y, float y[],

IMSLS_DIFFERENCES, float **differences,

IMSLS_DIFFERENCES_USER, float differences[],

IMSLS_N_MISSING_X, int *xmissing,

IMSLS_N_MISSING_Y, int *ymissing,

IMSLS_RETURN_USER, float test_statistics[],

0)

Optional Arguments

IMSLS_DIFFERENCES, float **differences (Output)
Address of a pointer to the internally allocated array of length 3 containing Dn, Dn+, and Dn-.

IMSLS_DIFFERENCES_USER, float differences[] (Output)
Storage for array differences is provided by the user.

See IMSLS_DIFFERENCES.

IMSLS_N_MISSING_X, int *xmissing (Output)
Number of missing values in the x sample is returned in *xmissing.

IMSLS_N_MISSING_Y, int *ymissing (Output)
Number of missing values in the y sample is returned in *ymissing.

IMSLS_RETURN_USER, float test_statistics[] (Output)
If specified, the Z-score and the p-values for the hypothesis tests against the one-sided and two-sided alternatives are stored in the user-provided array test_statistics.

Description

Function imsls_f_kolmogorov_two computes Kolmogorov-Smirnov two-sample test statistics for testing that two continuous cumulative distribution functions (CDFs) are identical, based upon two random samples. One- or two-sided alternatives are allowed. Exact p-values are computed for the two-sided test when n_observations_x × n_observations_y is less than 10^4.

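Let Fn(x) denote the empirical CDF in the X sample and let Gm(y) denote the empirical CDF in the Y sample, where n = n_observations_x - n_missing_x and m = n_observations_y - n_missing_y are the sample sizes after missing values are excluded. Let the corresponding population distribution functions be denoted by F(x) and G(y), respectively. Then, the hypotheses tested by imsls_f_kolmogorov_two are as follows:

$$H_0: F(x) = G(x) \qquad H_1: F(x) \ne G(x)$$

$$H_0: F(x) \ge G(x) \qquad H_1: F(x) < G(x)$$

$$H_0: F(x) \le G(x) \qquad H_1: F(x) > G(x)$$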

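The test statistics are given as follows:

$$D_n = \max(D_n^+, D_n^-)$$

$$D_n^+ = \max_x \, (F_n(x) - G_m(x))$$

$$D_n^- = \max_x \, (G_m(x) - F_n(x))$$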

Asymptotically, the distribution of the statistic

$$Z = D_n \sqrt{\frac{mn}{m+n}}$$

(returned in test_statistics[0]) converges to a distribution given by Smirnov (1939).

Exact probabilities for the two-sided test are computed when n*m is less than or equal to 10^4, according to an algorithm given by Kim and Jennrich (1973). When n*m is greater than 10^4, the very good approximations given by Kim and Jennrich are used to obtain the two-sided p-values. The one-sided probability is taken as one half the two-sided probability. This is a very good approximation when the p-value is small (say, less than 0.10) but not very good for large p-values.
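To give a feel for what an exact two-sided probability involves, here is a minimal plain C sketch based on a textbook lattice-path count. It is not claimed to be the Kim and Jennrich algorithm or what the library does internally. The function names ks2_exact_two_sided and ks2_one_sided_half are hypothetical, and the sketch assumes continuous data with no ties, so that under the null hypothesis every interleaving of the two samples is equally likely.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical helper, not an IMSL routine.
   Returns P(D >= d) under the null hypothesis for sample sizes n and m
   by counting monotone lattice paths from (0,0) to (n,m) that keep
   |i/n - j/m| < d at every point. Returns -1.0 if allocation fails. */
double ks2_exact_two_sided(int n, int m, double d)
{
    const double tol = 1e-9; /* guards against rounding error in d */
    double *inside, *all, p;

    assert(n > 0 && m > 0);
    inside = malloc((size_t)(m + 1) * sizeof *inside);
    all = malloc((size_t)(m + 1) * sizeof *all);
    if (!inside || !all) {
        free(inside);
        free(all);
        return -1.0;
    }

    for (int i = 0; i <= n; i++) {
        for (int j = 0; j <= m; j++) {
            double in_count, all_count;

            if (i == 0 && j == 0) {
                in_count = all_count = 1.0;
            } else {
                /* Paths arrive from (i-1, j), still stored at index j,
                   or from (i, j-1), already updated at index j-1. */
                in_count = (i > 0 ? inside[j] : 0.0)
                         + (j > 0 ? inside[j - 1] : 0.0);
                all_count = (i > 0 ? all[j] : 0.0)
                          + (j > 0 ? all[j - 1] : 0.0);
            }
            /* A path touching this point already has D >= d. */
            if (fabs((double)i / n - (double)j / m) >= d - tol)
                in_count = 0.0;

            inside[j] = in_count;
            all[j] = all_count;
        }
    }

    p = 1.0 - inside[m] / all[m];
    free(inside);
    free(all);
    return p;
}

/* One-sided probability taken as half the two-sided one, as the text describes. */
double ks2_one_sided_half(double p_two_sided)
{
    return 0.5 * p_two_sided;
}
```

For n = m = 3 and d = 2/3, 8 of the 20 possible paths stay inside the band, so the sketch returns 1 - 8/20 = 0.6. Path counts are held in double, which is an assumption that suits the sample sizes for which the text says exact probabilities are computed; much larger samples would need rescaling to avoid overflow.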

Example

This example illustrates the imsls_f_kolmogorov_two routine with two randomly generated samples from a uniform(0,1) distribution. Since the two theoretical distributions are identical, we would not expect to reject the null hypothesis.

 

#include <imsls.h>
#include <stdio.h>

int main()
{
    float *statistics = NULL, *diffs = NULL, *x = NULL, *y = NULL;
    int nobsx = 100, nobsy = 60, nmissx, nmissy;

    /* Generate two independent uniform(0,1) samples. */
    imsls_random_seed_set(123457);
    x = imsls_f_random_uniform(nobsx, 0);
    y = imsls_f_random_uniform(nobsy, 0);

    /* statistics receives Z and the two p-values; diffs receives D, D+, D-. */
    statistics = imsls_f_kolmogorov_two(nobsx, x, nobsy, y,
        IMSLS_N_MISSING_X, &nmissx,
        IMSLS_N_MISSING_Y, &nmissy,
        IMSLS_DIFFERENCES, &diffs,
        0);

    printf("D = %8.4f\n", diffs[0]);
    printf("D+ = %8.4f\n", diffs[1]);
    printf("D- = %8.4f\n", diffs[2]);
    printf("Z = %8.4f\n", statistics[0]);
    printf("Prob greater D one sided = %8.4f\n", statistics[1]);
    printf("Prob greater D two sided = %8.4f\n", statistics[2]);
    printf("Missing X = %d\n", nmissx);
    printf("Missing Y = %d\n", nmissy);

    return 0;
}

 

Output

 

D = 0.1800
D+ = 0.1800
D- = 0.0100
Z = 1.1023
Prob greater D one sided = 0.0720
Prob greater D two sided = 0.1440
Missing X = 0
Missing Y = 0