kolmogorovTwo

Performs a Kolmogorov-Smirnov two-sample test.

Synopsis

kolmogorovTwo (x, y)

Required Arguments

float x[] (Input)
Array of size nObservationsX containing the observations from sample one.
float y[] (Input)
Array of size nObservationsY containing the observations from sample two.

Return Value

An array of length 3 containing the test statistic Z, the one-sided p-value p1, and the two-sided p-value p2.

Optional Arguments

differences (Output)
The array of length 3 containing Dmn, D+mn, and D-mn.
nMissingX (Output)
Number of missing values in the x sample is returned in nMissingX.
nMissingY (Output)
Number of missing values in the y sample is returned in nMissingY.
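
As in the example below, the output-only optional arguments are supplied as empty Python lists that kolmogorovTwo fills in place. A minimal sketch of the calling convention (the sample values here are made up for illustration):

from pyimsl.stat.kolmogorovTwo import kolmogorovTwo

x = [0.62, 0.44, 0.80, 0.91, 0.74]   # hypothetical sample one
y = [0.35, 0.19, 0.57, 0.86]         # hypothetical sample two
differences = []                     # filled with [D, D+, D-]
nMissingX = []                       # filled with the missing count for x
nMissingY = []                       # filled with the missing count for y

stats = kolmogorovTwo(x, y, differences=differences,
                      nMissingX=nMissingX, nMissingY=nMissingY)
# stats[0] is Z, stats[1] the one-sided p-value, stats[2] the two-sided p-value
print(stats, differences, nMissingX[0], nMissingY[0])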

Description

Function kolmogorovTwo computes Kolmogorov-Smirnov two-sample test statistics for testing that two continuous cumulative distribution functions (CDFs) are identical based upon two random samples. One- or two-sided alternatives are allowed. Exact p-values are computed for the two-sided test when nObservationsX × nObservationsY is less than 10⁴.

Let Fn(x) denote the empirical CDF in the X sample, let Gm(y) denote the empirical CDF in the Y sample, where n = nObservationsX - nMissingX and m = nObservationsY - nMissingY, and let the corresponding population distribution functions be denoted by F(x) and G(y), respectively. Then, the hypotheses tested by kolmogorovTwo are as follows:

H0: F(x) = G(x)     H1: F(x) ≠ G(x)     (two-sided)
H0: F(x) ≤ G(x)     H1: F(x) > G(x)     (one-sided)
H0: F(x) ≥ G(x)     H1: F(x) < G(x)     (one-sided)

The test statistics are given as follows:

Dmn  = max(D+mn, D-mn)                    (differences[0])
D+mn = max over x of (Fn(x) - Gm(x))      (differences[1])
D-mn = max over x of (Gm(x) - Fn(x))      (differences[2])

Asymptotically, the distribution of the statistic

Z = Dmn * sqrt(mn / (m + n))

(returned as the first element of the return value) converges to a distribution given by Smirnov (1939).
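
The statistics above can be checked directly against the two empirical CDFs. The following sketch (plain NumPy, independent of kolmogorovTwo) evaluates Fn and Gm at the pooled sample points, where the extremes of the step-function difference are attained, and forms D, D+, D-, and Z as defined above:

import numpy as np

def ks_two_sample_sketch(x, y):
    # Sort the samples and evaluate both empirical CDFs at every pooled point.
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    n, m = x.size, y.size
    grid = np.concatenate([x, y])
    Fn = np.searchsorted(x, grid, side="right") / n   # Fn at each pooled point
    Gm = np.searchsorted(y, grid, side="right") / m   # Gm at each pooled point
    d_plus = np.max(Fn - Gm)              # D+ = max over x of (Fn(x) - Gm(x))
    d_minus = np.max(Gm - Fn)             # D- = max over x of (Gm(x) - Fn(x))
    d = max(d_plus, d_minus)              # D  = max(D+, D-)
    z = d * np.sqrt(n * m / (n + m))      # asymptotic statistic Z
    return d, d_plus, d_minus, z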

Exact probabilities for the two-sided test are computed when n*m is less than or equal to 10⁴, according to an algorithm given by Kim and Jennrich (1973). When n*m is greater than 10⁴, the very good approximations given by Kim and Jennrich are used to obtain the two-sided p-values. The one-sided probability is taken as one half the two-sided probability. This is a very good approximation when the p-value is small (say, less than 0.10), but not very good for large p-values.
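
To illustrate the halving approximation (this is not the exact Kim and Jennrich algorithm used by kolmogorovTwo), the asymptotic two-sided p-value can be evaluated from Smirnov's limiting distribution; for small p-values, half of it is close to the asymptotic one-sided value exp(-2z²), because the k = 1 term dominates the alternating series:

import math

def smirnov_two_sided_p(z, terms=100):
    # Asymptotic two-sided p-value: 2 * sum over k >= 1 of (-1)^(k-1) * exp(-2 k^2 z^2)
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                     for k in range(1, terms + 1))

z = 1.36                              # hypothetical value of the statistic Z
p_two = smirnov_two_sided_p(z)        # asymptotic two-sided p-value
p_one_half = 0.5 * p_two              # the halving approximation described above
p_one_asym = math.exp(-2.0 * z * z)   # asymptotic one-sided p-value
print(p_two, p_one_half, p_one_asym)  # the last two agree closely for this z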

Example

This example illustrates the kolmogorovTwo routine with two randomly generated samples from a uniform(0,1) distribution. Since the two theoretical distributions are identical, we would not expect to reject the null hypothesis.

from __future__ import print_function
from pyimsl.stat.kolmogorovTwo import kolmogorovTwo
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.randomUniform import randomUniform

nobsx = 100
nobsy = 60
randomSeedSet(123457)
x = randomUniform(nobsx)
y = randomUniform(nobsy)
nMissingX = []
nMissingY = []
differences = []

statistics = kolmogorovTwo(x, y,
                           nMissingX=nMissingX, nMissingY=nMissingY,
                           differences=differences)

print("D      = %8.4f" % (differences[0]))
print("D+     = %8.4f" % (differences[1]))
print("D-     = %8.4f" % (differences[2]))
print("Z      = %8.4f" % (statistics[0]))
print("Prob greater D one sided  = %8.4f" % (statistics[1]))
print("Prob greater D two sided  = %8.4f" % (statistics[2]))
print("Missing X = %d" % (nMissingX[0]))
print("Missing Y = %d" % (nMissingY[0]))

Output

D      =   0.1800
D+     =   0.1800
D-     =   0.0100
Z      =   1.1023
Prob greater D one sided  =   0.0720
Prob greater D two sided  =   0.1440
Missing X = 0
Missing Y = 0