kolmogorovTwo¶
Performs a Kolmogorov-Smirnov two-sample test.
Synopsis¶
kolmogorovTwo (x, y)
Required Arguments¶
- float x[] (Input) - Array of size nObservationsX containing the observations from sample one.
- float y[] (Input) - Array of size nObservationsY containing the observations from sample two.
Return Value¶
An array of length 3 containing Z, the one-sided p-value \(p_1\), and the two-sided p-value \(p_2\).
Optional Arguments¶
differences
(Output) - Array of length 3 containing \(D_n\), \(D_n^+\), and \(D_n^-\).
nMissingX
(Output) - Number of missing values in the x sample.
nMissingY
(Output) - Number of missing values in the y sample.
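As a rough illustration of what the missing-value counts represent, the sketch below separates observed values from missing ones. It assumes missing observations are encoded as NaN, which is an assumption made here for illustration; `split_missing` is a hypothetical helper, not part of the library.

```python
import math


def split_missing(sample):
    """Return (observed_values, n_missing) for a list of floats.

    Assumption: a missing observation is represented by NaN.
    """
    observed = [v for v in sample if not math.isnan(v)]
    n_missing = len(sample) - len(observed)
    return observed, n_missing


# Three observations, one of them missing
observed, n_missing = split_missing([0.2, float("nan"), 0.7])
print(observed, n_missing)
```

The effective sample size used by the test is then the number of observed values, that is, the array length minus the missing count.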
Description¶
Function kolmogorovTwo computes Kolmogorov-Smirnov two-sample test statistics for testing that two continuous cumulative distribution functions (CDFs) are identical based upon two random samples. One- or two-sided alternatives are allowed. Exact p-values are computed for the two-sided test when nObservationsX × nObservationsY is less than \(10^4\).
Let \(F_n(x)\) denote the empirical CDF in the X sample and let \(G_m(y)\) denote the empirical CDF in the Y sample, where n = nObservationsX - nMissingX and m = nObservationsY - nMissingY, and let the corresponding population distribution functions be denoted by \(F(x)\) and \(G(y)\), respectively. Then, the hypotheses tested by kolmogorovTwo are as follows:
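Assuming the standard formulation of the two-sample Kolmogorov-Smirnov test (the pairing below follows that convention and is an assumption here), the two-sided hypothesis and the two one-sided hypotheses can be written as:

\[
\begin{array}{ll}
H_0: F(x) = G(x) & H_1: F(x) \ne G(x) \\
H_0: F(x) \ge G(x) & H_1: F(x) < G(x) \\
H_0: F(x) \le G(x) & H_1: F(x) > G(x)
\end{array}
\]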
The test statistics are given as follows:
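Using the symbols \(D_n\), \(D_n^+\), and \(D_n^-\) from the differences argument, and assuming the usual sign convention (which empirical CDF is subtracted from which in \(D_n^+\) is an assumption here), the statistics take the form:

\[
D_n^+ = \max_x \left( F_n(x) - G_m(x) \right), \qquad
D_n^- = \max_x \left( G_m(x) - F_n(x) \right), \qquad
D_n = \max\left( D_n^+, D_n^- \right)
\]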
Asymptotically, the distribution of the statistic \(Z = D_n \sqrt{mn/(m+n)}\) (returned in testStatistics[0], the first element of the return value) converges to a distribution given by Smirnov (1939).
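The limiting distribution is commonly written as the alternating series \(Q(z) = 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2z^2}\) for the two-sided tail probability. The sketch below evaluates that series as an illustration only. It is not the Kim and Jennrich computation, so it should not be expected to reproduce the exact p-values returned for small samples, and `smirnov_two_sided_tail` is a hypothetical name.

```python
import math


def smirnov_two_sided_tail(z, terms=100):
    """Asymptotic two-sided tail probability Q(z) of the scaled statistic Z.

    Evaluates 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 * k**2 * z**2).
    For z below 0.2 the tail is within about 1e-12 of 1 and the series
    converges slowly, so 1.0 is returned directly.
    """
    if z < 0.2:
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
    # Clamp to [0, 1] to guard against rounding at the extremes
    return min(1.0, max(0.0, 2.0 * total))


# Tail probability near the conventional 5% critical value of Z
print(smirnov_two_sided_tail(1.36))
```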
Exact probabilities for the two-sided test are computed when n*m is less than or equal to \(10^4\), according to an algorithm given by Kim and Jennrich (1973). When n*m is greater than \(10^4\), the very good approximations given by Kim and Jennrich are used to obtain the two-sided p-values. The one-sided probability is taken as one half the two-sided probability. This is a very good approximation when the p-value is small (say, less than 0.10) and not very good for large p-values.
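The selection rule and the one-sided halving described above can be sketched as follows. The function names are hypothetical, and the sketch only mirrors the decision logic stated in the text; it does not implement the Kim and Jennrich algorithm itself.

```python
EXACT_THRESHOLD = 10 ** 4  # from the text: exact probabilities when n*m <= 10^4


def two_sided_method(n, m):
    """Say which two-sided computation the text describes for sizes n and m."""
    return "exact" if n * m <= EXACT_THRESHOLD else "approximation"


def one_sided_p(two_sided_p):
    """One-sided probability, taken as one half of the two-sided probability."""
    return 0.5 * two_sided_p


# Sample sizes n = 100 and m = 60 give n*m = 6000
print(two_sided_method(100, 60))
print(one_sided_p(0.1440))
```

With n = 100 and m = 60 the product is 6000, so the exact branch applies.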
Example¶
This example illustrates the kolmogorovTwo routine with two randomly generated samples from a uniform(0,1) distribution. Since the two theoretical distributions are identical, we would not expect to reject the null hypothesis.
from __future__ import print_function
from pyimsl.stat.kolmogorovTwo import kolmogorovTwo
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.randomUniform import randomUniform

# Sample sizes for the two uniform(0,1) samples
nobsx = 100
nobsy = 60

# Fix the seed so the generated samples are reproducible
randomSeedSet(123457)
x = randomUniform(nobsx)
y = randomUniform(nobsy)

# Empty lists that receive the optional outputs
nMissingX = []
nMissingY = []
differences = []
statistics = kolmogorovTwo(x, y,
                           nMissingX=nMissingX, nMissingY=nMissingY,
                           differences=differences)
print("D = %8.4f" % (differences[0]))
print("D+ = %8.4f" % (differences[1]))
print("D- = %8.4f" % (differences[2]))
print("Z = %8.4f" % (statistics[0]))
print("Prob greater D one sided = %8.4f" % (statistics[1]))
print("Prob greater D two sided = %8.4f" % (statistics[2]))
print("Missing X = %d" % (nMissingX[0]))
print("Missing Y = %d" % (nMissingY[0]))
Output¶
D = 0.1800
D+ = 0.1800
D- = 0.0100
Z = 1.1023
Prob greater D one sided = 0.0720
Prob greater D two sided = 0.1440
Missing X = 0
Missing Y = 0
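The reported values can be cross-checked by hand. The sketch below is a pure-Python illustration, not the library's implementation: it computes the three differences from the empirical CDFs, assuming \(D^+\) measures \(F_n - G_m\) and \(D^-\) measures \(G_m - F_n\), and applies the scaling \(Z = D\sqrt{nm/(n+m)}\). The helper names and the small data set are hypothetical.

```python
import math
from bisect import bisect_right


def ks_two_sample_differences(x, y):
    """Return (D, D_plus, D_minus) for two samples with no missing values."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d_plus = d_minus = 0.0
    # Both empirical CDFs are right-continuous step functions, so the
    # extreme differences are attained at the observed data points.
    for t in xs + ys:
        f = bisect_right(xs, t) / float(n)  # F_n(t)
        g = bisect_right(ys, t) / float(m)  # G_m(t)
        d_plus = max(d_plus, f - g)
        d_minus = max(d_minus, g - f)
    return max(d_plus, d_minus), d_plus, d_minus


def ks_z(d, n, m):
    """Scaled statistic Z = D * sqrt(n*m / (n + m))."""
    return d * math.sqrt(n * m / float(n + m))


# Small hypothetical data set
d, d_plus, d_minus = ks_two_sample_differences(
    [0.1, 0.2, 0.3, 0.4], [0.25, 0.35, 0.45, 0.55, 0.65])
print(d, d_plus, d_minus)

# Check against the output above: D = 0.18 with n = 100, m = 60
print("%.4f" % ks_z(0.18, 100, 60))  # prints 1.1023
```

The last line reproduces the Z value in the output, and the one-sided probability there (0.0720) is one half of the two-sided probability (0.1440).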