Class KolmogorovTwoSample

public class KolmogorovTwoSample extends Object implements Serializable
Computes the Kolmogorov-Smirnov two-sample
test statistics for testing that two continuous cumulative distribution functions (CDFs)
are identical based upon two random samples. One- or two-sided alternatives are
allowed. Exact p-values are computed for the two-sided test when
\(nm \le 10^4\),
where n is the number of non-missing X observations and
m is the number of non-missing Y observations.
Let \(F_n(x)\) denote the empirical CDF in the X sample,
let \(G_m(y)\) denote the empirical CDF in the Y sample
and let the corresponding population
distribution functions be denoted by
\(F(x)\) and \(G(y)\), respectively.
Then, the hypotheses tested by KolmogorovTwoSample
are as follows:
$$
\begin{array}{ll}
H_0:~ F(x) = G(x) & H_1:~F(x) \ne G(x) \\
H_0:~ F(x) \ge G(x) & H_1:~F(x) \lt G(x) \\
H_0:~ F(x) \le G(x) & H_1:~F(x) \gt G(x)
\end{array}
$$
The test statistics are given as follows:
$$
\begin{array}{rl}
D_{mn} & = \max(D_{mn}^{+}, D_{mn}^{-}) \\
D_{mn}^{+} & = \max_x(F_n(x)-G_m(x)) \\
D_{mn}^{-} & = \max_x(G_m(x)-F_n(x))
\end{array}
$$
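The statistics above can be evaluated directly from the pooled order statistics, since the empirical CDFs change only at sample points. The following is a minimal sketch of that computation in plain Java (`KsTwoSampleSketch` is a hypothetical standalone helper, not part of this class):

```java
import java.util.Arrays;

// Hypothetical helper, not part of the JMSL API: a direct computation
// of the two-sample statistics D+, D-, and D defined above.
public class KsTwoSampleSketch {

    // Returns {dPlus, dMinus, d} where
    //   dPlus  = max over x of F_n(x) - G_m(x),
    //   dMinus = max over x of G_m(x) - F_n(x),
    //   d      = max(dPlus, dMinus).
    public static double[] statistics(double[] x, double[] y) {
        double[] xs = x.clone();
        double[] ys = y.clone();
        Arrays.sort(xs);
        Arrays.sort(ys);
        int n = xs.length, m = ys.length;
        double dPlus = 0.0, dMinus = 0.0;
        int i = 0, j = 0;
        // The empirical CDFs only change at pooled sample points, so it
        // suffices to evaluate the difference just after each such point.
        while (i < n && j < m) {
            double t = Math.min(xs[i], ys[j]);
            while (i < n && xs[i] <= t) i++; // advance F_n past t (ties included)
            while (j < m && ys[j] <= t) j++; // advance G_m past t (ties included)
            double diff = (double) i / n - (double) j / m;
            dPlus = Math.max(dPlus, diff);
            dMinus = Math.max(dMinus, -diff);
        }
        // Once one sample is exhausted, the difference only shrinks toward 0,
        // so no further candidates remain.
        return new double[] { dPlus, dMinus, Math.max(dPlus, dMinus) };
    }

    public static void main(String[] args) {
        double[] s = statistics(new double[] {1, 2, 3}, new double[] {2.5, 3.5});
        System.out.println("D+ = " + s[0] + ", D- = " + s[1] + ", D = " + s[2]);
    }
}
```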
Asymptotically, the distribution of the statistic
$$
Z = D_{mn} \sqrt{\frac{mn}{m+n}}
$$
converges to a distribution given by Smirnov (1939).
Exact probabilities for the two-sided test are computed when \(nm \le 10^4\), according to an algorithm given by Kim and Jennrich (1973). When \(nm \gt 10^4\), the very good approximations given by Kim and Jennrich are used to obtain the two-sided p-values. The one-sided probability is taken as one half the two-sided probability. This is a very good approximation when the p-value is small (say, less than 0.10), and not very good for large p-values.
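The limiting distribution of Z admits the well-known alternating series \(P(Z \gt z) = 2\sum_{k \ge 1}(-1)^{k-1}e^{-2k^2z^2}\). The following is a sketch of the large-sample two-sided p-value using that series (`KsAsymptoticSketch` is a hypothetical helper, not part of this class; the library itself uses the Kim and Jennrich approximations, which differ in detail):

```java
// Hypothetical helper, not part of the JMSL API: the large-sample
// two-sided p-value via the Smirnov limiting distribution,
//   P(Z > z) ~ 2 * sum_{k>=1} (-1)^{k-1} exp(-2 k^2 z^2),
// where Z = D * sqrt(m*n / (m+n)).
public class KsAsymptoticSketch {

    public static double twoSidedPValue(double d, int n, int m) {
        double z = d * Math.sqrt((double) n * m / (n + m));
        if (z <= 0.0) {
            return 1.0;
        }
        double sum = 0.0;
        for (int k = 1; k <= 100; k++) {
            double term = Math.exp(-2.0 * k * k * z * z);
            sum += (k % 2 == 1) ? term : -term;
            if (term < 1e-16) {
                break; // the series converges very quickly for moderate z
            }
        }
        // Clamp: the truncated series can stray slightly outside [0, 1]
        // for very small z, where convergence is slow.
        return Math.min(1.0, Math.max(0.0, 2.0 * sum));
    }

    public static void main(String[] args) {
        // D = 0.2 with n = m = 100 gives z ~ 1.414.
        System.out.println(twoSidedPValue(0.2, 100, 100));
    }
}
```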
| Constructor and Description |
|---|
| `KolmogorovTwoSample(double[] x, double[] y)` Constructs a two-sample Kolmogorov-Smirnov goodness-of-fit test. |
| Modifier and Type | Method and Description |
|---|---|
| `double` | `getMaximumDifference()` Returns \(D^{+}\), the maximum difference between the two empirical CDFs. |
| `double` | `getMinimumDifference()` Returns \(D^{-}\), the negative of the minimum difference between the two empirical CDFs. |
| `int` | `getNumberMissingX()` Returns the number of missing values in the x sample. |
| `int` | `getNumberMissingY()` Returns the number of missing values in the y sample. |
| `double` | `getOneSidedPValue()` Returns the probability of the statistic exceeding D under the null hypothesis of equality, against the one-sided alternative. |
| `double` | `getTestStatistic()` Returns \(D = \max(D^{+}, D^{-})\). |
| `double` | `getTwoSidedPValue()` Returns the probability of the statistic exceeding D under the null hypothesis of equality, against the two-sided alternative. |
| `double` | `getZ()` Returns the normalized D statistic without the continuity correction applied. |
public KolmogorovTwoSample(double[] x, double[] y)
Constructs a two-sample Kolmogorov-Smirnov goodness-of-fit test.
Parameters:
x - an array containing the observations from the first sample.
y - an array containing the observations from the second sample.

public double getTestStatistic()
Returns \(D = \max(D^{+}, D^{-})\).

public double getMaximumDifference()
Returns \(D^{+}\), the maximum difference between the two empirical CDFs.

public double getMinimumDifference()
Returns \(D^{-}\), the negative of the minimum difference between the two empirical CDFs.

public double getZ()
Returns the normalized D statistic, \(Z = D\sqrt{mn/(m+n)}\), without the continuity correction applied.

public double getOneSidedPValue()
Returns the probability of the statistic exceeding D under the null hypothesis of equality, against the one-sided alternative.

public double getTwoSidedPValue()
Returns the probability of the statistic exceeding D under the null hypothesis of equality, against the two-sided alternative. It is taken as twice the one-sided p-value \(p_1\) returned by getOneSidedPValue (or 1.0 if \(p_1 \ge 1/2\)). This approximation is nearly exact when \(p_1 \lt 0.1\).

public int getNumberMissingX()
Returns the number of missing values in the x sample.

public int getNumberMissingY()
Returns the number of missing values in the y sample.

Copyright © 2020 Rogue Wave Software. All rights reserved.