Class KolmogorovTwoSample
- All Implemented Interfaces:
Serializable
Class KolmogorovTwoSample computes Kolmogorov-Smirnov two-sample
test statistics for testing that two continuous cumulative distribution functions (CDF's)
are identical based upon two random samples. One- or two-sided alternatives are
allowed. Exact p-values are computed for the two-sided test when
\(nm \le 10^4\),
where n is the number of non-missing X observations and
m is the number of non-missing Y observations.
Let \(F_n(x)\) denote the empirical CDF in the X sample,
let \(G_m(y)\) denote the empirical CDF in the Y sample
and let the corresponding population
distribution functions be denoted by
\(F(x)\) and \(G(y)\), respectively.
Then, the hypotheses tested by KolmogorovTwoSample are as follows:
$$
\begin{array}{ll}
H_0:~ F(x) = G(x) & H_1:~F(x) \ne G(x) \\
H_0:~ F(x) \ge G(x) & H_1:~F(x) \lt G(x) \\
H_0:~ F(x) \le G(x) & H_1:~F(x) \gt G(x)
\end{array}
$$
The test statistics are given as follows:
$$
\begin{array}{rl}
D_{mn} & = \max(D_{mn}^{+}, D_{mn}^{-}) \\
D_{mn}^{+} & = \max_x(F_n(x)-G_m(x)) \\
D_{mn}^{-} & = \max_x(G_m(x)-F_n(x))
\end{array}
$$
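The three statistics can be computed by walking the two sorted samples together and comparing the empirical CDFs at each distinct observed value. The following is a minimal sketch of that idea, not the implementation used by KolmogorovTwoSample; the class name `KsStatisticSketch` and the method `statistics` are hypothetical, and the sketch assumes both samples are non-empty and contain no missing values.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the documented API.
final class KsStatisticSketch {

    /** Returns {D+, D-, D} for two non-empty samples without NaN values. */
    static double[] statistics(double[] x, double[] y) {
        if (x.length == 0 || y.length == 0) {
            throw new IllegalArgumentException("samples must be non-empty");
        }
        for (double v : x) requireNotNaN(v);
        for (double v : y) requireNotNaN(v);

        double[] xs = x.clone();
        double[] ys = y.clone();
        Arrays.sort(xs);
        Arrays.sort(ys);

        int n = xs.length;
        int m = ys.length;
        int i = 0;   // number of x observations <= current value
        int j = 0;   // number of y observations <= current value
        double dPlus = 0.0;
        double dMinus = 0.0;

        while (i < n || j < m) {
            // Next distinct value over the pooled sample.
            double t = (j >= m || (i < n && xs[i] <= ys[j])) ? xs[i] : ys[j];
            while (i < n && xs[i] == t) i++;
            while (j < m && ys[j] == t) j++;

            // F_n(t) - G_m(t) at the current value.
            double diff = (double) i / n - (double) j / m;
            dPlus = Math.max(dPlus, diff);
            dMinus = Math.max(dMinus, -diff);
        }
        return new double[] {dPlus, dMinus, Math.max(dPlus, dMinus)};
    }

    private static void requireNotNaN(double v) {
        if (Double.isNaN(v)) {
            throw new IllegalArgumentException("NaN values are not handled in this sketch");
        }
    }

    public static void main(String[] args) {
        double[] s = statistics(new double[] {1, 2, 3, 4}, new double[] {3, 4, 5, 6});
        System.out.println("D+ = " + s[0] + ", D- = " + s[1] + ", D = " + s[2]);
    }
}
```

Tied values are consumed from both samples before the CDFs are compared, so the difference is only evaluated where both step functions have finished jumping.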
Asymptotically, the distribution of the statistic
$$
Z = D_{mn} \sqrt{\frac{mn}{m+n}}
$$
converges to a distribution given by Smirnov (1939).
Exact probabilities for the two-sided test are computed when \(nm \le 10^4\), according to an algorithm given by Kim and Jennrich (1973). When \(nm \gt 10^4\), the very good approximations given by Kim and Jennrich are used to obtain the two-sided p-values. The one-sided probability is taken as one half the two-sided probability. This halving is a very good approximation when the p-value is small (say, less than 0.10) but is less accurate for large p-values.
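For orientation, the normalized statistic Z and the classical limiting tail probability \(Q(z) = 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2z^2}\) can be sketched as follows. This is only the textbook asymptotic series; it is not the Kim and Jennrich exact algorithm or their approximation, so its p-values will generally differ from those returned by this class, especially for small samples. The class name `SmirnovAsymptoticSketch` and both method names are hypothetical.

```java
// Hypothetical helper, not part of the documented API.
final class SmirnovAsymptoticSketch {

    /** Z = D * sqrt(mn / (m + n)) for sample sizes n and m. */
    static double z(double d, int n, int m) {
        return d * Math.sqrt((double) n * m / (n + m));
    }

    /**
     * Limiting two-sided tail probability
     * Q(z) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 z^2).
     */
    static double asymptoticTwoSidedPValue(double z) {
        // For small z the series converges slowly and Q(z) is 1 to
        // within about 1e-12, so return 1 directly.
        if (z < 0.2) {
            return 1.0;
        }
        double sum = 0.0;
        double sign = 1.0;
        for (int k = 1; k <= 100; k++) {
            double term = Math.exp(-2.0 * k * k * z * z);
            sum += sign * term;
            if (term < 1e-16) {
                break;
            }
            sign = -sign;
        }
        // Clamp to [0, 1] to guard against rounding.
        return Math.min(1.0, Math.max(0.0, 2.0 * sum));
    }

    public static void main(String[] args) {
        double zValue = z(0.5, 4, 4);
        System.out.println("Z = " + zValue);
        System.out.println("asymptotic two-sided p = " + asymptoticTwoSidedPValue(zValue));
    }
}
```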
-
Constructor Summary
Constructors
- KolmogorovTwoSample(double[] x, double[] y): Constructs a two-sample Kolmogorov-Smirnov goodness-of-fit test.
-
Method Summary
Methods
- double getMaximumDifference(): Returns \(D^{+}\), the maximum of \(F_n(x) - G_m(x)\), the difference between the empirical CDFs of the x and y samples.
- double getMinimumDifference(): Returns \(D^{-}\), the maximum of \(G_m(x) - F_n(x)\), the difference between the empirical CDFs of the y and x samples.
- int getNumberMissingX(): Returns the number of missing values in the x sample.
- int getNumberMissingY(): Returns the number of missing values in the y sample.
- double getOneSidedPValue(): Probability of the statistic exceeding D under the null hypothesis of equality and against the one-sided alternative.
- double getTestStatistic(): Returns \(D = \max(D^{+}, D^{-})\).
- double getTwoSidedPValue(): Probability of the statistic exceeding D under the null hypothesis of equality and against the two-sided alternative.
- double getZ(): Returns the normalized D statistic without the continuity correction applied.
-
Constructor Details
-
KolmogorovTwoSample
public KolmogorovTwoSample(double[] x, double[] y)
Constructs a two-sample Kolmogorov-Smirnov goodness-of-fit test.
- Parameters:
x - an array containing the observations from the first sample.
y - an array containing the observations from the second sample.
-
-
Method Details
-
getTestStatistic
public double getTestStatistic()
Returns \(D = \max(D^{+}, D^{-})\).
- Returns:
- The value D.
-
getMaximumDifference
public double getMaximumDifference()
Returns \(D^{+}\), the maximum of \(F_n(x) - G_m(x)\), the difference between the empirical CDFs of the x and y samples.
- Returns:
- The value \(D^{+}\).
-
getMinimumDifference
public double getMinimumDifference()
Returns \(D^{-}\), the maximum of \(G_m(x) - F_n(x)\), the difference between the empirical CDFs of the y and x samples.
- Returns:
- The value \(D^{-}\).
-
getZ
public double getZ()
Returns the normalized D statistic without the continuity correction applied.
- Returns:
- the value Z
-
getOneSidedPValue
public double getOneSidedPValue()
Probability of the statistic exceeding D under the null hypothesis of equality and against the one-sided alternative. An exact probability is computed if the number of observations is less than or equal to 80; otherwise an approximate probability is computed.
- Returns:
- the one-sided probability.
-
getTwoSidedPValue
public double getTwoSidedPValue()
Probability of the statistic exceeding D under the null hypothesis of equality and against the two-sided alternative. This probability is twice the probability, \(p_1\), reported by getOneSidedPValue (or 1.0 if \(p_1 \ge 1/2\)). This approximation is nearly exact when \(p_1 \lt 0.1\).
- Returns:
- the two-sided probability.
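The doubling rule described for getTwoSidedPValue can be written out in a few lines. This is a sketch of the stated relationship only, not the library's code; the class name `PValueDoublingSketch` and the method `twoSidedFromOneSided` are hypothetical.

```java
// Hypothetical helper, not part of the documented API.
final class PValueDoublingSketch {

    /** Returns 2 * p1, capped at 1.0 when p1 >= 1/2. */
    static double twoSidedFromOneSided(double p1) {
        if (Double.isNaN(p1) || p1 < 0.0 || p1 > 1.0) {
            throw new IllegalArgumentException("p1 must lie in [0, 1]");
        }
        return p1 >= 0.5 ? 1.0 : 2.0 * p1;
    }

    public static void main(String[] args) {
        System.out.println(twoSidedFromOneSided(0.03));
        System.out.println(twoSidedFromOneSided(0.7));
    }
}
```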
-
getNumberMissingX
public int getNumberMissingX()
Returns the number of missing values in the x sample.
- Returns:
- The number of missing values in
x.
-
getNumberMissingY
public int getNumberMissingY()
Returns the number of missing values in the y sample.
- Returns:
- The number of missing values in
y.
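This page does not say how a missing value is represented. Assuming, purely for illustration, that missing observations are encoded as `Double.NaN`, counting them and obtaining the non-missing sizes n and m might look like the sketch below. The class name `MissingCountSketch` and its methods are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper; the NaN encoding of missing values is an assumption.
final class MissingCountSketch {

    /** Counts entries that are NaN (assumed to mark missing observations). */
    static int countMissing(double[] sample) {
        int count = 0;
        for (double v : sample) {
            if (Double.isNaN(v)) {
                count++;
            }
        }
        return count;
    }

    /** Returns a copy of the sample with NaN entries removed. */
    static double[] dropMissing(double[] sample) {
        return Arrays.stream(sample).filter(v -> !Double.isNaN(v)).toArray();
    }

    public static void main(String[] args) {
        double[] x = {1.5, Double.NaN, 2.0, Double.NaN, 3.25};
        System.out.println("missing = " + countMissing(x));
        System.out.println("non-missing n = " + dropMissing(x).length);
    }
}
```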
-