Class KolmogorovOneSample
- All Implemented Interfaces:
Serializable
KolmogorovOneSample performs a Kolmogorov-Smirnov
goodness-of-fit test in one sample.
The hypotheses tested follow: $$ \begin{array}{ll} H_0:~ F(x) = F^{*}(x) & H_1:~F(x) \ne F^{*}(x) \\ H_0:~ F(x) \ge F^{*}(x) & H_1:~F(x) \lt F^{*}(x) \\ H_0:~ F(x) \le F^{*}(x) & H_1:~F(x) \gt F^{*}(x) \end{array} $$ where \(F\) is the cumulative distribution function (CDF) of the random variable, and the theoretical CDF, \(F^{*}\), is specified via the user-supplied function cdf.
Let \(n\) be the number of observations minus the number of missing observations. The test statistics for the one-sided alternatives, \(D_n^{+}\) and \(D_n^{-}\), and for the two-sided alternative, \(D_n\), are computed, along with an asymptotic z-score and p-values associated with the one-sided and two-sided hypotheses.
For \(n \gt 80\), asymptotic p-values are used (see Gibbons 1971). For \(n \le 80\), exact one-sided p-values are computed according to a method given by Conover (1980, page 350). An approximate two-sided p-value is obtained as twice the one-sided p-value. The approximation is very close for one-sided p-values less than 0.10 and deteriorates as the one-sided p-value grows.
The theoretical CDF is assumed to be continuous. If the CDF is not continuous, the statistics \(D_n^{+}\), \(D_n^{-}\), and \(D_n\) will not be computed correctly.
Estimation of parameters in the theoretical CDF from the sample data will tend to make the p-values associated with the test statistics too liberal. The empirical CDF will tend to be closer to the theoretical CDF than it should be.
No attempt is made to check that all points in the sample are in the support of the theoretical CDF. If any sample point lies outside the support of the CDF, the null hypothesis must be rejected.
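As a concrete illustration of the statistics this class reports, the sketch below computes \(D^{+}\), \(D^{-}\), and \(D\) for a sample against a user-supplied theoretical CDF. It is plain Java, independent of this class, and it omits the tie and missing-value handling described above; the class name KsSketch and the uniform CDF used in main are illustrative only. Note that sign conventions for \(D^{+}\) and \(D^{-}\) vary between texts; this sketch uses the common definitions \(D^{+}=\max_i\,(i/n - F^{*}(x_{(i)}))\) and \(D^{-}=\max_i\,(F^{*}(x_{(i)}) - (i-1)/n)\) over the sorted sample.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

public class KsSketch {
    // Returns {dPlus, dMinus, d} for sample x against theoretical CDF cdf.
    static double[] ksStatistics(double[] x, DoubleUnaryOperator cdf) {
        double[] s = x.clone();
        Arrays.sort(s);                      // order statistics x_(1) <= ... <= x_(n)
        int n = s.length;
        double dPlus = 0.0, dMinus = 0.0;
        for (int i = 0; i < n; i++) {
            double f = cdf.applyAsDouble(s[i]);
            // Empirical CDF steps from i/n to (i+1)/n at s[i].
            dPlus = Math.max(dPlus, (i + 1.0) / n - f);   // empirical above theoretical
            dMinus = Math.max(dMinus, f - (double) i / n); // theoretical above empirical
        }
        return new double[] { dPlus, dMinus, Math.max(dPlus, dMinus) };
    }

    public static void main(String[] args) {
        // Small sample tested against the uniform(0, 1) CDF, F*(x) = x.
        double[] x = { 0.1, 0.4, 0.5, 0.9 };
        double[] d = ksStatistics(x, t -> t);
        // D+ = 0.25, D- = 0.15, D = 0.25 for this sample.
        System.out.printf("D+=%.4f D-=%.4f D=%.4f%n", d[0], d[1], d[2]);
    }
}
```

The final Math.max of the two one-sided statistics corresponds to what this class returns from getTestStatistic().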
Constructor Summary
Constructors
KolmogorovOneSample(CdfFunction cdf, double[] x)
Constructs a one sample Kolmogorov-Smirnov goodness-of-fit test.
-
Method Summary
double getMaximumDifference()
Returns \(D^{+}\), the maximum difference between the theoretical and empirical CDFs.
double getMinimumDifference()
Returns \(D^{-}\), the minimum difference between the theoretical and empirical CDFs.
int getNumberMissing()
Returns the number of missing values in the data.
int getNumberOfTies()
Returns the number of ties in the data.
double getOneSidedPValue()
Probability of the statistic exceeding D under the null hypothesis of equality and against the one-sided alternative.
double getTestStatistic()
Returns \(D = \max(D^{+}, D^{-})\).
double getTwoSidedPValue()
Probability of the statistic exceeding D under the null hypothesis of equality and against the two-sided alternative.
double getZ()
Returns the normalized D statistic without the continuity correction applied.
-
Constructor Details
-
KolmogorovOneSample
Constructs a one sample Kolmogorov-Smirnov goodness-of-fit test.
- Parameters:
cdf - the theoretical CDF function, \(F^{*}(x)\). It must be non-decreasing, and its values must lie in [0, 1].
x - a double array containing the observations.
-
-
Method Details
-
getNumberOfTies
public int getNumberOfTies()
Returns the number of ties in the data.
- Returns:
- the number of ties in the data
-
getTestStatistic
public double getTestStatistic()
Returns \(D = \max(D^{+}, D^{-})\).
- Returns:
- The value D.
-
getMaximumDifference
public double getMaximumDifference()
Returns \(D^{+}\), the maximum difference between the theoretical and empirical CDFs.
- Returns:
- The value \(D^{+}\).
-
getMinimumDifference
public double getMinimumDifference()
Returns \(D^{-}\), the minimum difference between the theoretical and empirical CDFs.
- Returns:
- The value \(D^{-}\).
-
getZ
public double getZ()
Returns the normalized D statistic without the continuity correction applied.
- Returns:
- the value Z
-
getOneSidedPValue
public double getOneSidedPValue()
Returns the probability of the statistic exceeding D under the null hypothesis of equality and against the one-sided alternative. An exact probability is computed if the number of observations is less than or equal to 80; otherwise an approximate probability is computed.
- Returns:
- the one-sided probability.
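For reference, an exact one-sided tail probability of this kind can be written in closed form as the Birnbaum-Tingey/Smirnov expression \(P(D_n^{+} \ge d) = d \sum_{j=0}^{\lfloor n(1-d) \rfloor} \binom{n}{j} (d + j/n)^{j-1} (1 - d - j/n)^{n-j}\). The sketch below implements that standard formula; whether it coincides exactly with the Conover (1980) method used by this class is an assumption, and the class name is illustrative only.

```java
public class KsExactOneSided {
    // Exact P(D+ >= d) for sample size n via the Birnbaum-Tingey closed form:
    //   p = d * sum_{j=0}^{floor(n(1-d))} C(n,j) (d + j/n)^(j-1) (1 - d - j/n)^(n-j)
    static double oneSidedPValueExact(double d, int n) {
        if (d <= 0.0) return 1.0;
        if (d >= 1.0) return 0.0;
        int jMax = (int) Math.floor(n * (1.0 - d));
        double sum = 0.0;
        double binom = 1.0; // C(n, j), updated incrementally each iteration
        for (int j = 0; j <= jMax; j++) {
            sum += binom * Math.pow(d + (double) j / n, j - 1)
                         * Math.pow(1.0 - d - (double) j / n, n - j);
            binom *= (double) (n - j) / (j + 1); // C(n, j) -> C(n, j+1)
        }
        return Math.min(1.0, d * sum);
    }

    public static void main(String[] args) {
        // For n = 2 the exact tail P(D+ >= 0.5) equals 0.25.
        System.out.printf("P(D+ >= 0.5), n=2: %.4f%n", oneSidedPValueExact(0.5, 2));
    }
}
```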
-
getTwoSidedPValue
public double getTwoSidedPValue()
Returns the probability of the statistic exceeding D under the null hypothesis of equality and against the two-sided alternative. This probability is twice the probability, \(p_1\), reported by getOneSidedPValue (or 1.0 if \(p_1 \ge 1/2\)). This approximation is nearly exact when \(p_1 \lt 0.1\).
- Returns:
- the two-sided probability.
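For large samples (the \(n \gt 80\) regime, for which the class description cites Gibbons 1971), a common asymptotic form of the two-sided tail is the Kolmogorov limiting series \(P(D_n \gt d) \approx 2\sum_{k \ge 1} (-1)^{k+1} e^{-2k^2 n d^2}\). The sketch below evaluates that standard series; whether this class uses exactly this series is an assumption, and the class name is illustrative only.

```java
public class KsAsymptotic {
    // Asymptotic two-sided p-value from the Kolmogorov limiting distribution:
    //   P(D > d) ~ 2 * sum_{k>=1} (-1)^(k+1) * exp(-2 k^2 n d^2)
    static double twoSidedPValueAsymptotic(double d, int n) {
        double t = n * d * d;
        double sum = 0.0;
        for (int k = 1; k <= 100; k++) {
            double term = Math.exp(-2.0 * k * k * t);
            sum += (k % 2 == 1) ? term : -term; // alternating series
            if (term < 1e-12) break;           // terms decay very rapidly
        }
        return Math.max(0.0, Math.min(1.0, 2.0 * sum));
    }

    public static void main(String[] args) {
        // n = 100, D = 0.05: a small deviation, so the p-value is large (about 0.96).
        System.out.printf("p = %.4f%n", twoSidedPValueAsymptotic(0.05, 100));
    }
}
```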
-
getNumberMissing
public int getNumberMissing()
Returns the number of missing values in the data.
- Returns:
- The number of missing values.
-