kolmogorovOne

Performs a Kolmogorov-Smirnov one-sample test for continuous distributions.

Synopsis

kolmogorovOne (cdf, x)

Required Arguments

float cdf (x) (Input)
User-supplied function that computes the cumulative distribution function (CDF) at a given value. The form is cdf(x), where x (Input) is the value at which the CDF is to be evaluated, and the return value (Output) is the value of the CDF at x.
float x[] (Input)
Array of size nObservations containing the observations.

Return Value

An array of length 3 containing Z, \(p_1\), and \(p_2\): the asymptotic z-score, the one-sided p-value, and the two-sided p-value, respectively.

Optional Arguments

differences (Output)
An array containing \(D_n\), \(D_n^+\), and \(D_n^-\).
nMissing (Output)
The number of missing values in x.

Description

The routine kolmogorovOne performs a Kolmogorov-Smirnov goodness-of-fit test in one sample. The hypotheses tested follow:

\[\begin{split}\begin{array}{l} \bullet H_0:F(x) = F^*(x) \phantom{.....} H_1:F(x) \ne F^*(x) \\ \bullet H_0:F(x) \geq F^*(x) \phantom{.....} H_1:F(x) < F^*(x) \\ \bullet H_0:F(x) \leq F^*(x) \phantom{.....} H_1:F(x) > F^*(x) \\ \end{array}\end{split}\]

where F is the cumulative distribution function (CDF) of the random variable, and the theoretical cdf, F*, is specified via the user-supplied function cdf. Let n = nObservations - nMissing. The test statistics for both one-sided alternatives

\[D_n^+ = \mathtt{differences}[1]\]

and

\[D_n^- = \mathtt{differences}[2]\]

and the two-sided (\(D_n\) = differences[0]) alternative are computed, as well as an asymptotic z-score (testStatistics[0]) and the p-values associated with the one-sided (testStatistics[1]) and two-sided (testStatistics[2]) hypotheses. Here testStatistics denotes the returned array. For \(n>80\), asymptotic p-values are used (see Gibbons 1971). For \(n\leq 80\), exact one-sided p-values are computed according to a method given by Conover (1980, page 350). An approximate two-sided p-value is obtained as twice the one-sided p-value. The approximation is very close for one-sided p-values less than 0.10 and degrades as the one-sided p-value grows.
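As a rough illustration of the quantities described above, the following NumPy sketch computes the three differences from the standard textbook definitions and then applies a large-sample tail approximation. The name ks_one_sample_sketch, the labeling of which one-sided statistic is \(D_n^+\) and which is \(D_n^-\), the form \(Z = \sqrt{n} D_n\), the tail formula \(\exp(-2 n D_n^2)\), and the treatment of NaN as a missing value are all assumptions made for illustration. They are not taken from the library's implementation, and the exact small-sample method of Conover is not reproduced.

```python
import math
import numpy as np


def ks_one_sample_sketch(cdf, x):
    """Illustrative only: textbook one-sample K-S statistics (not the library code)."""
    x = np.asarray(x, dtype=float)
    x = np.sort(x[~np.isnan(x)])        # assumption: NaN marks a missing value
    n = x.size
    f = np.array([cdf(v) for v in x])   # theoretical CDF at each order statistic
    i = np.arange(1, n + 1)
    d_plus = float(np.max(i / n - f))          # empirical CDF above theoretical CDF
    d_minus = float(np.max(f - (i - 1) / n))   # theoretical CDF above empirical CDF
    d = max(d_plus, d_minus)                   # two-sided statistic
    z = math.sqrt(n) * d                       # assumed form of the z-score
    p_one = math.exp(-2.0 * z * z)             # assumed asymptotic one-sided tail
    p_two = min(1.0, 2.0 * p_one)              # twice one-sided; the cap at 1 is our choice
    return (d, d_plus, d_minus), (z, p_one, p_two)


# Usage: three points tested against the uniform (0, 1) CDF.
def uniform_cdf(v):
    return min(max(v, 0.0), 1.0)


diffs, stats = ks_one_sample_sketch(uniform_cdf, [0.1, 0.4, 0.7])
print("D, D+, D- =", diffs)
print("Z, p1, p2 =", stats)
```

Under these assumptions, \(n = 100\) and \(D_n = 0.1471\) give a one-sided value of roughly 0.0132. The sketch is meant only to make the definitions concrete, not to replace kolmogorovOne.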

Programming Notes

  1. The theoretical CDF is assumed to be continuous. If the CDF is not continuous, the statistics \(D_n^*\) will not be computed correctly.

  2. Estimating parameters of the theoretical CDF from the sample data distorts the p-values associated with the test statistics. The empirical CDF tends to be closer to the fitted theoretical CDF than it would be to the true one, so the reported p-values tend to be too large.

  3. No attempt is made to check that all points in the sample are in the support of the theoretical CDF. If any sample point lies outside the support of the CDF, the null hypothesis must be rejected.
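Because the routine does not perform the support check mentioned in note 3, a caller may want to do it before running the test. The following is a minimal sketch, assuming the support is an interval whose bounds the caller already knows. The helper outside_support and its lower and upper parameters are hypothetical and are not part of the library.

```python
import numpy as np


def outside_support(x, lower=-np.inf, upper=np.inf):
    """Return the observations that fall outside [lower, upper] (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Comparisons with NaN are False, so NaN entries are never flagged here.
    return x[(x < lower) | (x > upper)]


# Usage: a sample checked against the support of the uniform (0, 1) distribution.
sample = [0.2, 0.9, 1.3, -0.1]
bad = outside_support(sample, lower=0.0, upper=1.0)
if bad.size > 0:
    print("Observations outside the support:", bad)
```

If any observation is flagged, the null hypothesis can be rejected without computing the statistics at all.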

Example

In this example, a random sample of size 100 is generated via routine randomUniform (Chapter 12) for the uniform (0, 1) distribution. We want to test the null hypothesis that the CDF is that of a normal distribution with a mean of 0.5 and a variance equal to the uniform (0, 1) variance (1/12).

from __future__ import print_function
from numpy import *
from pyimsl.stat.kolmogorovOne import kolmogorovOne
from pyimsl.stat.normalCdf import normalCdf
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.randomUniform import randomUniform


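def cdf(x):
    # Theoretical CDF under the null hypothesis: normal with mean 0.5
    # and standard deviation sqrt(1/12), evaluated at x.
    mean = .5
    std = .2886751
    z = (x - mean) / std
    return normalCdf(z)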


nobs = 100
randomSeedSet(123457)
x = randomUniform(nobs)
# Empty lists passed as optional output arguments; the routine fills them in.
nMissing = []
differences = []

statistics = kolmogorovOne(cdf, x,
                           nMissing=nMissing, differences=differences)

print("D      = %8.4f" % (differences[0]))
print("D+     = %8.4f" % (differences[1]))
print("D-     = %8.4f" % (differences[2]))
print("Z      = %8.4f" % (statistics[0]))
print("Prob greater D one sided  = %8.4f" % (statistics[1]))
print("Prob greater D two sided  = %8.4f" % (statistics[2]))
print("N missing = %d" % (nMissing[0]))

Output

D      =   0.1471
D+     =   0.0810
D-     =   0.1471
Z      =   1.4708
Prob greater D one sided  =   0.0132
Prob greater D two sided  =   0.0264
N missing = 0

Warning Errors

IMSLS_TIE_DETECTED # ties were detected in the sample.

Fatal Errors

IMSLS_STOP_USER_FCN Request from user-supplied function to stop algorithm. User flag = “#”.