KSONE

Performs a Kolmogorov‑Smirnov one‑sample test for continuous distributions.

Required Arguments

CDF — User‑supplied FUNCTION to compute the cumulative distribution function (CDF) at a given value. The form is CDF(Y), where

Y – Value at which CDF is to be evaluated. (Input)
CDF – Value of CDF at Y. (Output)

CDF must be declared EXTERNAL in the calling program.

X — Vector of length NOBS containing the observations. (Input)

PDIF — Vector of length 6 containing the output statistics. (Output)

 

I    PDIF(I)

1    Dn = Maximum of (Dn+, Dn−)

2    Dn+ = Maximum difference between the theoretical and empirical CDFs

3    Dn− = Maximum difference between the empirical and theoretical CDFs

4    Z, the asymptotic z-score

5    Probability of the statistic exceeding Dn under the null hypothesis of equality and against the one-sided alternative. An exact probability is computed for NOBS ≤ 80, and an approximate probability is computed for NOBS > 80. See function AKS1DF (Chapter 17, “Probability Distribution Functions and Inverses”).

6    Probability of the statistic exceeding Dn under the null hypothesis of equality and against the two-sided alternative. This probability is twice the probability reported in PDIF(5), or 1.0 if 2 * PDIF(5) is greater than 1.0. This approximation is nearly exact when PDIF(5) is less than 0.10.

Optional Arguments

NOBS — Number of observations. (Input)
Default: NOBS = size (X,1).

NMISS — Number of missing (NaN, not a number) values. (Output)

FORTRAN 90 Interface

Generic: CALL KSONE (CDF, X, PDIF [,…])

Specific: The specific interface names are S_KSONE and D_KSONE.

FORTRAN 77 Interface

Single: CALL KSONE (CDF, NOBS, X, PDIF, NMISS)

Double: The double precision name is DKSONE.

Description

The routine KSONE performs a Kolmogorov-Smirnov goodness-of-fit test in one sample. The hypotheses tested follow:

H0: F(x) = F*(x)    H1: F(x) ≠ F*(x)
H0: F(x) ≥ F*(x)    H1: F(x) < F*(x)
H0: F(x) ≤ F*(x)    H1: F(x) > F*(x)

where F is the cumulative distribution function (CDF) of the random variable, and the theoretical CDF, F*, is specified via the user-supplied FUNCTION CDF. Let n = NOBS − NMISS. The test statistics for both one-sided alternatives

Dn+ = PDIF(2)

and

Dn− = PDIF(3)

and the two-sided (Dn = PDIF(1)) alternative are computed, as well as an asymptotic z-score (PDIF(4)) and p-values associated with the one-sided (PDIF(5)) and two-sided (PDIF(6)) hypotheses. For n > 80, asymptotic p-values are used (see Gibbons 1971). For n ≤ 80, exact one-sided p-values are computed according to a method given by Conover (1980, page 350). An approximate two-sided test p-value is obtained as twice the one-sided p-value. The approximation is very close for one-sided p-values less than 0.10 and degrades rapidly as the one-sided p-value gets larger.
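The statistics described above can be sketched in a few lines of Fortran. This is a minimal illustration, not the KSONE implementation: the module ks_sketch and the names ks_stats and uniform01_cdf are hypothetical, the sample is assumed to contain no ties and no NaN values, and the sign convention for Dn+ and Dn− follows the wording of the PDIF table. The z-score formula Z = sqrt(n) * Dn is an assumption; it agrees with the example output on this page (D = 0.1471, n = 100, Z = 1.4708). The p-values are not computed here.

```fortran
module ks_sketch
  implicit none
  private
  public :: ks_stats, uniform01_cdf

contains

  ! Hypothetical helper (not an IMSL routine): one-sample KS statistics
  ! for a sample x that contains no ties and no NaN values.
  subroutine ks_stats(x, cdf, dplus, dminus, d, z)
    real, intent(in) :: x(:)
    interface
      real function cdf(y)
        real, intent(in) :: y
      end function cdf
    end interface
    real, intent(out) :: dplus, dminus, d, z
    real :: xs(size(x)), f, tmp
    integer :: i, j, n

    n = size(x)
    xs = x
    ! Insertion sort into ascending order (adequate for a small sketch)
    do i = 2, n
      tmp = xs(i)
      j = i - 1
      do while (j >= 1)
        if (xs(j) <= tmp) exit
        xs(j+1) = xs(j)
        j = j - 1
      end do
      xs(j+1) = tmp
    end do

    dplus = 0.0
    dminus = 0.0
    do i = 1, n
      f = cdf(xs(i))
      ! theoretical minus empirical, using the empirical CDF just below xs(i)
      dplus = max(dplus, f - real(i - 1)/real(n))
      ! empirical minus theoretical, using the empirical CDF at xs(i)
      dminus = max(dminus, real(i)/real(n) - f)
    end do
    d = max(dplus, dminus)
    z = sqrt(real(n))*d        ! assumed form of the asymptotic z-score
  end subroutine ks_stats

  ! Uniform(0, 1) CDF used as the theoretical distribution in the demo
  real function uniform01_cdf(y)
    real, intent(in) :: y
    uniform01_cdf = min(1.0, max(0.0, y))
  end function uniform01_cdf

end module ks_sketch

program ks_demo
  use ks_sketch
  implicit none
  real :: x(4) = [0.7, 0.1, 0.9, 0.4]
  real :: dplus, dminus, d, z

  call ks_stats(x, uniform01_cdf, dplus, dminus, d, z)
  print '(a, f8.4)', ' D  = ', d
  print '(a, f8.4)', ' D+ = ', dplus
  print '(a, f8.4)', ' D- = ', dminus
  print '(a, f8.4)', ' Z  = ', z
end program ks_demo
```

For this four-point sample tested against the uniform (0, 1) CDF, the sketch gives D+ = 0.2, D- = 0.15, D = 0.2, and Z = 0.4, up to single-precision rounding.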

Comments

1. Workspace may be explicitly provided, if desired, by use of K2ONE/DK2ONE. The reference is:

CALL K2ONE (CDF, NOBS, X, PDIF, NMISS, XWK)

The additional argument is:

XWK — Work vector of length 3 * (NOBS + 1) if NOBS ≤ 80, or of length NOBS if NOBS > 80.

2. Informational errors

 

Type  Code  Description

4     2     The value returned by CDF must be greater than or equal to 0.0 and less than or equal to 1.0 (by definition of a probability distribution function).

4     3     At least one tie is detected in X. Ties are not allowed in KSONE.

4     4     The value returned by CDF cannot decrease with increasing X (by the definition of a cumulative distribution function).

4     6     All the elements of X are missing (NaN, not a number) values.

3. No check is made for the validity of the input data. Thus, although one or more of the X(I) may be inconsistent with the distribution in that an observation may be outside of the range of the distribution, KSONE will not detect the anomaly (unless the user causes it to be detected via the function CDF).
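Because KSONE stops on ties and does not check that observations lie in the range of the distribution, a caller may want to screen the sample first. The following is a minimal sketch under stated assumptions: the module ksone_precheck and the names check_sample and k2one_work_length are hypothetical helpers, not IMSL routines, and the support is assumed to be a closed interval [lo, hi]. The work-length function simply restates the XWK rule from comment 1.

```fortran
module ksone_precheck
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
  private
  public :: check_sample, k2one_work_length

contains

  ! Hypothetical helper: count missing (NaN) values, tied observations,
  ! and observations outside the assumed support [lo, hi].
  subroutine check_sample(x, lo, hi, nmiss, nties, noutside)
    real, intent(in) :: x(:), lo, hi
    integer, intent(out) :: nmiss, nties, noutside
    integer :: i, j

    nmiss = 0
    nties = 0
    noutside = 0
    do i = 1, size(x)
      if (ieee_is_nan(x(i))) then
        nmiss = nmiss + 1
        cycle
      end if
      if (x(i) < lo .or. x(i) > hi) noutside = noutside + 1
      ! Count x(i) as a tie if it equals any earlier observation
      ! (a NaN never compares equal, so missing values are ignored)
      do j = 1, i - 1
        if (x(j) == x(i)) then
          nties = nties + 1
          exit
        end if
      end do
    end do
  end subroutine check_sample

  ! Length of the XWK work vector as stated in comment 1
  integer function k2one_work_length(nobs)
    integer, intent(in) :: nobs
    if (nobs <= 80) then
      k2one_work_length = 3*(nobs + 1)
    else
      k2one_work_length = nobs
    end if
  end function k2one_work_length

end module ksone_precheck

program precheck_demo
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  use ksone_precheck
  implicit none
  real :: x(5)
  integer :: nmiss, nties, noutside

  x = [0.2, 0.5, 0.2, 0.0, 1.3]
  x(4) = ieee_value(x(4), ieee_quiet_nan)   ! mark one value as missing

  call check_sample(x, 0.0, 1.0, nmiss, nties, noutside)
  print '(3(a, i0))', ' missing = ', nmiss, '  ties = ', nties, &
        '  outside support = ', noutside
  print '(a, i0)', ' XWK length for NOBS = 100: ', k2one_work_length(100)
end program precheck_demo
```

In the demo, the sample has one missing value, one tied observation (the second 0.2), and one value (1.3) outside [0, 1]. A pairwise comparison is used for clarity; for large samples, sorting first and comparing neighbors would be cheaper.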

Programming Notes

1. The theoretical CDF is assumed to be continuous. If the CDF is not continuous, the test statistics will not be computed correctly.

2. Estimation of parameters in the theoretical CDF from the sample data will tend to make the p‑values associated with the test statistics too liberal. The empirical CDF will tend to be closer to the theoretical CDF than it should be.

3. No attempt is made to check that all points in the sample are in the support of the theoretical CDF. If any sample point lies outside the support of the CDF, the null hypothesis must be rejected.

4. The user must supply an external FUNCTION that calculates the theoretical CDF for a given abscissa. The calling program must contain an EXTERNAL statement with the name of this routine. Often, IMSL functions in Chapter 17, “Probability Distribution Functions and Inverses,” may be used. Examples of possible user-supplied routines follow. Each FORTRAN function would begin with the statement

REAL FUNCTION CDF(X)

and end with a RETURN and an END statement.

a. Normal (μ, σ²)

Z = (X - μ)/σ
CDF = ANORDF(Z)

b. Uniform[a, b]

IF (X .LT. a) THEN
   CDF = 0.0
ELSE IF (X .GT. b) THEN
   CDF = 1.0
ELSE
   CDF = (X - a)/(b - a)
END IF

c. Minimum of n Uniform(0, 1) random numbers

CDF = 1.0 - (1.0 - X)**n
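As an illustration of how one of the fragments above becomes a complete function, the following sketch writes out case c. It is an assumption-laden example, not IMSL code: the value NMIN = 3 is arbitrary, the parameter is fixed inside the function because CDF takes a single argument, and the guards for X outside (0, 1) are added so that the returned value always lies in [0, 1]. The small driver only evaluates the function; it does not call KSONE.

```fortran
! Hypothetical user-supplied CDF for the minimum of NMIN
! uniform (0, 1) random numbers (case c above).
real function cdf(x)
  implicit none
  real, intent(in) :: x
  integer, parameter :: nmin = 3   ! assumed number of uniforms

  if (x <= 0.0) then
    cdf = 0.0                      ! below the support
  else if (x >= 1.0) then
    cdf = 1.0                      ! above the support
  else
    cdf = 1.0 - (1.0 - x)**nmin
  end if
end function cdf

program cdf_demo
  implicit none
  real, external :: cdf

  ! 1 - (1 - 0.5)**3 = 0.875
  print '(a, f8.4)', ' CDF(0.5) = ', cdf(0.5)
end program cdf_demo
```

A function written this way can be passed to KSONE as its first argument, provided the calling program declares it EXTERNAL as described in note 4.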

Example

In this example, a random sample of size 100 is generated via routine RNUN (see Chapter 18, “Random Number Generation”) for the uniform (0, 1) distribution. We want to test the null hypothesis that the CDF is that of a normal distribution with a mean of 0.5 and a variance equal to the uniform (0, 1) variance (1/12).

 

      USE RNSET_INT
      USE RNUN_INT
      USE KSONE_INT
      USE UMACH_INT

      IMPLICIT   NONE
      INTEGER    ISEED, NOBS
      PARAMETER  (ISEED=123457, NOBS=100)
!
      INTEGER    NMISS, NOUT
      REAL       CDF, PDIF(6), X(NOBS)
      EXTERNAL   CDF
!                                 Generate the sample
      CALL RNSET (ISEED)
      CALL RNUN (X)
!                                 Perform the test
      CALL KSONE (CDF, X, PDIF, NMISS=NMISS)
!                                 Print the results
      CALL UMACH (2, NOUT)
      WRITE (NOUT,99999) NMISS, PDIF
99999 FORMAT ('NMISS = ', I4/' D = ', F8.4/' D+ = ', F8.4/ &
            ' D- = ', F8.4/' Z = ', F8.4/' Prob greater D', &
            ' one-sided = ', F8.4/' Prob greater D two-sided = ', &
            F8.4)
      END
!
!                                 The CDF
!
      REAL FUNCTION CDF (X)
      REAL       X
!
      REAL       AMEAN, STD
      PARAMETER  (AMEAN=0.50, STD=0.2886751)
!
      REAL       ANORDF, Z
      EXTERNAL   ANORDF
!                                 Standardize
      Z = (X-AMEAN)/STD
!                                 Get the probability
      CDF = ANORDF(Z)
!
      RETURN
      END

Output

 

NMISS = 0
D = 0.1471
D+ = 0.0810
D- = 0.1471
Z = 1.4708
Prob greater D one-sided = 0.0132
Prob greater D two-sided = 0.0264