Performs a Kolmogorov‑Smirnov one‑sample test for continuous distributions.
Required Arguments
CDF — User‑supplied FUNCTION to compute the cumulative distribution function (CDF) at a given value. The form is CDF(Y), where
Y – Value at which CDF is to be evaluated. (Input)
CDF – Value of CDF at Y. (Output)
CDF must be declared EXTERNAL in the calling program.
X — Vector of length NOBS containing the observations. (Input)
PDIF — Vector of length 6 containing the output statistics. (Output)
I   PDIF(I)
1   Dn = Maximum of (D+, D-), the statistics in PDIF(2) and PDIF(3)
2   D+ = Maximum difference between the theoretical and empirical CDFs
3   D- = Maximum difference between the empirical and theoretical CDFs
4   Asymptotic z-score for Dn (see Description)
5   Probability of the statistic exceeding Dn under the null hypothesis of equality and against the one-sided alternative. An exact probability is computed for NOBS ≤ 80, and an approximate probability is computed for NOBS > 80. See function AKS1DF (Chapter 17, “Probability Distribution Functions and Inverses”).
6   Probability of the statistic exceeding Dn under the null hypothesis of equality and against the two-sided alternative. This probability is twice the probability reported in PDIF(5) (or 1.0 if 2*PDIF(5) is greater than 1.0). This approximation is nearly exact when PDIF(5) is less than 0.10.
Optional Arguments
NOBS — Number of observations. (Input) Default: NOBS = size (X,1).
NMISS — Number of missing (NaN, not a number) values. (Output)
FORTRAN 90 Interface
Generic: CALL KSONE (CDF, X, PDIF [, …])
Specific: The specific interface names are S_KSONE and D_KSONE.
FORTRAN 77 Interface
Single: CALL KSONE (CDF, NOBS, X, PDIF, NMISS)
Double: The double precision name is DKSONE.
Description
The routine KSONE performs a Kolmogorov-Smirnov goodness-of-fit test in one sample. The null hypothesis is $H_0: F(x) = F^*(x)$ for all $x$, tested against the two-sided alternative $H_1: F(x) \ne F^*(x)$ for some $x$ and against each one-sided alternative, where F is the cumulative distribution function (CDF) of the random variable, and the theoretical CDF, F*, is specified via the user-supplied FUNCTION CDF. Let n = NOBS - NMISS, and let $S_n$ denote the empirical CDF of the n nonmissing observations. The test statistics for both one-sided alternatives,
$D_n^+ = \sup_x \{F^*(x) - S_n(x)\}$
and
$D_n^- = \sup_x \{S_n(x) - F^*(x)\}$,
and for the two-sided alternative, $D_n = \max(D_n^+, D_n^-)$ = PDIF(1), are computed, as well as an asymptotic z-score (PDIF(4)) and p-values associated with the one-sided (PDIF(5)) and two-sided (PDIF(6)) hypotheses. For n > 80, asymptotic p-values are used (see Gibbons 1971). For n ≤ 80, exact one-sided p-values are computed according to a method given by Conover (1980, page 350). An approximate two-sided p-value is obtained as twice the one-sided p-value. The approximation is very close for one-sided p-values less than 0.10 and deteriorates as the one-sided p-value grows.
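As a rough illustration of the statistics described above, the following sketch computes the two one-sided statistics, their maximum, and a z-score directly from a sorted sample. It is not the KSONE implementation. The module name KS_SKETCH, the subroutine KS_STATS, and the helper UNIF01 are hypothetical; the sample is assumed to be sorted in ascending order with no ties and no missing values; and the z-score is assumed to be SQRT(n)*Dn. No p-values are computed.

```fortran
MODULE KS_SKETCH
  IMPLICIT NONE
CONTAINS
  ! Hypothetical helper: one-sample KS statistics for a sample X that is
  ! already sorted ascending, with no ties and no NaN values.
  ! F is the hypothesized continuous CDF.
  SUBROUTINE KS_STATS (F, X, DPLUS, DMINUS, DN, Z)
    INTERFACE
      REAL FUNCTION F (Y)
        REAL, INTENT(IN) :: Y
      END FUNCTION F
    END INTERFACE
    REAL, INTENT(IN)  :: X(:)
    REAL, INTENT(OUT) :: DPLUS, DMINUS, DN, Z
    INTEGER :: I, N
    REAL    :: FI

    N = SIZE(X)
    DPLUS  = 0.0
    DMINUS = 0.0
    DO I = 1, N
      FI = F(X(I))
      ! The empirical CDF jumps from (I-1)/N to I/N at X(I).
      ! Theoretical minus empirical is largest just before the jump.
      DPLUS  = MAX(DPLUS,  FI - REAL(I-1)/REAL(N))
      ! Empirical minus theoretical is largest just after the jump.
      DMINUS = MAX(DMINUS, REAL(I)/REAL(N) - FI)
    END DO
    DN = MAX(DPLUS, DMINUS)
    ! Assumption: the asymptotic z-score is SQRT(n)*Dn.
    Z = SQRT(REAL(N))*DN
  END SUBROUTINE KS_STATS

  ! Hypothetical test distribution: the uniform (0, 1) CDF.
  REAL FUNCTION UNIF01 (Y)
    REAL, INTENT(IN) :: Y
    UNIF01 = MIN(1.0, MAX(0.0, Y))
  END FUNCTION UNIF01
END MODULE KS_SKETCH
```

For the sorted sample 0.1, 0.4, 0.7 tested against the uniform (0, 1) CDF, a call to KS_STATS (UNIF01, X, DPLUS, DMINUS, DN, Z) gives D+ = 0.1, D- = 0.3, and Dn = 0.3.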
Comments
1. Workspace may be explicitly provided, if desired, by use of K2ONE/DK2ONE. The reference is:
CALL K2ONE (CDF, NOBS, X, PDIF, NMISS, XWK)
The additional argument is:
XWK — Work vector of length 3 * (NOBS + 1) if NOBS≤ 80, or of length NOBS if NOBS > 80.
2. Informational errors
Type  Code  Description
4     2     The value returned by CDF must be greater than or equal to 0.0 and less than or equal to 1.0 (by definition of a probability distribution function).
4     3     At least one tie is detected in X. Ties are not allowed in KSONE.
4     4     The value returned by CDF cannot decrease as X increases (by the definition of a cumulative distribution function).
4     6     All the elements of X are missing (NaN, not a number) values.
3. No check is made for the validity of the input data. Thus, although one or more of the X(I) may be inconsistent with the distribution in that an observation may be outside of the range of the distribution, KSONE will not detect the anomaly (unless the user causes it to be detected via the function CDF).
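One way to have the function CDF flag such observations is sketched below for a uniform distribution on [A, B]. The module name CDF_CHECK, the limits A and B, the counter NOUTSIDE, and the function name CDF_CHECKED are all hypothetical, and the sketch assumes the function can be supplied wherever a user CDF is expected. How many times the CDF is evaluated per observation is not stated here, so only a nonzero count should be treated as meaningful.

```fortran
MODULE CDF_CHECK
  IMPLICIT NONE
  REAL    :: A = 0.0, B = 1.0   ! hypothetical support limits
  INTEGER :: NOUTSIDE = 0       ! evaluations outside [A, B]
CONTAINS
  ! Uniform[A, B] CDF that also records arguments outside the support.
  REAL FUNCTION CDF_CHECKED (Y)
    REAL, INTENT(IN) :: Y
    IF (Y < A) THEN
      NOUTSIDE = NOUTSIDE + 1
      CDF_CHECKED = 0.0
    ELSE IF (Y > B) THEN
      NOUTSIDE = NOUTSIDE + 1
      CDF_CHECKED = 1.0
    ELSE
      CDF_CHECKED = (Y - A)/(B - A)
    END IF
  END FUNCTION CDF_CHECKED
END MODULE CDF_CHECK
```

After the test has run, the calling program can inspect NOUTSIDE and treat any value greater than zero as evidence that at least one observation lies outside the assumed support.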
Programming Notes
1. The theoretical CDF is assumed to be continuous. If the CDF is not continuous, the test statistics will not be computed correctly.
2. Estimation of parameters in the theoretical CDF from the sample data will tend to make the p‑values associated with the test statistics too liberal. The empirical CDF will tend to be closer to the theoretical CDF than it should be.
3. No attempt is made to check that all points in the sample are in the support of the theoretical CDF. If any sample point lies outside the support of the CDF, the null hypothesis must be rejected.
4. The user must supply an external FUNCTION that calculates the theoretical CDF for a given abscissa. The calling program must contain an EXTERNAL statement with the name of this routine. Often, IMSL functions in Chapter 17, “Probability Distribution Functions and Inverses” may be used. Examples of possible user‑supplied routines follow. Each FORTRAN function would be preceded by the statement
REAL FUNCTION CDF(X)
and ended by a RETURN and an END statement.
a. Normal (μ, σ²)
      Z = (X - μ)/σ
      CDF = ANORDF(Z)
b. Uniform[a, b]
      IF (X .LT. a) THEN
         CDF = 0.0
      ELSE IF (X .GT. b) THEN
         CDF = 1.0
      ELSE
         CDF = (X - a)/(b - a)
      END IF
c. Minimum of n Uniform(0, 1) random numbers
      CDF = 1.0 - (1.0 - X)**n
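Putting the pieces together, a complete version of case c might look like the following sketch. The constant N = 5 is a hypothetical value chosen for illustration, and the explicit handling of arguments outside (0, 1) is an added assumption so that the function returns a valid CDF value for any argument.

```fortran
! Sketch of a user-supplied CDF for the minimum of N independent
! uniform (0, 1) random numbers. N = 5 is a hypothetical choice.
REAL FUNCTION CDF (X)
  IMPLICIT NONE
  REAL, INTENT(IN)   :: X
  INTEGER, PARAMETER :: N = 5

  IF (X <= 0.0) THEN
    CDF = 0.0                      ! below the support
  ELSE IF (X >= 1.0) THEN
    CDF = 1.0                      ! above the support
  ELSE
    CDF = 1.0 - (1.0 - X)**N
  END IF
  RETURN
END FUNCTION CDF
```

The calling program declares this function with an EXTERNAL statement, as required above.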
Example
In this example, a random sample of size 100 is generated from the uniform (0, 1) distribution via routine RNUN (see Chapter 18, “Random Number Generation”). We want to test the null hypothesis that the CDF is the normal distribution with a mean of 0.5 and a variance equal to the uniform (0, 1) variance (1/12).
USE RNSET_INT
USE RNUN_INT
USE KSONE_INT
USE UMACH_INT
IMPLICIT NONE
INTEGER ISEED, NOBS
PARAMETER (ISEED=123457, NOBS=100)
!
INTEGER NMISS, NOUT
REAL CDF, PDIF(6), X(100)
EXTERNAL CDF
! Generate the sample
CALL RNSET (ISEED)
CALL RNUN (X)
!
CALL KSONE (CDF, X, PDIF, NMISS=NMISS)
!
CALL UMACH (2, NOUT)
WRITE (NOUT,99999) NMISS, PDIF
99999 FORMAT ('NMISS = ', I4/' D = ', F8.4/' D+ = ', F8.4/ &