KSONE
Performs a Kolmogorov‑Smirnov one‑sample test for continuous distributions.
Required Arguments
CDF — User‑supplied FUNCTION to compute the cumulative distribution function (CDF) at a given value. The form is CDF(Y), where
Y – Value at which CDF is to be evaluated. (Input)
CDF – Value of CDF at Y. (Output)
CDF must be declared EXTERNAL in the calling program.
X — Vector of length NOBS containing the observations. (Input)
PDIF — Vector of length 6 containing the output statistics. (Output)
Optional Arguments
NOBS — Number of observations. (Input)
Default: NOBS = size (X,1).
NMISS — Number of missing (NaN, not a number) values. (Output)
FORTRAN 90 Interface
Generic: CALL KSONE (CDF, X, PDIF [, …])
Specific: The specific interface names are S_KSONE and D_KSONE.
FORTRAN 77 Interface
Single: CALL KSONE (CDF, NOBS, X, PDIF, NMISS)
Double: The double precision name is DKSONE.
Description
The routine KSONE performs a Kolmogorov-Smirnov goodness-of-fit test in one sample. The hypotheses tested follow:
H0: F(x) = F*(x) versus H1: F(x) ≠ F*(x)
H0: F(x) ≥ F*(x) versus H1: F(x) < F*(x)
H0: F(x) ≤ F*(x) versus H1: F(x) > F*(x)
where F is the cumulative distribution function (CDF) of the random variable, and the theoretical CDF, F*, is specified via the user-supplied FUNCTION CDF. Let n = NOBS - NMISS. The test statistics for both one-sided alternatives (Dn+ = PDIF(2) and Dn- = PDIF(3)) and for the two-sided alternative (Dn = PDIF(1)) are computed, as well as an asymptotic z-score (PDIF(4)) and p-values associated with the one-sided (PDIF(5)) and two-sided (PDIF(6)) hypotheses. For n > 80, asymptotic p-values are used (see Gibbons 1971). For n ≤ 80, exact one-sided p-values are computed according to a method given by Conover (1980, page 350). An approximate two-sided p-value is obtained as twice the one-sided p-value. The approximation is very close for one-sided p-values less than 0.10 and deteriorates rapidly as the one-sided p-value increases.
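The statistics described above can be illustrated with a minimal sketch that does not call the library. It assumes the textbook definitions (Dn+ as the largest amount by which the empirical CDF exceeds F*, Dn- as the largest amount by which F* exceeds the empirical CDF, Dn as the larger of the two, and z as the square root of n times Dn); the text does not state which sign convention KSONE applies to PDIF(2) and PDIF(3), so that mapping is an assumption. The program name KSDEMO, the five-point sample, and the choice of Uniform(0, 1) as F* are hypothetical.

```fortran
PROGRAM KSDEMO
  IMPLICIT NONE
  INTEGER N
  PARAMETER (N=5)
  INTEGER I, J
  REAL X(N), F, T, DPLUS, DMINUS, D, Z
  ! Hypothetical sample, tested against Uniform(0, 1)
  DATA X /0.62, 0.11, 0.35, 0.90, 0.48/
  ! Sort the sample into ascending order (insertion sort)
  DO I = 2, N
     T = X(I)
     J = I - 1
     DO WHILE (J .GE. 1)
        IF (X(J) .LE. T) EXIT
        X(J+1) = X(J)
        J = J - 1
     END DO
     X(J+1) = T
  END DO
  ! Compare the empirical CDF steps with the theoretical CDF
  DPLUS = 0.0
  DMINUS = 0.0
  DO I = 1, N
     F = X(I)   ! Uniform(0, 1) CDF evaluated at X(I)
     DPLUS = MAX(DPLUS, REAL(I)/REAL(N) - F)
     DMINUS = MAX(DMINUS, F - REAL(I-1)/REAL(N))
  END DO
  D = MAX(DPLUS, DMINUS)
  ! Assumed form of the asymptotic z-score
  Z = SQRT(REAL(N))*D
  WRITE (*,'(A,F8.4)') ' D  = ', D
  WRITE (*,'(A,F8.4)') ' D+ = ', DPLUS
  WRITE (*,'(A,F8.4)') ' D- = ', DMINUS
  WRITE (*,'(A,F8.4)') ' Z  = ', Z
END PROGRAM KSDEMO
```

For this sample the hand calculation gives D+ = 0.18 (at the observation 0.62) and D- = 0.15 (at the observation 0.35), so D = 0.18. The sketch makes no attempt to handle ties or missing values, which KSONE treats as described under Comments.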
Comments
1. Workspace may be explicitly provided, if desired, by use of K2ONE/DK2ONE. The reference is:
CALL K2ONE (CDF, NOBS, X, PDIF, NMISS, XWK)
The additional argument is:
XWK — Work vector of length 3 * (NOBS + 1) if NOBS ≤ 80, or of length NOBS if NOBS > 80.
2. Informational errors
Type | Code | Description |
4 | 2 | The value returned by CDF must be greater than or equal to 0.0 and less than or equal to 1.0 (by definition of a cumulative distribution function). |
4 | 3 | At least one tie is detected in X. Ties are not allowed in KSONE. |
4 | 4 | The value returned by CDF cannot decrease with increasing X (by the definition of a cumulative distribution function). |
4 | 6 | All the elements of X are missing (NaN, not a number) values. |
3. No check is made for the validity of the input data. Thus, although one or more of the X(I) may be inconsistent with the distribution in that an observation may be outside of the range of the distribution, KSONE will not detect the anomaly (unless the user causes it to be detected via the function CDF).
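Comment 3 notes that an observation outside the range of the distribution is detected only if the user's CDF function detects it. The following is one possible sketch of such a function for a Uniform[A, B] distribution. The bounds A and B, the warning text, and the driver program CHKSUP are hypothetical, and printing a warning is only one way to report the anomaly; nothing here is prescribed by KSONE.

```fortran
PROGRAM CHKSUP
  IMPLICIT NONE
  REAL CDF, P
  EXTERNAL CDF
  ! Evaluate once inside and once outside the support.
  ! The result is stored before printing so that the WRITE
  ! inside CDF is never executed during another WRITE.
  P = CDF(0.25)
  WRITE (*,'(A,F6.3)') ' CDF(0.25) = ', P
  P = CDF(1.50)
  WRITE (*,'(A,F6.3)') ' CDF(1.50) = ', P
END PROGRAM CHKSUP
!
! Uniform[A, B] CDF that reports out-of-support arguments
!
REAL FUNCTION CDF (X)
  IMPLICIT NONE
  REAL X
  REAL A, B
  PARAMETER (A=0.0, B=1.0)
  IF (X .LT. A .OR. X .GT. B) THEN
     WRITE (*,*) 'Warning: observation outside [A, B]: ', X
  END IF
  ! Clamp so the returned value always lies in [0, 1]
  CDF = MIN(1.0, MAX(0.0, (X - A)/(B - A)))
  RETURN
END
```

Because the function still returns a valid value in [0, 1], the test itself proceeds normally; the warning only alerts the user that the sample may not be consistent with the hypothesized distribution.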
Programming Notes
1. The theoretical CDF is assumed to be continuous. If the CDF is not continuous, the test statistics will not be computed correctly.
2. Estimation of parameters in the theoretical CDF from the sample data will tend to make the p‑values associated with the test statistics too liberal. The empirical CDF will tend to be closer to the theoretical CDF than it should be.
3. No attempt is made to check that all points in the sample are in the support of the theoretical CDF. If any sample point lies outside the support of the CDF, the null hypothesis must be rejected.
4. The user must supply an external FUNCTION that calculates the theoretical CDF for a given abscissa. The calling program must contain an EXTERNAL statement with the name of this routine. Often, IMSL functions in Chapter 17, “Probability Distribution Functions and Inverses” may be used. Examples of possible user‑supplied routines follow. Each FORTRAN function would be preceded by the statement
REAL FUNCTION CDF(X)
and ended by a RETURN and an END statement.
a. Normal (μ, σ²)
      Z = (X - μ)/σ
      CDF = ANORDF(Z)
b. Uniform[a, b]
      IF (X .LT. a) THEN
         CDF = 0.0
      ELSE IF (X .GT. b) THEN
         CDF = 1.0
      ELSE
         CDF = (X - a)/(b - a)
      END IF
c. Minimum of n Uniform(0, 1) random numbers
      CDF = 1.0 - (1.0 - X)**n
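The fragments above can be written out as complete functions. The following sketch does so under several assumptions: the function names CDFNOR, CDFUNI, and CDFMIN, the parameter values, and the driver program CDFDEM are hypothetical, and the normal CDF is evaluated with the Fortran 2008 intrinsic ERFC rather than ANORDF so that the sketch compiles without the library. In a program that calls KSONE, whichever function is used would be passed as the first argument and declared EXTERNAL.

```fortran
PROGRAM CDFDEM
  IMPLICIT NONE
  REAL CDFNOR, CDFUNI, CDFMIN, P
  EXTERNAL CDFNOR, CDFUNI, CDFMIN
  P = CDFNOR(0.5)
  WRITE (*,'(A,F8.4)') ' Normal  CDF at 0.5 = ', P
  P = CDFUNI(0.5)
  WRITE (*,'(A,F8.4)') ' Uniform CDF at 0.5 = ', P
  P = CDFMIN(0.5)
  WRITE (*,'(A,F8.4)') ' Minimum CDF at 0.5 = ', P
END PROGRAM CDFDEM
!
! a. Normal with mean AMEAN and standard deviation STD
!
REAL FUNCTION CDFNOR (X)
  IMPLICIT NONE
  REAL X, Z
  REAL AMEAN, STD
  PARAMETER (AMEAN=0.5, STD=0.2886751)
  Z = (X - AMEAN)/STD
  ! Standard normal CDF via the complementary error function
  CDFNOR = 0.5*ERFC(-Z/SQRT(2.0))
  RETURN
END
!
! b. Uniform[A, B]
!
REAL FUNCTION CDFUNI (X)
  IMPLICIT NONE
  REAL X
  REAL A, B
  PARAMETER (A=0.0, B=2.0)
  IF (X .LT. A) THEN
     CDFUNI = 0.0
  ELSE IF (X .GT. B) THEN
     CDFUNI = 1.0
  ELSE
     CDFUNI = (X - A)/(B - A)
  END IF
  RETURN
END
!
! c. Minimum of N Uniform(0, 1) random numbers, for X in [0, 1]
!
REAL FUNCTION CDFMIN (X)
  IMPLICIT NONE
  REAL X
  INTEGER N
  PARAMETER (N=3)
  CDFMIN = 1.0 - (1.0 - X)**N
  RETURN
END
```

With these hypothetical parameters the three values at X = 0.5 are 0.5 (the mean of the normal), 0.25, and 0.875 respectively, which provides a quick check that each function behaves as a CDF should before it is handed to KSONE.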
Example
In this example, a random sample of size 100 is generated via routine RNUN (see Chapter 18, “Random Number Generation”) for the uniform (0, 1) distribution. We want to test the null hypothesis that the CDF is that of a normal distribution with a mean of 0.5 and a variance equal to the uniform (0, 1) variance (1/12).
USE RNSET_INT
USE RNUN_INT
USE KSONE_INT
USE UMACH_INT
IMPLICIT NONE
INTEGER ISEED, NOBS
PARAMETER (ISEED=123457, NOBS=100)
!
INTEGER NMISS, NOUT
REAL CDF, PDIF(6), X(100)
EXTERNAL CDF
! Generate the sample
CALL RNSET (ISEED)
CALL RNUN (X)
!
CALL KSONE (CDF, X, PDIF, NMISS=NMISS)
!
CALL UMACH (2, NOUT)
WRITE (NOUT,99999) NMISS, PDIF
99999 FORMAT ('NMISS = ', I4/' D = ', F8.4/' D+ = ', F8.4/ &
' D- = ', F8.4/' Z = ', F8.4/' Prob greater D', &
' one-sided = ', F8.4/' Prob greater D two-sided = ', &
F8.4)
END
!
! The CDF
!
REAL FUNCTION CDF (X)
REAL X
!
REAL AMEAN, STD
PARAMETER (AMEAN=0.50, STD=0.2886751)
!
REAL ANORDF, Z
EXTERNAL ANORDF
! Standardize
Z = (X-AMEAN)/STD
! Get the probability
CDF = ANORDF(Z)
!
RETURN
END
Output
NMISS = 0
D = 0.1471
D+ = 0.0810
D- = 0.1471
Z = 1.4708
Prob greater D one-sided = 0.0132
Prob greater D two-sided = 0.0264
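The printed values can be cross-checked by hand. The sketch below assumes that the z-score is the square root of n times D and that the asymptotic one-sided tail probability has the standard limiting form exp(-2z²); the text says only that asymptotic p-values are used for n > 80, so the second formula is an assumption rather than a statement of what KSONE computes. The two-sided value is taken as twice the one-sided value, as described in the Description. The program name CHKOUT is hypothetical, and D is the rounded value from the output above.

```fortran
PROGRAM CHKOUT
  IMPLICIT NONE
  INTEGER N
  REAL D, Z, P1, P2
  PARAMETER (N=100, D=0.1471)
  ! Assumed form of the asymptotic z-score
  Z = SQRT(REAL(N))*D
  ! Assumed asymptotic one-sided tail probability
  P1 = EXP(-2.0*Z*Z)
  ! Two-sided value as twice the one-sided value
  P2 = 2.0*P1
  WRITE (*,'(A,F8.4)') ' Z         = ', Z
  WRITE (*,'(A,F8.4)') ' one-sided = ', P1
  WRITE (*,'(A,F8.4)') ' two-sided = ', P2
END PROGRAM CHKOUT
```

Because D is entered to only four decimals, the computed z differs slightly from the printed 1.4708, but the resulting probabilities should be close to the printed 0.0132 and 0.0264.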