KSTWO

Performs a Kolmogorov‑Smirnov two‑sample test.

Required Arguments

X — Vector of length NOBSX containing the observations in sample one. (Input)

Y — Vector of length NOBSY containing the observations in sample two. (Input)

PDIF — Vector of length 6 containing the output statistics. (Output)

I   PDIF(I)
1   $D_{mn}$ = Maximum of the absolute values of $D^+_{mn}$ and $D^-_{mn}$.
2   $D^+_{mn}$ = Maximum value of the empirical cumulative distribution function (CDF) of X minus the empirical CDF of Y.
3   $D^-_{mn}$ = Maximum value of the empirical CDF of Y minus the empirical CDF of X. (The maximum of the negative differences of the X CDF minus the Y CDF, taken in absolute value.)
4   Z = Standardized value of $D_{mn}$. A two-sample approximation with no correction for continuity is used.
5   One-sided probability of a larger $D_{mn}$ under the null hypothesis of equal distributions.
6   Two-sided probability of exceeding $D_{mn}$ under the null hypothesis of equal distributions.

Optional Arguments

NOBSX — Number of observations in sample one. (Input)
Default: NOBSX = size (X,1).

NOBSY — Number of observations in sample two. (Input)
Default: NOBSY = size (Y,1).

NMISSX — Number of missing observations in the X sample. (Output)

NMISSY — Number of missing observations in the Y sample. (Output)

FORTRAN 90 Interface

Generic: CALL KSTWO (X, Y, PDIF [, …])

Specific: The specific interface names are S_KSTWO and D_KSTWO.

FORTRAN 77 Interface

Single: CALL KSTWO (NOBSX, X, NOBSY, Y, PDIF, NMISSX, NMISSY)

Double: The double precision name is DKSTWO.

Description

Routine KSTWO computes Kolmogorov-Smirnov two-sample test statistics for testing that two continuous cumulative distribution functions (CDFs) are identical, based on two random samples. One- or two-sided alternatives are allowed. Exact p-values are computed for the two-sided test when NOBSX * NOBSY is less than $10^4$.

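Let $F_n(x)$ denote the empirical CDF in the X sample and let $G_m(y)$ denote the empirical CDF in the Y sample, where n = NOBSX − NMISSX and m = NOBSY − NMISSY, and let the corresponding population distribution functions be denoted by $F(x)$ and $G(y)$, respectively. Then, the hypotheses tested by KSTWO are as follows:

$H_0\colon F(x) = G(x)$ versus $H_1\colon F(x) \ne G(x)$
$H_0\colon F(x) \ge G(x)$ versus $H_1\colon F(x) < G(x)$
$H_0\colon F(x) \le G(x)$ versus $H_1\colon F(x) > G(x)$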

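The test statistics are given as follows:

$D^+_{mn} = \max_x \{F_n(x) - G_m(x)\}$
$D^-_{mn} = \max_x \{G_m(x) - F_n(x)\}$
$D_{mn} = \max(D^+_{mn}, D^-_{mn})$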

Asymptotically, the distribution of the statistic

$Z = D_{mn}\sqrt{\frac{mn}{m+n}}$

(returned in PDIF(4)) converges to a distribution given by Smirnov (1939).
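The statistics above can be computed directly from the two sorted samples. The module below is a minimal sketch, not the IMSL implementation: the names ks2_sketch, ks2_stats and insertion_sort are hypothetical, both samples are assumed to be non-empty with no missing values, and Z is assumed to be the standardization D*sqrt(nm/(n+m)), which reproduces the Z = 1.1023 shown in the example output for n = 100, m = 60. Ties are handled by stepping past every copy of a value in both samples before the two empirical CDFs are compared.

```fortran
module ks2_sketch
  ! Hypothetical helper module, not part of IMSL. It computes the
  ! two-sample Kolmogorov-Smirnov statistics D+, D-, D and Z for
  ! non-empty samples that contain no missing values.
  implicit none
  private
  public :: ks2_stats
contains

  subroutine ks2_stats(x, y, dplus, dminus, d, z)
    real, intent(in)  :: x(:), y(:)
    real, intent(out) :: dplus, dminus, d, z
    real, allocatable :: xs(:), ys(:)
    integer :: n, m, i, j
    real :: t, diff

    n = size(x)
    m = size(y)
    xs = x                      ! sorted copies; the inputs are untouched
    ys = y
    call insertion_sort(xs)
    call insertion_sort(ys)

    dplus  = 0.0
    dminus = 0.0
    i = 0                       ! count of x values <= t
    j = 0                       ! count of y values <= t
    do while (i < n .or. j < m)
      ! Next distinct value t in the pooled sample
      if (j >= m) then
        t = xs(i + 1)
      else if (i >= n) then
        t = ys(j + 1)
      else
        t = min(xs(i + 1), ys(j + 1))
      end if
      ! Step past every value equal to t in both samples (handles ties)
      do while (i < n)
        if (xs(i + 1) > t) exit
        i = i + 1
      end do
      do while (j < m)
        if (ys(j + 1) > t) exit
        j = j + 1
      end do
      ! Compare the two empirical CDFs at t: Fn(t) - Gm(t)
      diff = real(i) / real(n) - real(j) / real(m)
      dplus  = max(dplus, diff)
      dminus = max(dminus, -diff)
    end do

    d = max(dplus, dminus)
    z = d * sqrt(real(n) * real(m) / real(n + m))
  end subroutine ks2_stats

  subroutine insertion_sort(a)
    ! Simple O(N**2) ascending sort; adequate for a sketch
    real, intent(inout) :: a(:)
    integer :: k, l
    real :: key
    do k = 2, size(a)
      key = a(k)
      l = k - 1
      do while (l >= 1)
        if (a(l) <= key) exit
        a(l + 1) = a(l)
        l = l - 1
      end do
      a(l + 1) = key
    end do
  end subroutine insertion_sort

end module ks2_sketch
```

A program would USE ks2_sketch and call ks2_stats (X, Y, DPLUS, DMINUS, D, Z). Unlike KSTWO, this sketch neither detects missing values nor computes p-values.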

Exact probabilities for the two-sided test are computed when nm is less than or equal to $10^4$, according to an algorithm given by Kim and Jennrich (1973), and computed here via function AKS2DF (see Chapter 17, “Probability Distribution Functions and Inverses”). When nm is greater than $10^4$, the two-sided p-values are obtained from the approximations given by Kim and Jennrich, which are very accurate. The one-sided probability is taken as one half of the two-sided probability. This approximation is very good when the p-value is small (say, less than 0.10) but poor for large p-values.
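The halving rule for the one-sided probability can be illustrated with a short sketch. The Kim and Jennrich algorithm is not reproduced here; instead, this hypothetical module (ks2_asymptotic, with functions smirnov_two_sided and smirnov_one_sided) substitutes the limiting Smirnov series Q(z) = 2 Σ_{k≥1} (−1)^(k−1) exp(−2k²z²) for the two-sided p-value. It is a stand-in for illustration, not the method KSTWO uses.

```fortran
module ks2_asymptotic
  ! Hypothetical helper module, not part of IMSL. The two-sided
  ! p-value comes from the limiting Smirnov series (a stand-in for
  ! the Kim-Jennrich method); the one-sided value is half of it.
  implicit none
  private
  public :: smirnov_two_sided, smirnov_one_sided
  integer, parameter :: dp = kind(1.0d0)
contains

  real function smirnov_two_sided(z) result(q)
    ! Q(z) = 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 * k**2 * z**2)
    real, intent(in) :: z
    real(dp) :: s, term
    integer :: k
    ! Below z = 0.2 the limit equals 1 to single precision, and the
    ! alternating series converges too slowly to sum term by term.
    if (z < 0.2) then
      q = 1.0
      return
    end if
    s = 0.0_dp
    do k = 1, 1000
      term = exp(-2.0_dp * real(k, dp)**2 * real(z, dp)**2)
      if (mod(k, 2) == 1) then
        s = s + term
      else
        s = s - term
      end if
      if (term < 1.0e-12_dp) exit
    end do
    ! Clamp rounding error into [0, 1]
    q = real(min(max(2.0_dp * s, 0.0_dp), 1.0_dp))
  end function smirnov_two_sided

  real function smirnov_one_sided(z) result(p)
    ! One-sided probability taken as half the two-sided probability
    real, intent(in) :: z
    p = 0.5 * smirnov_two_sided(z)
  end function smirnov_one_sided

end module ks2_asymptotic
```

For the Z = 1.1023 of the example, this limit gives roughly 0.176, noticeably larger than the two-sided 0.1440 that KSTWO reports there, so the sketch should not be expected to reproduce KSTWO's p-values.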

Comments

Workspace may be explicitly provided, if desired, by use of K2TWO/DK2TWO. The reference is:

CALL K2TWO (NOBSX, X, NOBSY, Y, PDIF, NMISSX, NMISSY, XWK, YWK)

The additional arguments are as follows:

XWK — Work vector of length NOBSX + 1.

YWK — Work vector of length NOBSY + 1.

Example

The following example illustrates the KSTWO routine with two randomly generated samples from a uniform(0,1) distribution. Since the two theoretical distributions are identical, we would not expect to reject the null hypothesis.

      USE RNSET_INT
      USE RNUN_INT
      USE KSTWO_INT
      USE UMACH_INT

      IMPLICIT NONE
      INTEGER    ISEED, NOBSX, NOBSY, NMISSX, NMISSY, NOUT
      PARAMETER  (ISEED=123457, NOBSX=100, NOBSY=60)
      REAL       X(NOBSX), Y(NOBSY), PDIF(6)
!                                 Generate the two uniform samples
      CALL RNSET (ISEED)
      CALL RNUN (X)
      CALL RNUN (Y)
!                                 Perform the two-sample test
      CALL KSTWO (X, Y, PDIF, NMISSX=NMISSX, NMISSY=NMISSY)
!                                 Print the statistics and missing counts
      CALL UMACH (2, NOUT)
      WRITE (NOUT, 5) PDIF, NMISSX, NMISSY
    5 FORMAT (' D = ', F8.4 / ' D+ = ', F8.4 / ' D- = ', F8.4 / &
              ' Z = ', F8.4 / ' Prob greater D one sided = ', F8.4 / &
              ' Prob greater D two sided = ', F8.4 / &
              ' Missing X = ', I3 / ' Missing Y = ', I3)
      END

Output

 D = 0.1800
 D+ = 0.1800
 D- = 0.0100
 Z = 1.1023
 Prob greater D one sided = 0.0720
 Prob greater D two sided = 0.1440
 Missing X = 0
 Missing Y = 0
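As a hand check of PDIF(4) and PDIF(5) in this output (n = 100 and m = 60, since no observations are missing), assuming the standardization $Z = D_{mn}\sqrt{mn/(m+n)}$:

```latex
\begin{aligned}
Z &= D_{mn}\sqrt{\frac{mn}{m+n}}
   = 0.1800\sqrt{\frac{100 \cdot 60}{160}}
   = 0.1800\sqrt{37.5}
   \approx 0.1800 \times 6.1237
   \approx 1.1023,\\
P_{\text{one-sided}} &= \tfrac{1}{2}\,P_{\text{two-sided}}
   = \tfrac{1}{2}(0.1440) = 0.0720.
\end{aligned}
```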