X — Vector of length NOBSX containing the observations in sample one. (Input)
Y — Vector of length NOBSY containing the observations in sample two. (Input)
PDIF — Vector of length 6 containing the output statistics. (Output)
I    PDIF(I)
1    Dmn = Maximum of the absolute values of D+mn and D−mn (i.e., of PDIF(2) and PDIF(3)).
2    D+mn = Maximum difference between the empirical cumulative distribution function (CDF) of X minus the empirical CDF of Y.
3    D−mn = Maximum difference between the empirical CDF of Y minus the empirical CDF of X (the maximum of the negative differences).
4    Z = Standardized value of Dmn. A two-sample approximation with no correction for continuity is used.
5    One-sided probability of a larger Dmn under the null hypothesis of equal distributions.
6    Two-sided probability of exceeding Dmn under the null hypothesis of equal distributions.
Optional Arguments
NOBSX — Number of observations in sample one. (Input) Default: NOBSX = size (X,1).
NOBSY — Number of observations in sample two. (Input) Default: NOBSY = size (Y,1).
NMISSX — Number of missing observations in the X sample. (Output)
NMISSY — Number of missing observations in the Y sample. (Output)
FORTRAN 90 Interface
Generic: CALL KSTWO (X, Y, PDIF [, …])
Specific: The specific interface names are S_KSTWO and D_KSTWO.
FORTRAN 77 Interface
Single: CALL KSTWO (NOBSX, X, NOBSY, Y, PDIF, NMISSX, NMISSY)
Double: The double precision name is DKSTWO.
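For instance, with the generic Fortran 90 interface the optional arguments are passed by keyword. The short program below is a minimal sketch, not one of the product examples; the sample sizes, the assumption that only the leading NOBSX and NOBSY elements of the arrays enter the test, and the use of the standard intrinsic RANDOM_NUMBER and the final PRINT are all illustrative.

USE KSTWO_INT
IMPLICIT NONE
REAL    :: X(100), Y(60), PDIF(6)
INTEGER :: NMISSX, NMISSY
CALL RANDOM_NUMBER (X)
CALL RANDOM_NUMBER (Y)
! Sketch only: request that just the first 50 and 30 observations be used
! by passing the optional sample sizes explicitly (the values are illustrative).
CALL KSTWO (X, Y, PDIF, NOBSX=50, NOBSY=30, NMISSX=NMISSX, NMISSY=NMISSY)
PRINT *, 'D = ', PDIF(1), '   two-sided p = ', PDIF(6)
END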
Description
Routine KSTWO computes Kolmogorov-Smirnov two-sample test statistics for testing that two continuous cumulative distribution functions (CDFs) are identical, based upon two random samples. One- or two-sided alternatives are allowed. Exact p-values are computed for the two-sided test when NOBSX*NOBSY is less than 10⁴.
Let Fn(x) denote the empirical CDF in the X sample, let Gm(y) denote the empirical CDF in the Y sample, where n = NOBSX − NMISSX and m = NOBSY − NMISSY, and let the corresponding population distribution functions be denoted by F(x) and G(y), respectively. Then, the hypotheses tested by KSTWO are as follows:

H0: F(x) = G(x) for all x    versus    H1: F(x) ≠ G(x) for some x
H0: F(x) ≤ G(x) for all x    versus    H1: F(x) > G(x) for some x
H0: F(x) ≥ G(x) for all x    versus    H1: F(x) < G(x) for some x

The test statistics are given as follows:

Dmn  = max(D+mn, D−mn)                  (returned in PDIF(1))
D+mn = max over x of [Fn(x) − Gm(x)]    (returned in PDIF(2))
D−mn = max over x of [Gm(x) − Fn(x)]    (returned in PDIF(3))

Asymptotically, the distribution of the statistic

Z = Dmn * sqrt(mn / (m + n))

(returned in PDIF(4)) converges to a distribution given by Smirnov (1939).
Exact probabilities for the two-sided test are computed when nm is less than or equal to 10⁴, according to an algorithm given by Kim and Jennrich (1973) and computed here via function AKS2DF (see Chapter 17, “Probability Distribution Functions and Inverses”). When nm is greater than 10⁴, the very good approximations given by Kim and Jennrich are used to obtain the two-sided p-values. The one-sided probability is taken as one-half the two-sided probability. This is a very good approximation when the p-value is small (say, less than 0.10) and not very good for large p-values.
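To make these definitions concrete, here is a minimal sketch, not the IMSL implementation, that computes D+mn, D−mn, Dmn, and Z directly from two complete samples by walking the pooled ordered data. The name KS2_SKETCH and its internal insertion sort are illustrative; ties and missing values are not handled, and KSTWO itself should be used in practice.

SUBROUTINE KS2_SKETCH (X, N, Y, M, DPLUS, DMINUS, D, Z)
! Sketch only: two-sample Kolmogorov-Smirnov statistics from two
! complete samples (no missing values, ties assumed absent).
IMPLICIT NONE
INTEGER, INTENT(IN)  :: N, M
REAL,    INTENT(IN)  :: X(N), Y(M)
REAL,    INTENT(OUT) :: DPLUS, DMINUS, D, Z
REAL    :: XS(N), YS(M), FN, GM
INTEGER :: I, J
XS = X
YS = Y
CALL SORT_ASC (XS, N)
CALL SORT_ASC (YS, M)
I = 0
J = 0
DPLUS  = 0.0
DMINUS = 0.0
! Walk the pooled ordered sample, tracking the empirical CDFs
! Fn and Gm and the extreme differences between them.
DO WHILE (I < N .AND. J < M)
   IF (XS(I+1) <= YS(J+1)) THEN
      I = I + 1
   ELSE
      J = J + 1
   END IF
   FN = REAL(I)/REAL(N)
   GM = REAL(J)/REAL(M)
   DPLUS  = MAX(DPLUS, FN-GM)
   DMINUS = MAX(DMINUS, GM-FN)
END DO
D = MAX(DPLUS, DMINUS)
! Standardized statistic corresponding to PDIF(4)
Z = D*SQRT(REAL(N)*REAL(M)/REAL(N+M))
CONTAINS
   SUBROUTINE SORT_ASC (A, NA)
   ! Simple ascending insertion sort (illustrative helper only).
   INTEGER, INTENT(IN)    :: NA
   REAL,    INTENT(INOUT) :: A(NA)
   REAL    :: T
   INTEGER :: K, L
   DO K=2, NA
      T = A(K)
      L = K - 1
      DO WHILE (L >= 1)
         IF (A(L) <= T) EXIT
         A(L+1) = A(L)
         L = L - 1
      END DO
      A(L+1) = T
   END DO
   END SUBROUTINE SORT_ASC
END SUBROUTINE KS2_SKETCH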
Comments
Workspace may be explicitly provided, if desired, by use of K2TWO/DK2TWO. The reference is:
CALL K2TWO (NOBSX, X, NOBSY, Y, PDIF, NMISSX, NMISSY, XWK, YWK)
The additional arguments are as follows:
XWK — Work vector of length NOBSX + 1.
YWK — Work vector of length NOBSY + 1.
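As a hedged illustration of the workspace form, the short program below declares the two work vectors at the documented lengths and makes the call; the sample sizes, the use of the standard intrinsic RANDOM_NUMBER to fill the arrays, and the final PRINT are illustrative only.

IMPLICIT NONE
INTEGER, PARAMETER :: NOBSX=100, NOBSY=60
REAL    :: X(NOBSX), Y(NOBSY), PDIF(6)
REAL    :: XWK(NOBSX+1), YWK(NOBSY+1)
INTEGER :: NMISSX, NMISSY
CALL RANDOM_NUMBER (X)
CALL RANDOM_NUMBER (Y)
! K2TWO is referenced as an external routine; it is the
! single-precision name (DK2TWO is the double precision name).
CALL K2TWO (NOBSX, X, NOBSY, Y, PDIF, NMISSX, NMISSY, XWK, YWK)
PRINT *, 'D = ', PDIF(1), '   two-sided p = ', PDIF(6)
END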
Example
The following example illustrates the KSTWO routine with two randomly generated samples from a uniform(0,1) distribution. Since the two theoretical distributions are identical, we would not expect to reject the null hypothesis.
USE RNSET_INT
USE RNUN_INT
USE KSTWO_INT
USE UMACH_INT
IMPLICIT NONE
INTEGER ISEED, NOBSX, NOBSY, NMISSX, NMISSY, NOUT
PARAMETER (ISEED=123457, NOBSX=100, NOBSY=60)
REAL X(NOBSX), Y(NOBSY), PDIF(6)
! Generate the samples
CALL RNSET(ISEED)
CALL RNUN (X)
CALL RNUN (Y)
! Compute the Kolmogorov-Smirnov statistics
CALL KSTWO (X, Y, PDIF, NMISSX=NMISSX, NMISSY=NMISSY)
! Print the results (labels and formats are illustrative)
CALL UMACH (2, NOUT)
WRITE (NOUT,*) 'D  = ', PDIF(1), '   D+ = ', PDIF(2), '   D- = ', PDIF(3)
WRITE (NOUT,*) 'Z  = ', PDIF(4)
WRITE (NOUT,*) 'One-sided probability  = ', PDIF(5)
WRITE (NOUT,*) 'Two-sided probability  = ', PDIF(6)
WRITE (NOUT,*) 'Missing values in X, Y = ', NMISSX, NMISSY
END