RNKSM
Performs the Wilcoxon rank sum test.
Required Arguments
X — Vector of length NOBSX containing the first sample. (Input)
Y — Vector of length NOBSY containing the second sample. (Input)
FUZZ — Constant used to determine ties in X and Y. (Input)
If |z_i ‑ z_j| ≤ FUZZ, then z_i and z_j are said to be tied, where z_i is the i‑th element of X or Y. FUZZ must be nonnegative.
STAT — Vector of length 10 containing the output statistics. (Output)
I | STAT(I)
1 | Wilcoxon W statistic (the sum of the ranks of the X observations) adjusted for ties in such a manner that W is as small as possible.
2 | 2 * E(W) ‑ W, where E(W) is the expected value of W.
3 | Probability of obtaining a statistic less than or equal to the minimum of (W, 2 * E(W) ‑ W).
4 | W statistic adjusted for ties in such a manner that W is as large as possible.
5 | STAT(2), but adjusted for ties as in 4.
6 | STAT(3), but adjusted for ties as in 4.
7 | W statistic with average ranks used in place of tied ranks.
8 | Estimated standard error of STAT(7) under the null hypothesis of no difference.
9 | Standard normal score associated with STAT(7).
10 | Two‑sided p‑value associated with STAT(9).
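As a worked illustration of STAT(2), the expected value of the rank sum under the null hypothesis can be written out explicitly. The symbols n_X and n_Y (the numbers of nonmissing observations in X and Y) are notation introduced here for illustration and do not appear elsewhere in this document. The numbers are taken from the Example section.

```latex
E(W) = \frac{n_X\,(n_X + n_Y + 1)}{2}
% Example data: n_X = n_Y = 5 and STAT(1) = 34
E(W) = \frac{5\,(5 + 5 + 1)}{2} = 27.5,
\qquad
\mathrm{STAT}(2) = 2\,E(W) - W = 55 - 34 = 21
```

This agrees with the value 21.0 printed for 2*WBAR - W in the Example output.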
Optional Arguments
NOBSX — Number of observations in X. (Input)
Default: NOBSX = size (X,1).
NOBSY — Number of observations in Y. (Input)
Default: NOBSY = size (Y,1).
NMISSX — Number of missing (NaN, not a number) observations in X. (Output)
NMISSY — Number of missing (NaN, not a number) observations in Y. (Output)
FORTRAN 90 Interface
Generic: CALL RNKSM (X, Y, FUZZ, STAT [, …])
Specific: The specific interface names are S_RNKSM and D_RNKSM.
FORTRAN 77 Interface
Single: CALL RNKSM (NOBSX, X, NOBSY, Y, FUZZ, STAT, NMISSX, NMISSY)
Double: The double precision name is DRNKSM.
Description
Routine RNKSM performs the Wilcoxon rank sum test for identical population distribution functions. The Wilcoxon test is a linear transformation of the Mann‑Whitney U test. If the difference between the two populations can be attributed solely to a difference in location, then the Wilcoxon test becomes a test of equality of the population means (or medians) and is the nonparametric equivalent of the two‑sample t‑test.
Routine RNKSM obtains ranks in the combined sample after first eliminating missing values from the data. The rank sum statistic is then computed as the sum of the ranks in the X sample. Three methods for handling ties are used. (A tie is counted when two observations are within FUZZ of each other.) The first method uses the largest possible rank for tied observations in the smaller sample, while the second method uses the smallest possible rank for these observations. Thus, the range of possible rank sums is obtained. The third method for handling tied observations between samples uses the average rank of the tied observations.
Asymptotic standard normal scores are computed for the W score (based upon a variance that has been adjusted for ties) when average ranks are used (see Conover 1980, page 217), and the probability associated with the two‑sided alternative is computed.
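The computations described above can be sketched in a few lines of standalone Fortran. This is an illustration under stated assumptions, not the algorithm RNKSM uses internally: ranks are found by direct counting, ties are tested pairwise against FUZZ, missing values are not handled, and the tie-adjusted variance is assumed to take the form given by Conover (1980). The program name and all variable names are hypothetical. The data are those of the Example section.

```fortran
! Sketch only: recomputes the rank sums by direct counting.
! This is not the algorithm used inside RNKSM.
program rank_sum_sketch
  implicit none
  integer, parameter :: nx = 5, ny = 5, n = nx + ny
  real, parameter :: fuzz = 0.001
  real :: z(n), r, wlow, whigh, wavg, sumr2, var, se, score, pval
  integer :: i, nless, tx, ty

  ! Combined sample: X first, then Y (data from the Example section).
  z(1:nx)   = [7.3, 6.9, 7.2, 7.8, 7.2]
  z(nx+1:n) = [7.4, 6.8, 6.9, 6.7, 7.1]

  wlow = 0.0
  whigh = 0.0
  wavg = 0.0
  sumr2 = 0.0
  do i = 1, n
    nless = count(z < z(i) - fuzz)             ! strictly smaller, not tied
    tx = count(abs(z(1:nx) - z(i)) <= fuzz)    ! X values tied with z(i)
    ty = count(abs(z(nx+1:n) - z(i)) <= fuzz)  ! Y values tied with z(i)
    r = real(nless) + 0.5*real(tx + ty + 1)    ! average rank of z(i)
    sumr2 = sumr2 + r*r
    if (i <= nx) then
      wavg = wavg + r
      ! X takes the lowest ranks of the tie group (smallest W)
      wlow = wlow + real(nless) + 0.5*real(tx + 1)
      ! X takes the highest ranks of the tie group (largest W)
      whigh = whigh + real(nless + ty) + 0.5*real(tx + 1)
    end if
  end do

  ! Tie-adjusted variance of W (assumed form, after Conover 1980).
  var = real(nx*ny)/real(n*(n - 1))*sumr2 &
        - real(nx*ny)*real((n + 1)**2)/(4.0*real(n - 1))
  se = sqrt(var)
  score = (wavg - 0.5*real(nx*(n + 1)))/se
  pval = erfc(abs(score)/sqrt(2.0))            ! two-sided normal p-value

  print '(a, f6.1)', ' Smallest W (cf. STAT(1)) ...', wlow
  print '(a, f6.1)', ' Largest W (cf. STAT(4)) ....', whigh
  print '(a, f6.1)', ' Average-rank W (STAT(7)) ...', wavg
  print '(a, f7.3)', ' Standard error (STAT(8)) ...', se
  print '(a, f7.3)', ' Normal score (STAT(9)) .....', score
  print '(a, f7.3)', ' Two-sided p (STAT(10)) .....', pval
end program rank_sum_sketch
```

Worked by hand, these quantities come to 34.0, 35.0, 34.5, 4.758, 1.471, and 0.141 for the Example data, which should agree with the corresponding entries of the Example output. The ERFC intrinsic requires a Fortran 2008 compiler.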
Hypothesis Tests
In each of the following tests, the first line gives the hypothesis (and its alternative) under assumptions 1 to 3 below, while the second line gives the hypothesis when assumption 4 also holds. The rejection region is the same for both hypotheses and is given in terms of method 3 for handling ties (average ranks). Use another output statistic (STAT(1) or STAT(4)) if a different method for handling ties is desired.
- H0 : Pr(X < Y) = 0.5    H1 : Pr(X < Y) ≠ 0.5
  H0 : E(X) = E(Y)    H1 : E(X) ≠ E(Y)
  Reject if STAT(10) is less than the significance level of the test. Alternatively, reject H0 if STAT(7) is too large or too small.
- H0 : Pr(X < Y) ≤ 0.5    H1 : Pr(X < Y) > 0.5
  H0 : E(X) ≥ E(Y)    H1 : E(X) < E(Y)
  Reject if STAT(7) is too small.
- H0 : Pr(X < Y) ≥ 0.5    H1 : Pr(X < Y) < 0.5
  H0 : E(X) ≤ E(Y)    H1 : E(X) > E(Y)
  Reject if STAT(7) is too large.
Assumptions
1. X and Y are random samples from their respective populations.
2. All observations are mutually independent.
3. The measurement scale is at least ordinal (i.e., an ordering less than, greater than, or equal to exists among the observations).
4. If F(X) and G(Y) are the distribution functions of X and Y, respectively, then G(Y) = F(X + c) for some constant c (i.e., the distribution of Y is at worst a translation of the distribution of X).
Tables of critical values of the W statistic are given in the references for small samples.
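The rejection rules above can be sketched as a short standalone program. It is an illustration only: it assumes the normal approximation is acceptable (for small samples the tabled critical values should be preferred), it derives the one-sided p-values from STAT(9) rather than reading them from RNKSM, and the significance level ALPHA and all other names are hypothetical. The values of STAT(9) and STAT(10) are copied from the Example output.

```fortran
! Sketch only: applies the three rejection rules to values of STAT(9) and
! STAT(10) copied from the Example output. The one-sided p-values rely on
! the normal approximation and are not returned by RNKSM.
program rank_sum_decision
  implicit none
  real, parameter :: alpha = 0.05   ! hypothetical significance level
  real :: stat(10), plower, pupper

  stat = 0.0
  stat(9) = 1.471                   ! standard normal score of STAT(7)
  stat(10) = 0.141                  ! two-sided p-value

  ! Lower tail: STAT(7) too small, i.e. H1 : E(X) < E(Y)
  plower = 0.5*erfc(-stat(9)/sqrt(2.0))
  ! Upper tail: STAT(7) too large, i.e. H1 : E(X) > E(Y)
  pupper = 0.5*erfc(stat(9)/sqrt(2.0))

  print '(a, f6.3, a, l2)', ' Two-sided  p =', stat(10), '  reject:', stat(10) < alpha
  print '(a, f6.3, a, l2)', ' Lower tail p =', plower, '  reject:', plower < alpha
  print '(a, f6.3, a, l2)', ' Upper tail p =', pupper, '  reject:', pupper < alpha
end program rank_sum_decision
```

For the Example data none of the three p-values falls below 0.05, so no null hypothesis is rejected at the 5 percent level.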
Comments
1. Workspace may be explicitly provided, if desired, by use of R2KSM/DR2KSM. The reference is:
CALL R2KSM (NOBSX, X, NOBSY, Y, FUZZ, STAT, NMISSX, NMISSY, IWK, YWK)
The additional arguments are as follows:
IWK — Integer work vector of length NOBSX + NOBSY
YWK — Work vector of length NOBSX + NOBSY.
2. Informational errors
Type | Code | Description
3 | 4 | Both NOBSX and NOBSY are less than 25. Tabled critical values for W should be used.
3 | 5 | Tied observations occurred between the samples.
4 | 6 | Each element of X and/or Y is a missing (NaN, not a number) value.
3. The Mann‑Whitney U statistic is given in terms of W as U = W ‑ K * (K + 1)/2, where K = NOBSX, and W = STAT(1) (or STAT(4)). Tables of critical values for W are available in the references given in the manual document.
4. For greatest efficiency in computing W, the X sample should be the smaller sample.
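The conversion in Comment 3 is simple arithmetic and can be sketched as follows. The program and variable names are hypothetical. The values of W are STAT(1) and STAT(4) from the Example output, with K = NOBSX = 5.

```fortran
! Sketch only: converts the two tie-adjusted W statistics from the Example
! output into Mann-Whitney U statistics using U = W - K*(K + 1)/2.
program w_to_u
  implicit none
  integer, parameter :: k = 5       ! K = NOBSX in the Example
  real :: w(2), u(2)

  w = [34.0, 35.0]                  ! STAT(1) and STAT(4)
  u = w - real(k*(k + 1))/2.0

  print '(a, 2f6.1)', ' U from STAT(1) and STAT(4):', u
end program w_to_u
```

With K = 5 the offset K * (K + 1)/2 is 15, so the two U values are 19 and 20.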
Example
The following example is taken from Conover (1980, page 224). It involves the mixing time of 2 mixing machines using a total of 10 batches of a certain kind of batter, 5 batches for each machine. The null hypothesis is not rejected at the 5 percent level of significance. The warning error is always printed when one or more ties are detected unless printing for warning errors is turned off. See routine ERSET in the Reference Material.
USE RNKSM_INT
USE UMACH_INT
IMPLICIT NONE
INTEGER NOBSX, NOBSY
REAL FUZZ
PARAMETER (FUZZ=0.001, NOBSX=5, NOBSY=5)
!
INTEGER I, NMISSX, NMISSY, NOUT
REAL STAT(10), X(NOBSX), Y(NOBSY)
!
DATA X/7.3, 6.9, 7.2, 7.8, 7.2/
DATA Y/7.4, 6.8, 6.9, 6.7, 7.1/
!
CALL RNKSM (X, Y, FUZZ, STAT, NMISSX=NMISSX, NMISSY=NMISSY)
! Print the results
CALL UMACH (2, NOUT)
WRITE (NOUT,99999) (STAT(I),I=1,10), REAL(NMISSX), REAL(NMISSY)
!
99999 FORMAT (' Wilcoxon W statistic ........................', F5.1, &
/, ' 2*WBAR - W ..................................', &
F5.1, /, ' p-value .....................................' &
, F7.3, /, ' Adjusted Wilcoxon statistic ', '............' &
, '.....', F5.1, /, ' Adjusted 2*WBAR - W ', '...........', &
'..', '............', F5.1, /, ' Adjusted p-value ', &
'............................', F7.3, /, ' W statistic ', &
'for averaged ranks ..............', F5.1, /, ' Standard ' &
, 'error of W (averaged ranks) ........', F7.3, /, &
' Standard normal score of W (averaged ranks) .', F7.3, &
/, ' Two-sided p-value of W (averaged ranks) .....', &
F7.3, //, ' Number of missing for X .....................' &
, F5.1, /, ' Number of missing for Y ', '................' &
, '.....', F5.1)
!
END
Output
*** WARNING ERROR 5 from RNKSM. At least one tie is detected between the
*** samples.
Wilcoxon W statistic ........................ 34.0
2*WBAR - W .................................. 21.0
p-value ..................................... 0.110
Adjusted Wilcoxon statistic ................. 35.0
Adjusted 2*WBAR - W ......................... 20.0
Adjusted p-value ............................ 0.075
W statistic for averaged ranks .............. 34.5
Standard error of W (averaged ranks) ........ 4.758
Standard normal score of W (averaged ranks) . 1.471
Two-sided p-value of W (averaged ranks) ..... 0.141
Number of missing for X ..................... 0.0
Number of missing for Y ..................... 0.0