wilcoxon_rank_sum
Performs a Wilcoxon rank sum test for comparing the medians of two populations.
Synopsis
#include <imsls.h>
float imsls_f_wilcoxon_rank_sum (int nx, float x[], int ny, float y[], ..., 0)
The type double function is imsls_d_wilcoxon_rank_sum.
Required Arguments
int nx (Input)
Number of observations in the first sample.
float x[] (Input)
Array of length nx containing the first sample.
int ny (Input)
Number of observations in the second sample.
float y[] (Input)
Array of length ny containing the second sample.
Return Value
The two-sided p-value for the Wilcoxon rank sum statistic, computed with average ranks used in the case of ties. The p-value is computed using either exact or approximate calculations, depending on the number of observations and the optional argument IMSLS_EXACT_P_VALUES.
Synopsis with Optional Arguments
#include <imsls.h>
float imsls_f_wilcoxon_rank_sum (int nx, float x[], int ny, float y[],
IMSLS_FUZZ, float fuzz,
IMSLS_N_MISSING_X, int *nmissx,
IMSLS_N_MISSING_Y, int *nmissy,
IMSLS_MANN_WHITNEY, float *mann_whitney,
IMSLS_EXACT_P_VALUES, float **p,
IMSLS_EXACT_P_VALUES_USER, float p[],
IMSLS_STAT, float **stat,
IMSLS_STAT_USER, float stat[],
0)
Optional Arguments
IMSLS_FUZZ, float fuzz (Input)
Nonnegative constant used to determine ties in computing ranks in the combined samples. A tie is declared when two observations in the combined sample are within fuzz of each other.
Default: fuzz = 100 × imsls_f_machine(4) × max{|x_i1|, |x_j2|}
IMSLS_N_MISSING_X, int *nmissx (Output)
Pointer to a scalar for the number of missing observations detected in x.
IMSLS_N_MISSING_Y, int *nmissy (Output)
Pointer to a scalar for the number of missing observations detected in y.
IMSLS_MANN_WHITNEY, float *mann_whitney (Output)
Pointer to a scalar for the Mann-Whitney test statistic. Although the test statistics for the Mann-Whitney and Wilcoxon rank sum tests are computed differently, the p-values for these tests are equal since the Wilcoxon test statistic is a linear transformation of the Mann-Whitney test statistic.
IMSLS_EXACT_P_VALUES, float **p (Output)
Address of a pointer to an internally allocated array of length 3 containing the exact p-values according to the following table:
Row   p-values
0     The exact left-tailed p-value.
1     The exact right-tailed p-value.
2     The exact two-tailed p-value.
IMSLS_EXACT_P_VALUES_USER, float p[] (Output)
Storage for array p is provided by the user. See IMSLS_EXACT_P_VALUES.
IMSLS_STAT, float **stat (Output)
Address of a pointer to an internally allocated array of length 10 containing the following statistics:
Row   Statistics
0     Wilcoxon W statistic (the sum of the ranks of the x observations) adjusted for ties in such a manner that W is as small as possible.
1     2 × E(W) - W, where E(W) is the expected value of W.
2     Probability of obtaining a statistic less than or equal to min{W, 2 × E(W) - W}.
3     W statistic adjusted for ties in such a manner that W is as large as possible.
4     2 × E(W) - W, where E(W) is the expected value of W, adjusted for ties in such a manner that W is as large as possible.
5     Probability of obtaining a statistic less than or equal to min{W, 2 × E(W) - W}, adjusted for ties in such a manner that W is as large as possible.
6     W statistic with average ranks used in case of ties.
7     Estimated standard error of stat[6] under the null hypothesis of no difference.
8     Standard normal score associated with stat[6].
9     Two-sided p-value associated with stat[8].
IMSLS_STAT_USER, float stat[] (Output)
Storage for array stat is provided by the user. See IMSLS_STAT.
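The linear relationship between the Wilcoxon and Mann-Whitney statistics mentioned under IMSLS_MANN_WHITNEY can be illustrated with a short sketch. It assumes one common convention, U = W - nx(nx + 1)/2, where W is the rank sum of the x sample with average ranks; whether this is exactly the value returned in mann_whitney is not confirmed here. The function names u_from_w and u_by_counting are hypothetical and are not part of the library.

```c
#include <assert.h>

/* Hypothetical helper: Mann-Whitney U for the x sample, obtained from the
 * rank sum W of x by the linear shift U = W - nx(nx + 1)/2 (assumed
 * convention). */
double u_from_w(double w, int nx)
{
    assert(nx > 0);
    return w - nx * (nx + 1) / 2.0;
}

/* Hypothetical helper: the same U computed directly from the data.
 * Each pair with x[i] > y[j] counts 1; each pair within fuzz of each
 * other counts 1/2. */
double u_by_counting(int nx, const double x[], int ny, const double y[],
                     double fuzz)
{
    double u = 0.0;
    int i, j;

    assert(nx > 0 && ny > 0 && fuzz >= 0.0);
    for (i = 0; i < nx; i++) {
        for (j = 0; j < ny; j++) {
            double d = x[i] - y[j];
            if (d > fuzz)
                u += 1.0;          /* x[i] clearly larger */
            else if (d >= -fuzz)
                u += 0.5;          /* tie between the samples */
        }
    }
    return u;
}
```

Because U and W differ only by a constant that depends on nx, any tail probability computed for one statistic applies to the other, which is why the two tests yield equal p-values.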
Description
Function imsls_f_wilcoxon_rank_sum conducts the Wilcoxon rank sum test for identical population distribution functions. The Wilcoxon test and the Mann-Whitney U test are equivalent. If the difference between the two populations can be attributed solely to a difference in location, then the Wilcoxon test becomes a test of equality of the population means (or medians) and is the nonparametric equivalent of the two-sample t-test. Function imsls_f_wilcoxon_rank_sum obtains ranks in the combined sample after first eliminating missing values from the data. The rank sum statistic is then computed as the sum of the ranks in the x sample. Three methods for handling ties are used. (A tie is counted when two observations are within fuzz of each other.) Method 1 uses the largest possible rank for tied observations in the smaller sample, while Method 2 uses the smallest possible rank for these observations. Thus, the range of possible rank sums is obtained.
Method 3 uses the average rank of the tied observations for handling tied observations between samples. Asymptotic standard normal scores are computed for the W score (based on a variance that has been adjusted for ties) when average ranks are used (see Conover 1980, p. 217), and the probability associated with the two-sided alternative is computed.
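The ranking step and the three ways of resolving ties can be sketched in plain C. The sketch below is not the library's implementation: the function rank_sums and the type tagged_obs are hypothetical, missing values are assumed to have been removed already, and a tie group is assumed to be a run of sorted observations in which each neighbor is within fuzz of the next. It returns the rank sum of x with ties resolved so that W is as small as possible, as large as possible, and with average ranks (the quantities described for stat[0], stat[3], and stat[6]).

```c
#include <assert.h>
#include <stdlib.h>

/* One observation tagged with the sample it came from. */
typedef struct {
    double value;
    int from_x;               /* 1 if taken from x, 0 if taken from y */
} tagged_obs;

static int compare_obs(const void *a, const void *b)
{
    double va = ((const tagged_obs *)a)->value;
    double vb = ((const tagged_obs *)b)->value;
    return (va > vb) - (va < vb);
}

/* Hypothetical helper. On return:
 *   w[0] = rank sum of x with ties resolved to make W smallest,
 *   w[1] = rank sum of x with ties resolved to make W largest,
 *   w[2] = rank sum of x using average ranks.
 * Returns 0 on success, -1 if memory cannot be allocated. */
int rank_sums(int nx, const double x[], int ny, const double y[],
              double fuzz, double w[3])
{
    int n = nx + ny;
    int i, j, k, m;
    tagged_obs *all;

    assert(nx > 0 && ny > 0 && fuzz >= 0.0);
    all = malloc((size_t)n * sizeof *all);
    if (all == NULL)
        return -1;
    for (i = 0; i < nx; i++) { all[i].value = x[i]; all[i].from_x = 1; }
    for (i = 0; i < ny; i++) { all[nx + i].value = y[i]; all[nx + i].from_x = 0; }
    qsort(all, (size_t)n, sizeof *all, compare_obs);

    w[0] = w[1] = w[2] = 0.0;
    for (i = 0; i < n; i = j + 1) {
        /* Grow the tie group while neighbors are within fuzz. */
        j = i;
        while (j + 1 < n && all[j + 1].value - all[j].value <= fuzz)
            j++;
        /* The group occupies ranks i+1 .. j+1; m of its members are from x. */
        m = 0;
        for (k = i; k <= j; k++)
            m += all[k].from_x;
        w[0] += m * (i + 1) + m * (m - 1) / 2.0;   /* x takes the lowest ranks  */
        w[1] += m * (j + 1) - m * (m - 1) / 2.0;   /* x takes the highest ranks */
        w[2] += m * (i + j + 2) / 2.0;             /* every member gets the mean */
    }
    free(all);
    return 0;
}
```

For the data in the Example section with fuzz set to 0, this sketch gives 34, 35, and 34.5, which agree with the W statistics printed in the Output.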
The p-value returned in stat[9] is the two-sided p-value calculated using the normal approximation with the normal score returned in stat[8]. The p-value returned by this routine is either the approximate or exact two-sided p-value depending upon the number of observations and IMSLS_EXACT_P_VALUES. The exact two-sided p-value is returned when the optional argument IMSLS_EXACT_P_VALUES is used or when both nx and ny are 25 or less.
Hypothesis Tests
In each of the following tests, the first line gives the hypothesis (and its alternative) under assumptions 1 to 3 below, while the second line gives the hypothesis when assumption 4 is also true. The rejection region is the same for both hypotheses and is given in terms of Method 3 for handling ties. If another method for handling ties is desired, another output statistic (stat[0] or stat[3]) should be used.
Test 1
Null and Alternative Hypothesis: H0: Pr(x < y) = 0.5 vs H1: Pr(x < y) ≠ 0.5, or H0: E(x) = E(y) vs H1: E(x) ≠ E(y)
Action: Reject if p_value (the returned two-sided p-value) is less than the user’s significance level of the test.
Test 2
Null and Alternative Hypothesis: H0: Pr(x < y) ≤ 0.5 vs H1: Pr(x < y) > 0.5, or H0: E(x) ≥ E(y) vs H1: E(x) < E(y)
Action: Reject if stat[6] is too small or if p[0] is less than the user’s significance level of the test.
Test 3
Null and Alternative Hypothesis: H0: Pr(x < y) ≥ 0.5 vs H1: Pr(x < y) < 0.5, or H0: E(x) ≤ E(y) vs H1: E(x) > E(y)
Action: Reject if stat[6] is too large or if p[1] is less than the user’s significance level of the test.
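The p-value form of the three decision rules above can be written as a small helper. This is only an illustrative sketch: the enumeration alternative_t and the function reject_null are hypothetical, p_value stands for the two-sided value returned by the function, and p is the three-element array described under IMSLS_EXACT_P_VALUES.

```c
#include <assert.h>

/* Hypothetical labels for the three tests in this section. */
typedef enum {
    TEST_TWO_SIDED = 1,   /* Test 1: H1: Pr(x < y) differs from 0.5 */
    TEST_X_SMALLER = 2,   /* Test 2: H1: Pr(x < y) > 0.5            */
    TEST_X_LARGER  = 3    /* Test 3: H1: Pr(x < y) < 0.5            */
} alternative_t;

/* Returns 1 if the null hypothesis is rejected at level alpha, else 0.
 *   p_value : two-sided p-value returned by the rank sum function
 *   p       : left-tailed, right-tailed, and two-tailed exact p-values */
int reject_null(alternative_t test, double p_value, const double p[3],
                double alpha)
{
    assert(alpha > 0.0 && alpha < 1.0);
    switch (test) {
    case TEST_TWO_SIDED: return p_value < alpha;
    case TEST_X_SMALLER: return p[0] < alpha;   /* left tail  */
    case TEST_X_LARGER:  return p[1] < alpha;   /* right tail */
    }
    return 0;
}
```

With the exact p-values printed in the Example (0.937, 0.079, 0.159) and a significance level of 0.05, none of the three tests rejects the null hypothesis.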
Assumptions
1. Arguments x and y contain random samples from their respective populations.
2. All observations are mutually independent.
3. The measurement scale is at least ordinal (i.e., an ordering less than, greater than, or equal to exists among the observations).
4. If f(x) and g(y) are the distribution functions of x and y, then g(y) = f(x + c) for some constant c (i.e., the distribution of y is, at worst, a translation of the distribution of x).
The p-values are calculated using either the large-sample normal approximation or exact probability calculations. The normal approximation is usually considered adequate when the size of one or both samples is greater than 50. For smaller samples, the exact probability calculations returned by IMSLS_EXACT_P_VALUES are recommended.
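One way to obtain exact tail probabilities for small samples is to enumerate every way of assigning nx of the combined ranks to the x sample, since all such assignments are equally likely under the null hypothesis. The sketch below does this by brute force; it is not claimed to be the algorithm the library uses. The names exact_p and exact_rank_sum_p are hypothetical, and the two-sided value is assumed to be twice the smaller tail, capped at 1.

```c
#include <assert.h>

typedef struct {
    double left;        /* Pr(W <= observed W) */
    double right;       /* Pr(W >= observed W) */
    double two_sided;   /* assumed: 2 * min(left, right), capped at 1 */
} exact_p;

/* Visit every subset of `need` ranks drawn from ranks[start..n-1]. */
static void count_subsets(const double ranks[], int n, int start, int need,
                          double sum, double w, unsigned long *le,
                          unsigned long *ge, unsigned long *total)
{
    int i;

    if (need == 0) {
        (*total)++;
        if (sum <= w + 1e-9) (*le)++;   /* tolerance guards half ranks */
        if (sum >= w - 1e-9) (*ge)++;
        return;
    }
    for (i = start; i <= n - need; i++)
        count_subsets(ranks, n, i + 1, need - 1, sum + ranks[i], w,
                      le, ge, total);
}

/* Hypothetical helper.
 *   ranks : average ranks of all n observations in the combined sample
 *   nx    : size of the x sample
 *   w     : observed rank sum of x
 * The cost grows as the binomial coefficient C(n, nx), so this is only
 * practical for small samples. */
exact_p exact_rank_sum_p(const double ranks[], int n, int nx, double w)
{
    unsigned long le = 0, ge = 0, total = 0;
    exact_p p;
    double smaller;

    assert(n > 0 && nx > 0 && nx <= n);
    count_subsets(ranks, n, 0, nx, 0.0, w, &le, &ge, &total);
    p.left = (double)le / (double)total;
    p.right = (double)ge / (double)total;
    smaller = p.left < p.right ? p.left : p.right;
    p.two_sided = 2.0 * smaller > 1.0 ? 1.0 : 2.0 * smaller;
    return p;
}
```

For the average ranks of the Example data and W = 34.5, 236 of the 252 possible assignments give a rank sum no larger than W and 20 give one no smaller, so the sketch yields tail probabilities of about 0.937 and 0.079 and a two-sided value of about 0.159, in agreement with the exact p-values in the Output.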
Example
The following example is taken from Conover (1980, p. 224). It involves the mixing time of two mixing machines using a total of 10 batches of a certain kind of batter, five batches for each machine. The null hypothesis is not rejected at the 5-percent level of significance. The warning error is always printed when one or more ties are detected, unless printing for warning errors is turned off. See function imsls_error_options (Chapter 15, Utilities).
The statistics are output in the array stat.
 
#include <imsls.h>
#include <stdio.h>
 
int main(void)
{
    int nx = 5;
    int ny = 5;
    /* Mixing times for the two machines (Conover 1980, p. 224) */
    float x[5] = {7.3, 6.9, 7.2, 7.8, 7.2};
    float y[5] = {7.4, 6.8, 6.9, 6.7, 7.1};
    float *stat, *p;    /* arrays allocated by the library */
    char *labels[10] = {
        "Wilcoxon W statistic ......................",
        "2*E(W) - W ................................",
        "p-value ...................................",
        "Adjusted Wilcoxon statistic ...............",
        "Adjusted 2*E(W) - W .......................",
        "Adjusted p-value ..........................",
        "W statistics for averaged ranks............",
        "Standard error of W (averaged ranks) ......",
        "Standard normal score of W (averaged ranks)",
        "Approximate Two-sided p-value of W ......"
    };

    /* Request the exact p-values and the full array of statistics */
    imsls_f_wilcoxon_rank_sum(nx, x, ny, y,
        IMSLS_EXACT_P_VALUES, &p,
        IMSLS_STAT, &stat,
        0);

    /* Print the 10 statistics as a labeled column */
    imsls_f_write_matrix("statistics", 10, 1, stat,
        IMSLS_ROW_LABELS, labels,
        IMSLS_WRITE_FORMAT, "%7.3f",
        0);

    printf("Exact Left-Tailed p-value ................. %8.3f\n", p[0]);
    printf("Exact Right-Tailed p-value ................ %8.3f\n", p[1]);
    printf("Exact Two-sided p-value ................... %8.3f\n", p[2]);

    return 0;
}
Output
 
*** WARNING Error IMSLS_AT_LEAST_ONE_TIE from imsls_f_wilcoxon_rank_sum.
*** At least one tie is detected between the samples.
 
 
statistics
Wilcoxon W statistic ...................... 34.000
2*E(W) - W ................................ 21.000
p-value ................................... 0.110
Adjusted Wilcoxon statistic ............... 35.000
Adjusted 2*E(W) - W ....................... 20.000
Adjusted p-value .......................... 0.075
W statistics for averaged ranks............ 34.500
Standard error of W (averaged ranks) ...... 4.758
Standard normal score of W (averaged ranks) 1.471
Approximate Two-sided p-value of W ...... 0.141
Exact Left-Tailed p-value ................. 0.937
Exact Right-Tailed p-value ................ 0.079
Exact Two-sided p-value ................... 0.159
Warning Errors
IMSLS_AT_LEAST_ONE_TIE
At least one tie is detected between the samples.
Fatal Errors
IMSLS_ALL_X_Y_MISSING
Each element of x and/or y is a missing (NaN, Not a Number) value.