kruskalWallisTest¶
Performs a Kruskal-Wallis test for identical population medians.
Synopsis¶
kruskalWallisTest (ni, y)
Required Arguments¶
- int ni[] (Input) - Array of length nGroups containing the number of responses for each of the nGroups groups.
- float y[] (Input) - Array of length ni[0] + ... + ni[nGroups-1] that contains the responses for each of the nGroups groups. y must be sorted by group, with the ni[0] observations in group 1 coming first, the ni[1] observations in group 2 coming second, and so on.
Return Value¶
Array of length 4 containing the Kruskal-Wallis statistics.
| i | stat[i] |
|---|---------|
| 0 | Kruskal-Wallis H statistic. |
| 1 | Asymptotic probability of a larger H under the null hypothesis of identical population medians. |
| 2 | H corrected for ties. |
| 3 | Asymptotic probability of a larger H (corrected for ties) under the null hypothesis of identical populations. |
Optional Arguments¶
fuzz, float (Input) - Constant used to determine ties in y. If (after sorting) |y[i] - y[i+1]| is less than or equal to fuzz, then a tie is counted. fuzz must be nonnegative.
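As an illustration only (count_fuzz_ties is a made-up helper, not part of the library), the tie rule amounts to comparing adjacent sorted values against fuzz:

import numpy as np

def count_fuzz_ties(y, fuzz):
    # Sort the data, then count adjacent pairs that differ by at most fuzz.
    s = np.sort(np.asarray(y, dtype=float))
    return int(np.sum(np.abs(np.diff(s)) <= fuzz))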
Description¶
The function kruskalWallisTest
generalizes the Wilcoxon two-sample test
computed by routine wilcoxonRankSum to more than
two populations. It computes a test statistic for testing that the
population distribution functions in each of K populations are identical.
Under appropriate assumptions, this is a nonparametric analogue of the
one-way analysis of variance. Since more than two samples are involved, the
alternative is taken as the analogue of the usual analysis of variance
alternative, namely that the populations are not identical.
The calculations proceed as follows: All observations are ranked regardless
of the population to which they belong. Average ranks are used for tied
observations (observations within fuzz
of each other). Missing
observations (observations equal to NaN, not a number) are not included in
the ranking. Let \(R_i\) denote the sum of the ranks in the i-th
population. The test statistic H is defined as:

\[
H = \frac{1}{S^2}\left[\sum_{i=1}^{K}\frac{R_i^2}{n_i} - \frac{N(N+1)^2}{4}\right]
\]

where N is the total of the sample sizes, \(n_i\) is the number of observations in the i-th sample, and \(S^2\) is the (bias-corrected) sample variance of all N ranks.
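As a rough sketch of this formula (not the routine's actual implementation), the statistic can be computed in a few lines of Python; the helper name kruskal_wallis_h and the use of scipy.stats.rankdata for average ranks are choices made for this sketch only.

import numpy as np
from scipy.stats import rankdata

def kruskal_wallis_h(groups):
    # Sketch of the H statistic above, using average ranks for tied observations.
    y = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    ranks = rankdata(y)                    # average ranks over all observations
    n = np.array([len(g) for g in groups])
    N = n.sum()
    # R_i: sum of the ranks within each group.
    R = np.array([r.sum() for r in np.split(ranks, np.cumsum(n)[:-1])])
    S2 = ranks.var(ddof=1)                 # bias-corrected sample variance of the ranks
    return (np.sum(R**2 / n) - N * (N + 1)**2 / 4) / S2

Because the ranks here already carry the average-rank treatment of ties, this form corresponds to the tie-corrected statistic reported in stat[2].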
The null hypothesis is rejected when stat[3]
(or stat[1]
) is less
than the significance level of the test. If the null hypothesis is rejected,
then the procedures given in Conover (1980, page 231) may be used for
multiple comparisons. The routine kruskalWallisTest
computes asymptotic
probabilities using the chi-squared distribution when the number of groups
is 6 or greater, and a Beta approximation (see Wallace 1959) when the number
of groups is 5 or less. Tables yielding exact probabilities in small samples
may be obtained from Owen (1962).
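Expressed as code, the decision rule above is a comparison of the returned probability with the chosen significance level (a usage sketch; the 5% level and the variable stat, holding the return value of kruskalWallisTest, are assumptions of this snippet):

alpha = 0.05                 # significance level assumed for this sketch
if stat[3] < alpha:          # asymptotic probability corrected for ties
    print("Reject the null hypothesis of identical populations.")
else:
    print("Do not reject the null hypothesis of identical populations.")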
Example¶
The following example is taken from Conover (1980, page 231). The data represents the yields per acre of four different methods for raising corn. Since \(H=25.5\), the four methods are clearly different. The warning error is always printed when the Beta approximation is used, unless printing for warning errors is turned off.
from numpy import array
from pyimsl.stat.kruskalWallisTest import kruskalWallisTest
from pyimsl.stat.writeMatrix import writeMatrix

# Number of responses in each of the four groups.
ni = [9, 10, 7, 8]
# Responses sorted by group: the ni[0] yields for method 1 come first, and so on.
y = array([83., 91., 94., 89., 89., 96., 91., 92., 90., 91., 90.,
           81., 83., 84., 83., 88., 91., 89., 84., 101., 100., 91.,
           93., 96., 95., 94., 78., 82., 81., 77., 79., 81., 80.,
           81.])
fuzz = .001
rlabel = ["H (no ties) =",
          "Prob (no ties) =",
          "H (ties) =",
          "Prob (ties) ="]
stat = kruskalWallisTest(ni, y, fuzz=fuzz)
writeMatrix(" ", stat, rowLabels=rlabel, column=True)
Output¶
***
*** Warning error issued from IMSL function kruskalWallisTest:
*** The chi-squared degrees of freedom are less than 5, so the Beta approximation is used.
***
H (no ties) = 25.46
Prob (no ties) = 0.00
H (ties) = 25.63
Prob (ties) = 0.00