kTrendsTest
Performs a k-sample trends test against ordered alternatives.
Synopsis
kTrendsTest(ni, y)
Required Arguments
- int ni[] (Input)
  Array of length nGroups containing the number of responses for each of the nGroups groups.
- float y[] (Input)
  Array of length ni[0] + ... + ni[nGroups-1] that contains the responses for each of the nGroups groups. y must be sorted by group, with the ni[0] observations in group 1 coming first, the ni[1] observations in group 2 coming second, and so on.
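Because y is one flat array, per-group data has to be concatenated in group order, with ni holding the group sizes. The following minimal sketch shows one way to build both arguments from a list of per-group lists and to split them apart again. The helper names `pack_groups` and `unpack_groups` are hypothetical and are not part of PyIMSL.

```python
from itertools import accumulate

def pack_groups(groups):
    """Build (ni, y) in the layout kTrendsTest expects: group 1 first, then group 2, ..."""
    ni = [len(g) for g in groups]
    y = [float(v) for g in groups for v in g]
    return ni, y

def unpack_groups(ni, y):
    """Inverse of pack_groups: split the flat array y back into groups using ni."""
    if sum(ni) != len(y):
        raise ValueError("len(y) must equal ni[0] + ... + ni[nGroups-1]")
    ends = list(accumulate(ni))          # one past the last index of each group
    starts = [0] + ends[:-1]
    return [list(y[s:e]) for s, e in zip(starts, ends)]

# Data from the Example section, one list per group.
groups = [[19, 20, 60, 130], [21, 61, 80, 129],
          [40, 99, 100, 149], [49, 110, 151, 160]]
ni, y = pack_groups(groups)
print(ni)      # [4, 4, 4, 4]
print(y[:5])   # [19.0, 20.0, 60.0, 130.0, 21.0]
```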
Return Value
Array of length 17 containing the test results.
i | stat[i] |
---|---|
0 | Test statistic (ties are randomized). |
1 | Conservative test statistic with ties counted in favor of the null hypothesis. |
2 | p-value associated with stat[0]. |
3 | p-value associated with stat[1]. |
4 | Continuity-corrected stat[2]. |
5 | Continuity-corrected stat[3]. |
6 | Variance of the test statistic stat[0]. |
7 | Expected kurtosis of the statistic. (The expected skewness is zero.) |
8 | Total sample size. |
9 | Coefficient of rank correlation based upon stat[0]. |
10 | Coefficient of rank correlation based upon stat[1]. |
11 | Total number of ties between samples. |
12 | The t-statistic associated with stat[2]. |
13 | The t-statistic associated with stat[3]. |
14 | The t-statistic associated with stat[4]. |
15 | The t-statistic associated with stat[5]. |
16 | Degrees of freedom for each t-statistic. |
Description
Function kTrendsTest performs a k-sample trends test against ordered alternatives. The alternative to the null hypothesis of equality is that \(F_1(X) < F_2(X) < \dots < F_k(X)\), where \(F_1\), \(F_2\), etc., are cumulative distribution functions, and the operator < means that the less-than relationship holds for all values of X. While the trends test used in kTrendsTest requires that the background populations be continuous, ties occurring within a sample have no effect on the test statistic or associated probabilities. Ties between samples are important, however. Two methods for handling ties between samples are used:
- Ties are randomly split (stat[0]).
- Ties are counted in a manner that is unfavorable to the alternative hypothesis (stat[1]).
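To make the two tie rules concrete, here is a sketch of how a single between-sample comparison could be scored. It assumes the scoring described under Computational Procedure (2.0 when the value from the earlier group is smaller, 0.0 otherwise), with a tie scored 0.0 under the conservative rule. The function name `pair_score` and its `rule` parameter are hypothetical.

```python
import random

def pair_score(x_earlier, x_later, rule, rng):
    """Score one comparison between a value from an earlier group and one from a later group.

    Assumed scoring: 2.0 if x_earlier < x_later, 0.0 if x_earlier > x_later.
    A tie between samples is handled by `rule`:
      "random"       -> 2.0 or 0.0 with equal probability (as for stat[0])
      "conservative" -> 0.0, i.e. counted against the alternative (as for stat[1])
    """
    if x_earlier < x_later:
        return 2.0
    if x_earlier > x_later:
        return 0.0
    if rule == "random":
        return rng.choice((2.0, 0.0))
    if rule == "conservative":
        return 0.0
    raise ValueError("rule must be 'random' or 'conservative'")

rng = random.Random(1)
print(pair_score(3.0, 5.0, "conservative", rng))  # 2.0
print(pair_score(5.0, 5.0, "conservative", rng))  # 0.0
print(pair_score(5.0, 5.0, "random", rng))        # 2.0 or 0.0, depending on the draw
```

Ties within a sample never reach a function like this, because only values from different groups are compared; that is why they leave the statistic unchanged.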
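Computational Procedure
Consider the matrices

\[
M^{km} = \left(m_{ij}^{km}\right), \qquad
m_{ij}^{km} = \begin{cases} 2 & \text{if } X_{ki} < X_{mj}, \\ 0 & \text{otherwise,} \end{cases}
\qquad k < m,
\]

where \(X_{ki}\) is the i-th observation in the k-th population, \(X_{mj}\) is the j-th observation in the m-th population, and each matrix \(M^{km}\) is \(n_k\) by \(n_m\), where \(n_k\) = ni[k-1] is the size of the k-th sample (populations are numbered from 1). Let \(S_{km}\) denote the sum of all elements in \(M^{km}\). Then stat[1] is computed as \(\sum_{k<m} S_{km}\), the sum of \(S_{km}\) over all pairs of populations with k < m, minus the expected value of this sum computed as if there were no ties and the distributions in all populations were equal. Because each element then has expected value 1, this expectation is \(\sum_{k<m} n_k n_m\). In stat[0], ties are broken randomly, and the element in the summation is taken as 2.0 or 0.0 depending upon the result of breaking the tie.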
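The procedure above can be sketched in plain Python by building each \(M^{km}\) as a list of lists. This is an illustration, not the IMSL implementation: the name `trend_statistics` and its `seed` parameter are hypothetical, the random stream will not match IMSL's, and counting tied *pairs* for the tie total is an assumption about how stat[11] counts ties.

```python
import random

def trend_statistics(ni, y, seed=0):
    """Sketch of stat[0], stat[1] and a count of tied between-sample pairs.

    For each k < m, builds the n_k-by-n_m matrix M^{km}, sums it (S_km), and
    subtracts the no-tie expectation sum_{k<m} n_k * n_m from the total.
    """
    rng = random.Random(seed)
    groups, start = [], 0
    for n in ni:                                   # split y by group using ni
        groups.append(y[start:start + n])
        start += n

    s_random = s_conservative = expected = 0.0
    ties = 0
    for k in range(len(groups)):
        for m in range(k + 1, len(groups)):
            # M^{km} with tied pairs scored 0 (unfavorable to the alternative).
            M = [[2.0 if a < b else 0.0 for b in groups[m]] for a in groups[k]]
            s_km = sum(map(sum, M))
            n_ties = sum(a == b for a in groups[k] for b in groups[m])
            ties += n_ties
            s_conservative += s_km
            # Random tie breaking: each tied pair adds 2.0 or 0.0 instead of 0.0.
            s_random += s_km + sum(rng.choice((2.0, 0.0)) for _ in range(n_ties))
            # Under H0 with no ties each element has expectation 1.
            expected += len(groups[k]) * len(groups[m])
    return s_random - expected, s_conservative - expected, ties

ni = [4, 4, 4, 4]
y = [19., 20., 60., 130., 21., 61., 80., 129.,
     40., 99., 100., 149., 49., 110., 151., 160.]
print(trend_statistics(ni, y))   # (46.0, 46.0, 0), matching stat[0], stat[1], stat[11]
```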
stat[2] and stat[3] are computed using the t distribution. The probabilities reported are asymptotic approximations based upon the t statistics in stat[12] and stat[13], which are computed as in Jonckheere (1954, page 141). Similarly, stat[4] and stat[5], the continuity-corrected versions of stat[2] and stat[3], give the probabilities for the continuity-corrected t statistics in stat[14] and stat[15]. The degrees of freedom for each t statistic (stat[16]) are chosen so that the selected t distribution is as close as possible to the actual distribution of the statistic (see Jonckheere 1954, page 141).
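The text does not spell out the t construction, so the following is an assumed reading rather than the documented method: fit a symmetric Pearson Type II variable r on [-1, 1] whose excess kurtosis, -6/(f + 3), equals that of S, scale it so Var(r) = 1/(f + 1), and use t = r·sqrt(f)/sqrt(1 - r²), which has a t distribution with f degrees of freedom. In a hand check this reproduces stat[12], stat[14], and stat[16] of the Output section to the printed precision. The function name `jonckheere_t` and the continuity correction of 1 (half the spacing of 2 between attainable values of S without ties) are assumptions.

```python
import math

def jonckheere_t(s, variance, excess_kurtosis, continuity=False):
    """Approximate t statistic and degrees of freedom for the trend statistic S.

    Assumed construction: a symmetric Pearson Type II variable r on [-1, 1]
    with excess kurtosis -6 / (f + 3) matched to that of S and Var(r) = 1 / (f + 1);
    then t = r * sqrt(f) / sqrt(1 - r**2) follows a t distribution with f df.
    """
    if not -2.0 < excess_kurtosis < 0.0:
        raise ValueError("construction needs -2 < excess kurtosis < 0")
    df = -6.0 / excess_kurtosis - 3.0
    if continuity:
        # Assumed upper-tail correction: half the spacing (2) of attainable S values.
        s = s - 1.0
    r = s / math.sqrt(variance * (df + 1.0))
    if abs(r) >= 1.0:
        raise ValueError("S lies outside the support of the fitted distribution")
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r), df

# S, variance and kurtosis as printed in the Output section (stat[0], stat[6], stat[7]).
t, df = jonckheere_t(46.0, 458.66667, -0.15365)
t_cc, _ = jonckheere_t(46.0, 458.66667, -0.15365, continuity=True)
print(round(t, 3), round(t_cc, 3), round(df, 2))   # 2.264 2.208 36.05
```

If SciPy is available, `scipy.stats.t.sf(t, df)` gives the corresponding upper-tail probability; it has not been checked digit by digit against stat[2].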
stat[6], the variance of the test statistic stat[0], and stat[7], the kurtosis of the test statistic, are computed as in Jonckheere (1954, page 138). The coefficients of rank correlation in stat[9] and stat[10] reduce to the Kendall \(\tau\) statistic when there are just two groups.
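Jonckheere's page-138 formulas are not reproduced here. As a swapped-in route to the same moments (no ties assumed), this sketch writes S = 2J - Σ n_k n_m, where J counts the between-group pairs in increasing order. J equals the sum, over groups m, of the Mann-Whitney count of group m against all earlier groups pooled; these counts are independent under H0, so their cumulants add (κ2 = ab(a+b+1)/12 and κ4 = -ab(a+b+1)(a²+b²+ab+a+b)/120 for sample sizes a and b). It also assumes the rank correlation is S divided by its maximum Σ n_k n_m, and that stat[7] is excess kurtosis; both assumptions agree with the example output. The name `trend_moments` is hypothetical.

```python
def trend_moments(ni, s):
    """Variance and excess kurtosis of S under H0 (no ties), and S / sum_{k<m} n_k n_m."""
    var_j = k4_j = 0.0
    pooled = max_pairs = 0
    for b in ni:
        a = pooled                      # size of all earlier groups combined
        var_j += a * b * (a + b + 1) / 12.0
        k4_j -= a * b * (a + b + 1) * (a * a + b * b + a * b + a + b) / 120.0
        max_pairs += a * b
        pooled += b
    variance = 4.0 * var_j              # S = 2J - const, so Var(S) = 4 Var(J)
    excess_kurtosis = k4_j / var_j ** 2 # unchanged by the linear map J -> S
    rank_corr = s / max_pairs
    return variance, excess_kurtosis, rank_corr

v, g, rc = trend_moments([4, 4, 4, 4], 46.0)
print(round(v, 5), round(g, 5), round(rc, 5))   # 458.66667 -0.15365 0.47917
```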
Exact probabilities in small samples can be obtained from tables in Jonckheere (1954). Note, however, that the t approximation appears to be a good one.
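When there are no ties, exact small-sample probabilities can also be computed directly instead of read from tables. This sketch is not how kTrendsTest computes stat[2], and its function names are hypothetical. It builds the exact null distribution of J = (S + Σ n_k n_m)/2 by convolving the exact Mann-Whitney distributions of each group against the pooled earlier groups, using the recursion c(a, b) = c(a - 1, b) + q^a c(a, b - 1) on the count polynomials, and returns P(S ≥ s_obs).

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def mw_counts(a, b):
    """counts[u] = arrangements of a earlier-group and b later-group values (no ties)
    with exactly u pairs in which the earlier-group value is smaller."""
    if a == 0 or b == 0:
        return (1,)
    out = [0] * (a * b + 1)
    for u, c in enumerate(mw_counts(a - 1, b)):   # largest value is from the earlier group
        out[u] += c
    for u, c in enumerate(mw_counts(a, b - 1)):   # largest value is from the later group
        out[u + a] += c
    return tuple(out)

def convolve(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out

def jonckheere_null_counts(ni):
    """Exact null counts of J = sum_{k<m} #(X_ki < X_mj); list index = value of J."""
    counts, pooled = [1], 0
    for n in ni:
        counts = convolve(counts, mw_counts(pooled, n))
        pooled += n
    return counts

def exact_upper_p(ni, s_obs):
    """P(S >= s_obs) under H0 with no ties, where S = 2J - sum_{k<m} n_k n_m."""
    counts = jonckheere_null_counts(ni)
    max_pairs = len(counts) - 1
    j_min = (s_obs + max_pairs) / 2.0
    return sum(c for j, c in enumerate(counts) if j >= j_min) / sum(counts)

print(exact_upper_p([4, 4, 4, 4], 46))   # exact counterpart of the example's p-value
```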
Assumptions
- The \(X_{mi}\) for each sample are independently and identically distributed according to a single continuous distribution.
- The samples are independent.
Hypothesis tests
\(H_0: F_1(X) \ge F_2(X) \ge \dots \ge F_k(X)\)
\(H_1: F_1(X) < F_2(X) < \dots < F_k(X)\)
Reject if stat[2] (or stat[3], stat[4], or stat[5], depending upon the method used) is too large.
Example
The following example is taken from Jonckheere (1954, page 135). It involves four observations in four independent samples.
from numpy import array
from pyimsl.stat.kTrendsTest import kTrendsTest
from pyimsl.stat.writeMatrix import writeMatrix

# Four independent samples with four observations each.
ni = [4, 4, 4, 4]
fmt = "%9.5f"
# One row label per element of the returned array.
rlabel = ["stat[0] - Test Statistic (random) ............",
          "stat[1] - Test Statistic (null hypothesis) ...",
          "stat[2] - p-value for stat[0] .................",
          "stat[3] - p-value for stat[1] .................",
          "stat[4] - Continuity corrected for stat[2] ....",
          "stat[5] - Continuity corrected for stat[3] ....",
          "stat[6] - Variance of stat[0] .................",
          "stat[7] - Expected kurtosis ...................",
          "stat[8] - Total sample size ...................",
          "stat[9] - Rank corr. coef. based on stat[0] ...",
          "stat[10]- Rank corr. coef. based on stat[1] ...",
          "stat[11]- Total number of ties ................",
          "stat[12]- t-statistic associated w/stat[2] ....",
          "stat[13]- t-statistic associated w/stat[3] ....",
          "stat[14]- t-statistic associated w/stat[4] ....",
          "stat[15]- t-statistic associated w/stat[5] ....",
          "stat[16]- Degrees of freedom .................."]
# Responses stored group by group: group 1 first, then groups 2, 3 and 4.
y = array([19., 20., 60., 130., 21., 61., 80., 129.,
           40., 99., 100., 149., 49., 110., 151., 160.])
stat = kTrendsTest(ni, y)
writeMatrix("stat", stat, writeFormat=fmt, rowLabels=rlabel, column=True)
Output
stat
stat[0] - Test Statistic (random) ............ 46.00000
stat[1] - Test Statistic (null hypothesis) ... 46.00000
stat[2] - p-value for stat[0] ................. 0.01483
stat[3] - p-value for stat[1] ................. 0.01483
stat[4] - Continuity corrected for stat[2] .... 0.01683
stat[5] - Continuity corrected for stat[3] .... 0.01683
stat[6] - Variance of stat[0] ................. 458.66667
stat[7] - Expected kurtosis ................... -0.15365
stat[8] - Total sample size ................... 16.00000
stat[9] - Rank corr. coef. based on stat[0] ... 0.47917
stat[10]- Rank corr. coef. based on stat[1] ... 0.47917
stat[11]- Total number of ties ................ 0.00000
stat[12]- t-statistic associated w/stat[2] .... 2.26435
stat[13]- t-statistic associated w/stat[3] .... 2.26435
stat[14]- t-statistic associated w/stat[4] .... 2.20839
stat[15]- t-statistic associated w/stat[5] .... 2.20839
stat[16]- Degrees of freedom .................. 36.04963