kTrendsTest¶
Performs a k-sample trends test against ordered alternatives.
Synopsis¶
kTrendsTest (ni, y)
Required Arguments¶
- int ni[] (Input)
  Array of length nGroups containing the number of responses for each of the nGroups groups.
- float y[] (Input)
  Array of length ni[0] + ... + ni[nGroups-1] that contains the responses for each of the nGroups groups. y must be sorted by group, with the ni[0] observations in group 1 coming first, the ni[1] observations in group two coming second, and so on (a layout sketch follows this list).
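For example, the following layout sketch (hypothetical data, not part of the documented interface) packs three groups of responses into ni and y in the required order:

from numpy import concatenate
from pyimsl.stat.kTrendsTest import kTrendsTest
group1 = [1.2, 3.4]                    # hypothetical responses, group 1
group2 = [2.2, 2.9, 4.0]               # hypothetical responses, group 2
group3 = [5.1, 6.3]                    # hypothetical responses, group 3
ni = [len(group1), len(group2), len(group3)]
y = concatenate([group1, group2, group3])   # group 1 first, then group 2, then group 3
stat = kTrendsTest(ni, y)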
Return Value¶
Array of length 17 containing the test results.
| i | stat[i] |
|---|---------|
| 0 | Test statistic (ties are randomized). |
| 1 | Conservative test statistic with ties counted in favor of the null hypothesis. |
| 2 | p-value associated with stat[0]. |
| 3 | p-value associated with stat[1]. |
| 4 | Continuity corrected stat[2]. |
| 5 | Continuity corrected stat[3]. |
| 6 | Expected mean of the statistic. |
| 7 | Expected kurtosis of the statistic. (The expected skewness is zero.) |
| 8 | Total sample size. |
| 9 | Coefficient of rank correlation based upon stat[0]. |
| 10 | Coefficient of rank correlation based upon stat[1]. |
| 11 | Total number of ties between samples. |
| 12 | The t-statistic associated with stat[2]. |
| 13 | The t-statistic associated with stat[3]. |
| 14 | The t-statistic associated with stat[4]. |
| 15 | The t-statistic associated with stat[5]. |
| 16 | Degrees of freedom for each t-statistic. |
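As a usage sketch (variable names here are illustrative only), the entries most often needed can be read off the returned array by index:

p_random = stat[2]           # p-value for stat[0], ties randomized
p_conservative = stat[3]     # p-value for stat[1], ties counted in favor of the null
df_t = stat[16]              # degrees of freedom for the t approximation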
Description¶
Function kTrendsTest performs a k-sample trends test against ordered
alternatives. The alternative to the null hypothesis of equality is that
\(F_1(X) < F_2(X) < \ldots < F_k(X)\),
where \(F_1\), \(F_2\), etc., are cumulative distribution functions,
and the operator < implies that the less than relationship holds for all
values of X. While the trends test used in kTrendsTest requires that
the background populations be continuous, ties occurring within a sample
have no effect on the test statistic or associated probabilities. Ties
between samples are important, however. Two methods for handling ties
between samples are used. These are:
- Ties are randomly split (stat[0]).
- Ties are counted in a manner that is unfavorable to the alternative hypothesis (stat[1]).
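Both statistics are returned on every call, so the effect of between-sample ties can be inspected directly. A minimal sketch with hypothetical tied data:

from pyimsl.stat.kTrendsTest import kTrendsTest
ni = [3, 3, 3]
y = [1.0, 2.0, 5.0,      # group 1
     5.0, 6.0, 7.0,      # group 2 (5.0 ties with group 1)
     6.0, 8.0, 9.0]      # group 3 (6.0 ties with group 2)
stat = kTrendsTest(ni, y)
print(stat[11])          # total number of ties between samples
print(stat[0], stat[1])  # randomized vs. conservative handling of the ties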
Computational Procedure¶
Consider the matrices

\[
M^{km} = \left(M^{km}_{ij}\right), \qquad
M^{km}_{ij} =
\begin{cases}
2 & \text{if } X_{ki} < X_{mj} \\
0 & \text{otherwise,}
\end{cases}
\]

where \(X_{ki}\) is the i-th observation in the k-th population,
\(X_{mj}\) is the j-th observation in the m-th population, and each
matrix \(M^{km}\) is \(n_k\) by \(n_m\), where \(n_i\) =
ni[i]. Let \(S_{km}\) denote the sum of all elements in
\(M^{km}\). Then, stat[1] is computed as the sum of the \(S_{km}\)
over all pairs \(k < m\), minus the expected value of this sum (computed as
\(\sum_{k<m} n_k n_m\) when there are no ties and the distributions in all
populations are equal).
In stat[0], ties are broken randomly, and the element in the summation
is taken as 2.0 or 0.0 depending upon the result of breaking the tie.
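As an independent cross-check of this construction (a plain NumPy sketch of the 2/0 scoring described above, not the library's internal algorithm), the conservative statistic can be accumulated directly; for the tie-free data of the example below it reproduces stat[1] = 46:

from numpy import array
groups = [array([19., 20., 60., 130.]), array([21., 61., 80., 129.]),
          array([40., 99., 100., 149.]), array([49., 110., 151., 160.])]
total = 0.0       # sum of the S_km over all pairs k < m
expected = 0.0    # expected value of that sum when there are no ties
for k in range(len(groups)):
    for m in range(k + 1, len(groups)):
        # M^km[i, j] = 2 when X_ki < X_mj, 0 otherwise (a tie would count as 0)
        total += 2.0 * (groups[k][:, None] < groups[m][None, :]).sum()
        expected += len(groups[k]) * len(groups[m])
print(total - expected)               # 46.0, matching stat[1] in the example below
print((total - expected) / expected)  # 0.4791..., which for this data equals stat[9] and stat[10] below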
stat[2] and stat[3] are computed using the t distribution. The
probabilities reported are asymptotic approximations based upon the t
statistics in stat[12] and stat[13], which are computed as in
Jonckheere (1954, page 141).
Similarly, stat[4] and stat[5] give the probabilities for
stat[14] and stat[15], the continuity corrected versions of
stat[2] and stat[3]. The degrees of freedom for each t statistic
(stat[16]) are computed so as to make the t distribution selected as
close as possible to the actual distribution of the statistic (see
Jonckheere 1954, page 141).
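The relationship between these quantities can be checked with SciPy (SciPy is not part of pyimsl; the numbers are copied from the example output below):

from scipy.stats import t
t_stat = 2.26435          # stat[12] from the example output
df = 36.04963             # stat[16] from the example output
print(t.sf(t_stat, df))   # one-sided upper-tail probability; should be close to stat[2] = 0.01483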
stat[6], the variance of the test statistic stat[0], and
stat[7], the kurtosis of the test statistic, are computed as in
Jonckheere (1954, page 138). The coefficients of rank correlation in
stat[9] and stat[10] reduce to the Kendall \(\tau\) statistic
when there are just two groups.
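A further hedged cross-check: applying the textbook variance formula for the Jonckheere statistic, scaled by 4 to account for the 2/0 scoring used above (this scaling is an assumption, not taken from the library), reproduces stat[6] for the example below:

n = [4, 4, 4, 4]     # group sizes from the example below
N = sum(n)
var = 4.0 * (N**2 * (2*N + 3) - sum(k**2 * (2*k + 3) for k in n)) / 72.0
print(var)           # 458.666..., matching stat[6] in the example output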
Exact probabilities in small samples can be obtained from tables in Jonckheere (1954). Note, however, that the t approximation appears to be a good one.
Assumptions¶
- The \(X_{mi}\) for each sample are independently and identically distributed according to a single continuous distribution.
- The samples are independent.
Hypothesis tests¶
\(H_0: F_1(X) \geq F_2(X) \geq \ldots \geq F_k(X)\)
\(H_1: F_1(X) < F_2(X) < \ldots < F_k(X)\)
Reject \(H_0\) if stat[2] (or stat[3], stat[4], or stat[5], depending upon the
tie-handling method and continuity correction used) is less than the significance level of the test.
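Given the array stat returned by kTrendsTest, a minimal sketch of this decision at the 5% level (the choice of entry and of the level is the user's):

alpha = 0.05
if stat[5] < alpha:      # conservative, continuity corrected p-value
    print("Reject H0 in favor of the ordered alternative")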
Example¶
The following example is taken from Jonckheere (1954, page 135). It involves four observations in each of four independent samples.
from numpy import *
from pyimsl.stat.kTrendsTest import kTrendsTest
from pyimsl.stat.writeMatrix import writeMatrix
ni = [4, 4, 4, 4]
fmt = "%9.5f"
rlabel = ["stat[0] - Test Statistic (random) ............",
"stat[1] - Test Statistic (null hypothesis) ...",
"stat[2] - p-value for stat[0] .................",
"stat[3] - p-value for stat[1] .................",
"stat[4] - Continuity corrected for stat[2] ....",
"stat[5] - Continuity corrected for stat[3] ....",
"stat[6] - Expected mean .......................",
"stat[7] - Expected kurtosis ...................",
"stat[8] - Total sample size ...................",
"stat[9] - Rank corr. coef. based on stat[0] ...",
"stat[10]- Rank corr. coef. based on stat[1] ...",
"stat[11]- Total number of ties ................",
"stat[12]- t-statistic associated w/stat[2] ....",
"stat[13]- t-statistic asscoiated w/stat[3] ....",
"stat[14]- t-statistic associated w/stat[4] ....",
"stat[15]- t-statistic asscoiated w/stat[5] ....",
"stat[16]- Degrees of freedom .................."]
y = array([19., 20., 60., 130., 21., 61., 80., 129.,
40., 99., 100., 149., 49., 110., 151., 160.])
stat = kTrendsTest(ni, y)
writeMatrix("stat", stat, writeFormat=fmt, rowLabels=rlabel, column=True)
Output¶
stat
stat[0] - Test Statistic (random) ............ 46.00000
stat[1] - Test Statistic (null hypothesis) ... 46.00000
stat[2] - p-value for stat[0] ................. 0.01483
stat[3] - p-value for stat[1] ................. 0.01483
stat[4] - Continuity corrected for stat[2] .... 0.01683
stat[5] - Continuity corrected for stat[3] .... 0.01683
stat[6] - Expected mean ....................... 458.66667
stat[7] - Expected kurtosis ................... -0.15365
stat[8] - Total sample size ................... 16.00000
stat[9] - Rank corr. coef. based on stat[0] ... 0.47917
stat[10]- Rank corr. coef. based on stat[1] ... 0.47917
stat[11]- Total number of ties ................ 0.00000
stat[12]- t-statistic associated w/stat[2] .... 2.26435
stat[13]- t-statistic associated w/stat[3] ....      2.26435
stat[14]- t-statistic associated w/stat[4] .... 2.20839
stat[15]- t-statistic associated w/stat[5] ....      2.20839
stat[16]- Degrees of freedom .................. 36.04963