kTrendsTest¶

Performs a k-sample trends test against ordered alternatives.

Synopsis¶

kTrendsTest (ni, y)

Required Arguments¶

int ni[] (Input): Array of length nGroups containing the number of responses for each of the nGroups groups.
float y[] (Input): Array of length ni[0] + ... + ni[nGroups-1] that contains the responses for each of the nGroups groups. y must be sorted by group, with the ni[0] observations of the first group coming first, the ni[1] observations of the second group coming next, and so on.
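
As a minimal sketch of the layout expected for ni and y, assuming the responses start out as one list per group (the variable name `groups` is hypothetical and not part of the library), the two arguments can be built as follows:

```python
# Hypothetical per-group samples; the values are those used in the Example.
groups = [
    [19., 20., 60., 130.],
    [21., 61., 80., 129.],
    [40., 99., 100., 149.],
    [49., 110., 151., 160.],
]

# ni holds the group sizes; y concatenates the groups in order.
ni = [len(g) for g in groups]
y = [x for g in groups for x in g]

print(ni)      # [4, 4, 4, 4]
print(len(y))  # 16
```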

Return Value¶

Array of length 17 containing the test results.

`i`	`stat[i]`
0	Test statistic (ties are randomized).
1	Conservative test statistic with ties counted in favor of the null hypothesis.
2	p-value associated with `stat[0]`.
3	p-value associated with `stat[1]`.
4	Continuity corrected `stat[2]`.
5	Continuity corrected `stat[3]`.
6	Variance of the test statistic under the null hypothesis.
7	Expected kurtosis of the statistic. (The expected skewness is zero.)
8	Total sample size.
9	Coefficient of rank correlation based upon `stat[0]`.
10	Coefficient of rank correlation based upon `stat[1]`.
11	Total number of ties between samples.
12	The t-statistic associated with `stat[2]`.
13	The t-statistic associated with `stat[3]`.
14	The t-statistic associated with `stat[4]`.
15	The t-statistic associated with `stat[5]`.
16	Degrees of freedom for each t-statistic.

Description¶

Function kTrendsTest performs a k-sample trends test against ordered alternatives. The alternative to the null hypothesis of equality is that $F_1$ (X) < $F_2$ (X) < … < $F_k$ (X), where $F_1$ , $F_2$ , etc., are cumulative distribution functions, and the operator < implies that the less-than relationship holds for all values of X. Although the trends test used in kTrendsTest requires that the underlying populations be continuous, ties occurring within a sample have no effect on the test statistic or its associated probabilities. Ties between samples do matter, however, and two methods for handling them are used:

Ties are randomly split (stat[0]).
Ties are counted in a manner that is unfavorable to the alternative hypothesis (stat[1]).

Computational Procedure¶

Consider the matrices

$\begin{split}M^{km} = \left(m_{ij}^{km}\right) = \begin{cases} 2 & \text{if } X_{ki} < X_{mj} \\ 0 & \text{otherwise} \end{cases}\end{split}$

where $X_{ki}$ is the i-th observation in the k-th population, $X_{mj}$ is the j-th observation in the m-th population, and each matrix $M^{km}$ is $n_k$ by $n_m$, where $n_i$ = ni[i]. Let $S_{km}$ denote the sum of all elements in $M^{km}$ . Then stat[1] is computed as the sum of $S_{km}$ over all pairs k < m, minus the expected value of this sum (computed as

$\sum\nolimits_{k<m} n_k n_m$

when there are no ties and the distributions in all populations are equal). In stat[0], ties are broken randomly, and the element in the summation is taken as 2.0 or 0.0 depending upon the result of breaking the tie.
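
The computation above can be sketched in plain Python. This illustrates the definition only and is not the library's implementation; the function name `trend_statistic` and its `rng` parameter are hypothetical. With `rng=None`, a tie scores 0, as in the conservative statistic stat[1]. With a random generator, each tie scores 2.0 or 0.0, as described for stat[0]; breaking the tie with equal probabilities is an assumption.

```python
import random

def trend_statistic(ni, y, rng=None):
    """Sum of m_ij^{km} over all k < m, minus sum_{k<m} n_k * n_m."""
    # Split the flat response list y into groups using the sizes in ni.
    groups, start = [], 0
    for n in ni:
        groups.append(y[start:start + n])
        start += n

    total = 0.0
    expected = 0
    for k in range(len(groups)):
        for m in range(k + 1, len(groups)):
            expected += len(groups[k]) * len(groups[m])
            for x_ki in groups[k]:
                for x_mj in groups[m]:
                    if x_ki < x_mj:
                        total += 2.0
                    elif x_ki == x_mj and rng is not None:
                        # Assumption: a tie is broken by a fair coin flip.
                        total += rng.choice((0.0, 2.0))
    return total - expected

ni = [4, 4, 4, 4]
y = [19., 20., 60., 130., 21., 61., 80., 129.,
     40., 99., 100., 149., 49., 110., 151., 160.]
print(trend_statistic(ni, y))                    # 46.0
print(trend_statistic(ni, y, random.Random(0)))  # 46.0 (no ties in these data)
```

Because these data contain no ties between samples, both variants give the same value.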

stat[2] and stat[3] are computed using the t distribution. The probabilities reported are asymptotic approximations based upon the t statistics in stat[12] and stat[13], which are computed as in Jonckheere (1954, page 141).

Similarly, stat[4] and stat[5], the continuity corrected versions of stat[2] and stat[3], are the probabilities associated with the t statistics in stat[14] and stat[15]. The degrees of freedom for each t statistic (stat[16]) are computed so as to make the selected t distribution as close as possible to the actual distribution of the statistic (see Jonckheere 1954, page 141).
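
As a rough check of how the probabilities relate to the t statistics, the sketch below evaluates the upper tail of Student's t distribution by numerical integration, using only the standard library. It assumes that each reported probability is the one-sided upper-tail probability of the corresponding t statistic with stat[16] degrees of freedom. That reading is consistent with the values in the Example output but is not stated explicitly here, and the function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with (possibly fractional) df."""
    log_norm = (math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1.0) / 2.0 * math.log1p(x * x / df))

def t_upper_tail(t, df, n=2000):
    """P(T > t) for t >= 0, via composite Simpson's rule on [0, t]."""
    h = t / n  # n must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        area += (4.0 if i % 2 else 2.0) * t_pdf(i * h, df)
    return 0.5 - area * h / 3.0

df = 36.04963                               # stat[16] in the Example output
print(round(t_upper_tail(2.26435, df), 4))  # compare with stat[2]
print(round(t_upper_tail(2.20839, df), 4))  # compare with stat[4]
```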

stat[6], the variance of the test statistic stat[0], and stat[7], the kurtosis of the test statistic, are computed as in Jonckheere (1954, page 138). The coefficients of rank correlation in stat[9] and stat[10] reduce to the Kendall $\tau$ statistic when there are just two groups.
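
For reference, a common closed form for the null variance of this statistic when there are no ties is $[N^2(2N+3) - \sum_i n_i^2(2n_i+3)]/18$ with $N = \sum_i n_i$. The sketch below evaluates it, together with the statistic divided by $\sum_{k<m} n_k n_m$ as a rank correlation. Both expressions reproduce the stat[6], stat[9] and stat[10] values in the Example output, but treating them as the formulas used by the library is an assumption, and the function names are hypothetical.

```python
def null_variance(ni):
    """[N^2 (2N + 3) - sum n_i^2 (2 n_i + 3)] / 18, no ties assumed."""
    n_total = sum(ni)
    within = sum(n * n * (2 * n + 3) for n in ni)
    return (n_total ** 2 * (2 * n_total + 3) - within) / 18.0

def rank_correlation(statistic, ni):
    """Statistic divided by its largest possible value, sum_{k<m} n_k n_m."""
    max_value = sum(ni[k] * ni[m]
                    for k in range(len(ni)) for m in range(k + 1, len(ni)))
    return statistic / max_value

ni = [4, 4, 4, 4]
print(round(null_variance(ni), 5))           # 458.66667
print(round(rank_correlation(46.0, ni), 5))  # 0.47917
```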

Exact probabilities in small samples can be obtained from tables in Jonckheere (1954). Note, however, that the t approximation appears to be a good one.

Assumptions¶

The $X_{mi}$ for each sample are independently and identically distributed according to a single continuous distribution.
The samples are independent.

Hypothesis tests¶

$H_0$ : $F_1$ (X) ≥ $F_2$ (X) ≥ … ≥ $F_k$ (X)

$H_1$ : $F_1$ (X) < $F_2$ (X) < … < $F_k$ (X)

Reject $H_0$ if the p-value stat[2] (or stat[3], stat[4], or stat[5], depending upon the method used) is too small, that is, below the chosen significance level.

Example¶

The following example is taken from Jonckheere (1954, page 135). It involves four observations in each of four independent samples.

from numpy import array
from pyimsl.stat.kTrendsTest import kTrendsTest
from pyimsl.stat.writeMatrix import writeMatrix

# Four groups with four observations each
ni = [4, 4, 4, 4]
fmt = "%9.5f"
# Row labels, one per element of the returned array
rlabel = ["stat[0] - Test Statistic  (random) ............",
          "stat[1] - Test Statistic  (null hypothesis) ...",
          "stat[2] - p-value for stat[0] .................",
          "stat[3] - p-value for stat[1] .................",
          "stat[4] - Continuity corrected for stat[2] ....",
          "stat[5] - Continuity corrected for stat[3] ....",
          "stat[6] - Variance of statistic ...............",
          "stat[7] - Expected kurtosis ...................",
          "stat[8] - Total sample size ...................",
          "stat[9] - Rank corr. coef. based on stat[0] ...",
          "stat[10]- Rank corr. coef. based on stat[1] ...",
          "stat[11]- Total number of ties ................",
          "stat[12]- t-statistic associated w/stat[2] ....",
          "stat[13]- t-statistic associated w/stat[3] ....",
          "stat[14]- t-statistic associated w/stat[4] ....",
          "stat[15]- t-statistic associated w/stat[5] ....",
          "stat[16]- Degrees of freedom .................."]
# Responses sorted by group: first group first, then the second, and so on
y = array([19., 20., 60., 130., 21., 61., 80., 129.,
           40., 99., 100., 149., 49., 110., 151., 160.])

stat = kTrendsTest(ni, y)
writeMatrix("stat", stat, writeFormat=fmt, rowLabels=rlabel, column=True)

Output¶

 
                           stat
stat[0] - Test Statistic  (random) ............   46.00000
stat[1] - Test Statistic  (null hypothesis) ...   46.00000
stat[2] - p-value for stat[0] .................    0.01483
stat[3] - p-value for stat[1] .................    0.01483
stat[4] - Continuity corrected for stat[2] ....    0.01683
stat[5] - Continuity corrected for stat[3] ....    0.01683
stat[6] - Variance of statistic ...............  458.66667
stat[7] - Expected kurtosis ...................   -0.15365
stat[8] - Total sample size ...................   16.00000
stat[9] - Rank corr. coef. based on stat[0] ...    0.47917
stat[10]- Rank corr. coef. based on stat[1] ...    0.47917
stat[11]- Total number of ties ................    0.00000
stat[12]- t-statistic associated w/stat[2] ....    2.26435
stat[13]- t-statistic associated w/stat[3] ....    2.26435
stat[14]- t-statistic associated w/stat[4] ....    2.20839
stat[15]- t-statistic associated w/stat[5] ....    2.20839
stat[16]- Degrees of freedom ..................   36.04963