Computes and tests Kendall’s rank correlation coefficient.
Required Arguments
X — Vector of length NOBS containing the observations for the first variable. (Input)
Y — Vector of length NOBS containing the observations for the second variable. (Input)
FUZZ — Value used to determine ties in X or Y. (Input) Two observations are said to be tied if the absolute value of their difference is less than or equal to FUZZ.
STAT — Vector of length 9 containing some output statistics. (Output) See the “Description” section for full definitions. The output statistics are:
i    STAT(i)
1    Kendall τa (assumes no ties)
2    Kendall τb (corrects for ties)
3    Ties statistic for variable X
4    Ties statistic for variable Y
5    Statistic S corresponding to Kendall’s τ
6    Exact probability of achieving a score at least as large as S. This probability is not calculated if NOBS is too large (34 on many computers) or there are ties. In either case, STAT(6) is set to NaN (not a number).
7    The same probability as STAT(6) but using a normal approximation. (Set to NaN if NOBS is less than 8.)
8    The same probability as STAT(6) but using a continuity correction with a normal approximation. (Set to NaN if NOBS is less than 8.)
9    Index in FRQ corresponding to the frequency of the observed S statistic. STAT(9) is not computed when there are ties.
Optional Arguments
NOBS — Number of observations. (Input) NOBS must be 3 or more. Default: NOBS = size (X,1).
FRQ — Vector of length NOBS*(NOBS - 1)/2 + 1 containing the frequencies of occurrence of the possible values of the statistic S, STAT(5), under the null hypothesis of no relationship. (Output) FRQ is not calculated if there are ties or if NOBS is too large (34 on many computers).
FORTRAN 90 Interface
Generic: CALL KENDL (X, Y, FUZZ, STAT [, …])
Specific: The specific interface names are S_KENDL and D_KENDL.
FORTRAN 77 Interface
Single: CALL KENDL (NOBS, X, Y, FUZZ, STAT, FRQ)
Double: The double precision name is DKENDL.
Description
Routine KENDL performs Kendall’s test of the hypothesis of no correlation (independence) by calculating τa and τb (τb handles ties), the Kendall sum S, and associated probabilities. The frequencies of occurrence of S are also computed if the sample size (NOBS) is not too large.
Kendall’s (1962) method is used in computing the statistics. Each pair (xi, yi) is compared with every other pair (xj, yj). The Kendall S statistic is incremented if the two pairs are concordant ((xi > xj and yi > yj) or (xi < xj and yi < yj)) and decremented if the pairs are discordant ((xi > xj and yi < yj) or (xi < xj and yi > yj)). Ties (xi = xj or yi = yj) are not counted. Generally, when ties exist, τb is a better measure of correlation than τa. The untied form of the denominator is used to calculate τa. That is,
\[ \tau_a = \frac{S}{n(n-1)/2} \]
where n = NOBS. Ties enter into the denominator of τb as follows:
\[ \tau_b = \frac{S}{\sqrt{(D - T_x)(D - T_y)}} \]
where D = n(n - 1)/2 and
\[ T_x = \sum_i \frac{t_i(t_i - 1)}{2} \]
where ti is the number of observations in the x variable tied at the i-th tie value. Ty is calculated in a similar manner.
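The pair-by-pair counting described above can be sketched as follows. This is an illustration only, not the KENDL source: the module name kendall_sketch and the subroutine kendall_tau are hypothetical. The sketch assumes that Tx and Ty can be obtained by counting tied pairs, which equals the sum of ti(ti - 1)/2 whenever the tie groups are clearly separated relative to FUZZ, and that neither variable is entirely tied.

```fortran
module kendall_sketch
  implicit none
contains
  ! Hypothetical helper (not an IMSL routine): accumulates the Kendall
  ! score S and the tied-pair counts, then forms tau_a and tau_b.
  subroutine kendall_tau(x, y, fuzz, s, tau_a, tau_b)
    real, intent(in)     :: x(:), y(:), fuzz
    integer, intent(out) :: s
    real, intent(out)    :: tau_a, tau_b
    integer :: i, j, n, tx, ty
    real    :: d, dx, dy

    n = size(x)
    s = 0
    tx = 0
    ty = 0
    do i = 1, n - 1
      do j = i + 1, n
        dx = x(i) - x(j)
        dy = y(i) - y(j)
        ! Two observations are tied if they differ by at most fuzz.
        if (abs(dx) <= fuzz) tx = tx + 1
        if (abs(dy) <= fuzz) ty = ty + 1
        ! Tied pairs contribute nothing to S.
        if (abs(dx) <= fuzz .or. abs(dy) <= fuzz) cycle
        if ((dx > 0.0) .eqv. (dy > 0.0)) then
          s = s + 1          ! concordant pair
        else
          s = s - 1          ! discordant pair
        end if
      end do
    end do

    d = real(n*(n - 1))/2.0
    tau_a = real(s)/d                                  ! untied denominator
    tau_b = real(s)/sqrt((d - real(tx))*(d - real(ty)))  ! tie-corrected
  end subroutine kendall_tau
end module kendall_sketch
```

For x = (1, 2, 2, 3) and y = (1, 2, 3, 3) with fuzz = 0, the sketch gives S = 4 with one tied pair in each variable, so τa = 4/6 and τb = 4/5.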
For NOBS less than 34 (on many machines; the limit differs on machines with a different value for the largest real number that can be represented), the array FRQ is computed. FRQ contains the frequency distribution of S under the null hypothesis of independence. The probability distribution of S can be obtained directly from these frequencies by dividing each frequency by the sum of the frequencies. See routine KENDP for further discussion on the use of the FRQ array.
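One way to build such a frequency table, when there are no ties, is to count the permutations of n items by their number of discordant pairs k, since S = n(n - 1)/2 - 2k. The sketch below is an assumption about how the table could be produced, not the KENDL algorithm. The names kendall_frq_sketch, s_frequencies and prob_s_ge are hypothetical, and the mapping of array index to S value is chosen for illustration (the distribution is symmetric, so the opposite ordering gives the same table).

```fortran
module kendall_frq_sketch
  implicit none
contains
  ! Hypothetical helper: frq(k+1) receives the number of permutations of
  ! n items with exactly k discordant pairs, k = 0, ..., n*(n-1)/2.
  ! frq must have at least n*(n-1)/2 + 1 elements.
  subroutine s_frequencies(n, frq)
    integer, intent(in)           :: n
    double precision, intent(out) :: frq(:)
    double precision :: prev(size(frq))
    integer :: m, k, j, dmax

    frq = 0.0d0
    frq(1) = 1.0d0                 ! one item: a single arrangement, k = 0
    do m = 2, n
      prev = frq
      dmax = m*(m - 1)/2
      do k = 0, dmax
        ! Placing the m-th item adds j = 0, ..., m-1 discordant pairs.
        frq(k + 1) = 0.0d0
        do j = 0, min(k, m - 1)
          frq(k + 1) = frq(k + 1) + prev(k - j + 1)
        end do
      end do
    end do
  end subroutine s_frequencies

  ! Probability of a score at least as large as s under the null
  ! hypothesis: frequencies with S >= s divided by the total.
  function prob_s_ge(n, s) result(p)
    integer, intent(in) :: n, s
    double precision    :: p
    double precision    :: frq(n*(n - 1)/2 + 1)
    integer :: k, d

    d = n*(n - 1)/2
    call s_frequencies(n, frq)
    p = 0.0d0
    do k = 0, d
      if (d - 2*k >= s) p = p + frq(k + 1)
    end do
    p = p/sum(frq)
  end function prob_s_ge
end module kendall_frq_sketch
```

For n = 3 the table is (1, 2, 2, 1), and the frequencies always sum to n!. Because that total grows factorially, it eventually exceeds the largest representable real number, which is consistent with the machine-dependent limit on NOBS noted above.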
For a two-sided test, if the appropriate probability p of achieving or exceeding S is small (less than α/2, where α is the significance level of the test) or if 1 - p is small (less than α/2), then the two-sided hypothesis of no correlation can be rejected. Alternatively, for small p or 1 - p, the appropriate one-sided hypothesis can be rejected.
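The two-sided rule above reduces to a pair of comparisons. A minimal sketch, in which the module name kendall_decision_sketch and the function reject_two_sided are hypothetical and p stands for whichever of STAT(6), STAT(7), or STAT(8) is appropriate:

```fortran
module kendall_decision_sketch
  implicit none
contains
  ! Hypothetical helper: .true. when the two-sided hypothesis of no
  ! correlation is rejected at significance level alpha, given the
  ! probability p of achieving or exceeding S.
  pure logical function reject_two_sided(p, alpha)
    real, intent(in) :: p, alpha
    reject_two_sided = (p < alpha/2.0) .or. (1.0 - p < alpha/2.0)
  end function reject_two_sided
end module kendall_decision_sketch
```

With alpha = 0.05, p = 0.40 does not lead to rejection, while p = 0.01 and p = 0.99 both do.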
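For n > 7, asymptotic normal probabilities are determined using the fact that
\[ z = \frac{S}{\sqrt{\operatorname{Var}(S)}} \]
is approximately standard normal for large n. Here,
\[ \operatorname{Var}(S) = \frac{n(n-1)(2n+5) - \sum_x t_i(t_i-1)(2t_i+5) - \sum_y t_i(t_i-1)(2t_i+5)}{18} + \frac{\left[\sum_x t_i(t_i-1)(t_i-2)\right]\left[\sum_y t_i(t_i-1)(t_i-2)\right]}{9n(n-1)(n-2)} + \frac{\left[\sum_x t_i(t_i-1)\right]\left[\sum_y t_i(t_i-1)\right]}{2n(n-1)} \]
where ti is the number of observations in the i-th tie group for the x (or y) summation variable.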
STAT(7) contains the probability associated with the z statistic while STAT(8) contains the same probability but with the value of S reduced by 1. This reduction is for “continuity correction.” For n less than 25, these probabilities are conservative at the 1% level of significance.
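For the untied case the variance of S is n(n - 1)(2n + 5)/18, and the two normal probabilities can be sketched as below. This is an illustration under that no-ties assumption, not the KENDL source. The names kendall_normal_sketch, normal_probs and upper_tail are hypothetical, and the upper-tail probability is taken from the Fortran 2008 intrinsic ERFC.

```fortran
module kendall_normal_sketch
  implicit none
contains
  ! Hypothetical helper: upper-tail normal probabilities for the Kendall
  ! score s with n untied observations. p uses s directly; p_cc uses
  ! s - 1 (continuity correction).
  subroutine normal_probs(n, s, p, p_cc)
    integer, intent(in)           :: n, s
    double precision, intent(out) :: p, p_cc
    double precision :: sd

    ! Standard deviation of S when there are no ties.
    sd = sqrt(dble(n)*dble(n - 1)*dble(2*n + 5)/18.0d0)
    p    = upper_tail(dble(s)/sd)
    p_cc = upper_tail(dble(s - 1)/sd)
  end subroutine normal_probs

  ! P(Z >= z) for a standard normal variate Z.
  pure function upper_tail(z) result(q)
    double precision, intent(in) :: z
    double precision :: q
    q = 0.5d0*erfc(z/sqrt(2.0d0))
  end function upper_tail
end module kendall_normal_sketch
```

Because the corrected value uses S - 1, p_cc is always slightly larger than p; for S = 0 the uncorrected probability is exactly 0.5.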
Comments
1. Workspace may be explicitly provided, if desired, by use of K2NDL/DK2NDL. The reference is:
CALL K2NDL (NOBS, X, Y, FUZZ, STAT, FRQ, IWK, WK, XRNK, YRNK)
The additional arguments are as follows:
IWK — Work vector of length NOBS.
WK — Work vector of length (NOBS - 1)*(NOBS - 2)/2 + 1.
XRNK — Work vector of length NOBS.
YRNK — Work vector of length NOBS.
2. Informational errors
Type  Code  Description
3     4     Ties are detected in the two samples. STAT(6) is set to NaN (not a number) and FRQ is not calculated.
3     5     NOBS is less than 8 so the asymptotic normal probabilities are not determined. STAT(7) and STAT(8) are set to NaN (not a number).
3     6     NOBS is too large (34 on many computers). STAT(6) is set to NaN (not a number) and FRQ is not calculated.
4     2     All the elements of X are tied. The output statistics are not defined.
4     3     All the elements of Y are tied. The output statistics are not defined.
Example
In this example, the Kendall test is performed on a sample of size 8. The test fails to reject the null hypothesis of no correlation.
! SPECIFICATIONS FOR PARAMETERS
USE KENDL_INT
USE WRRRL_INT
USE WRRRN_INT
IMPLICIT NONE
REAL FUZZ
PARAMETER (FUZZ=0.0001)
!
REAL FRQ(29), STAT(9), X(8), Y(8)
CHARACTER CLABEL(2)*10, RLABEL(9)*10
!
DATA RLABEL/'tau(a)', 'tau(b)', 'ties(X)', 'ties(Y)', &