TETCC
Categorizes bivariate data and computes the tetrachoric correlation coefficient.
Required Arguments
NROW — The absolute value of NROW is the number of observations currently in X and Y. (Input)
NROW may be positive, zero, or negative. Negative NROW means delete the ‑NROW observations in X and Y from the analysis. In the usual case, in which all of the data have already been categorized into counts in ICOUNT, NROW should be set to 0 and IDO set to 3.
X — Vector of length ∣NROW∣ containing the observations on one variable. (Input)
Y — Vector of length ∣NROW∣ containing the observations on the second variable. (Input)
HX — Constant used to categorize values of X. (Input)
See description of ICOUNT.
HY — Constant used to categorize values of Y. (Input)
See description of ICOUNT.
ICOUNT — 2 by 2 matrix containing counts. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3.)
The elements of ICOUNT are the numbers of observations satisfying the following relations:
ICOUNT(1, 1) : X(i) < HX and Y(i) < HY
ICOUNT(1, 2) : X(i) < HX and Y(i) ≥ HY
ICOUNT(2, 1) : X(i) ≥ HX and Y(i) < HY
ICOUNT(2, 2) : X(i) ≥ HX and Y(i) ≥ HY
NR — Number of real roots in the interval ( ‑1.0, 1.0) of the seventh-degree polynomial used to estimate the correlation coefficient. (Output)
R — Vector of length 7 containing in the first NR positions estimates of the correlation coefficient. (Output)
RS — Estimate of the standard error of the estimates of the correlation coefficient(s). (Output)
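The categorization rules listed under ICOUNT can be illustrated with a small stand-alone program. This is only a sketch of the counting rule as documented above, not the code TETCC itself uses; the six data pairs and the names NOBS, IROW, and JCOL are made up for the illustration.

```fortran
!     Sketch only: tallies a 2 by 2 table by the ICOUNT rules.
!     The data values and the names NOBS, IROW, JCOL are
!     illustrative and are not part of TETCC.
      IMPLICIT NONE
      INTEGER, PARAMETER :: NOBS = 6
      INTEGER    I, ICOUNT(2,2), IROW, JCOL
      REAL       HX, HY, X(NOBS), Y(NOBS)
!
      X = (/ 0.2, 1.5, -0.7, 0.0, 2.1, -1.3 /)
      Y = (/ -0.4, 0.9, 0.3, -1.1, 0.0, -0.2 /)
      HX = 0.0
      HY = 0.0
      ICOUNT = 0
      DO I = 1, NOBS
!                                 Row 1 if X(I) < HX, else row 2
         IROW = MERGE(1, 2, X(I) .LT. HX)
!                                 Column 1 if Y(I) < HY, else column 2
         JCOL = MERGE(1, 2, Y(I) .LT. HY)
         ICOUNT(IROW,JCOL) = ICOUNT(IROW,JCOL) + 1
      END DO
      WRITE (*,99999) ICOUNT(1,1), ICOUNT(1,2), ICOUNT(2,1), ICOUNT(2,2)
99999 FORMAT (' ICOUNT(1,1), (1,2), (2,1), (2,2) = ', 4I4)
      END
```

For these six pairs the tallies are 1, 1, 2, and 2. Observations equal to a cutpoint (the 0.0 values here) fall in the "greater than or equal" category.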
Optional Arguments
IDO — Processing option. (Input)
Default: IDO = 0.
IDO | Action
0 | This is the only invocation of TETCC, and all the data are input at once in X and Y.
1 | This is the first invocation of TETCC with this data, and additional calls will be made. Initialization and updating for the data in X and Y are performed.
2 | This is an intermediate invocation of TETCC, and updating for the observations in X and Y is performed.
3 | Updating for the observations in X and Y is performed, and the tetrachoric correlation coefficient is computed using the values in ICOUNT.
LDICOU — Leading dimension of ICOUNT exactly as specified in the dimension statement in the calling program. (Input)
Default: LDICOU = size (ICOUNT,1).
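The IDO values above allow the data to be passed in pieces. The following sketch reuses the data generation of Example 2 but passes the 1000 observations in four blocks of 250, using IDO = 1 for the first block, IDO = 2 for the intermediate blocks, and IDO = 3 for the last. The block size and the names NOBS, NBLK, ISTART, and IEND are choices made for this illustration; the sketch assumes that array sections can be passed for X and Y through the generic interface.

```fortran
!     Sketch only: blockwise use of TETCC with IDO = 1, 2, 3.
      USE IMSL_LIBRARIES
      IMPLICIT NONE
      INTEGER, PARAMETER :: NOBS = 1000, NBLK = 250
      INTEGER    I, ICOUNT(2,2), IDO, IEND, IRANK, ISTART, NOUT, NR
      REAL       COV(2,2), HX, HY, R(7), RS, RSIG(2,2), XY(NOBS,2)
!
      CALL UMACH (2, NOUT)
!                                 Same data generation as Example 2
      COV(1,1) = 1.0
      COV(1,2) = 0.5
      COV(2,1) = 0.5
      COV(2,2) = 1.0
      CALL CHFAC (COV, IRANK, RSIG)
      CALL RNSET (123457)
      CALL RNMVN (RSIG, XY)
!
      HX = 0.0
      HY = 0.0
      DO ISTART = 1, NOBS, NBLK
         IEND = ISTART + NBLK - 1
         IF (ISTART .EQ. 1) THEN
!                                 First block: initialize ICOUNT
            IDO = 1
         ELSE IF (IEND .GE. NOBS) THEN
!                                 Last block: update and compute
            IDO = 3
         ELSE
!                                 Intermediate block: update only
            IDO = 2
         END IF
         CALL TETCC (NBLK, XY(ISTART:IEND,1), XY(ISTART:IEND,2), &
                     HX, HY, ICOUNT, NR, R, RS, IDO=IDO)
      END DO
!                                 NR, R and RS are set by the
!                                 IDO = 3 call only
      WRITE (NOUT,99998) NR, (R(I),I=1,NR)
99998 FORMAT (' Number of roots (estimates) is ', I1, /, ' ', &
             'Estimate(s) = ',7F10.5)
      WRITE (NOUT,99999) RS
99999 FORMAT (' The estimated standard error is ', F10.5)
      END
```

Because ICOUNT carries the running counts between calls, it must not be modified by the calling program until the IDO = 3 call has returned.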
FORTRAN 90 Interface
Generic: CALL TETCC (NROW, X, Y, HX, HY, ICOUNT, NR, R, RS [, …])
Specific: The specific interface names are S_TETCC and D_TETCC.
FORTRAN 77 Interface
Single: CALL TETCC (IDO, NROW, X, Y, HX, HY, ICOUNT, LDICOU, NR, R, RS)
Double: The double precision name is DTETCC.
Description
Routine TETCC computes the tetrachoric correlation coefficient for a bivariate sample, using either the sample itself or a 2 by 2 table of counts of the data. The tetrachoric correlation coefficient is taken as a root, in the interval (‑1.0, 1.0), of the seventh-degree polynomial obtained from the first seven terms of the expansion given by Kendall and Stuart (1979, page 326).
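For orientation, the expansion referred to here is commonly written as the tetrachoric series. The block below is a sketch in standard textbook notation, not a transcription of the TETCC source. The symbols n_ij (standing for ICOUNT(i, j)), N, h, k, Φ, φ, and He_m are introduced only for this illustration, and the exact normalization used inside TETCC is an assumption.

```latex
% Truncated tetrachoric series: a seventh-degree polynomial equation in rho.
% n_{ij} = ICOUNT(i,j), N = n_{11} + n_{12} + n_{21} + n_{22}.
\[
  \frac{n_{22}}{N} - \bigl[1-\Phi(h)\bigr]\bigl[1-\Phi(k)\bigr]
  = \phi(h)\,\phi(k)\sum_{j=1}^{7}\frac{\rho^{j}}{j!}\,
    He_{j-1}(h)\,He_{j-1}(k)
\]
% Normal-scale cutpoints implied by the marginal proportions:
\[
  h = \Phi^{-1}\!\left(\frac{n_{11}+n_{12}}{N}\right), \qquad
  k = \Phi^{-1}\!\left(\frac{n_{11}+n_{21}}{N}\right)
\]
% Probabilists' Hermite polynomials:
\[
  He_{0}(x) = 1, \quad He_{1}(x) = x, \quad
  He_{m+1}(x) = x\,He_{m}(x) - m\,He_{m-1}(x)
\]
```

Here Φ and φ are the standard normal distribution function and density. Under this reading, NR counts the roots ρ of the equation that lie in (‑1.0, 1.0).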
The standard error estimate results from an approximate, asymptotic expression derived under the assumption of bivariate normality with zero correlation. The zero correlation assumption is not overly restrictive since most uses of this standard error would be in tests of zero correlation.
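A commonly quoted form of the zero-correlation asymptotic standard error is sketched below. It is offered as an assumption about the kind of expression meant, not as the documented TETCC formula; the symbols N, p_x, p_y, h, k, Φ, and φ are introduced only for this illustration.

```latex
% Approximate standard error of the tetrachoric estimate when rho = 0.
% N   = total count in ICOUNT
% p_x = proportion of observations with X < HX
% p_y = proportion of observations with Y < HY
\[
  \widehat{\mathrm{se}}(\hat{\rho}) \approx
  \frac{\sqrt{p_x\,(1-p_x)\,p_y\,(1-p_y)}}
       {\sqrt{N}\;\phi(h)\,\phi(k)},
  \qquad h = \Phi^{-1}(p_x), \quad k = \Phi^{-1}(p_y)
\]
```

Evaluated by hand with the counts of Example 1 (N = 930, p_x ≈ 0.602, p_y ≈ 0.582), this expression gives approximately 0.0525, in line with the value printed there.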
If all of the data are available, the Pearson product-moment correlation coefficient (which can be computed using routine CORVC) is a much better estimate for the population correlation coefficient than is the tetrachoric correlation coefficient. If the counts in ICOUNT are all that is available, call TETCC with IDO = 3 and NROW = 0.
Comments
1. Informational errors
Type | Code | Description
3 | 1 | Fewer than 200 observations are used.
3 | 2 | The polynomial used to estimate the correlation coefficient has more than one root in the interval (‑1.0, 1.0). It is probable that the numerical precision is not good enough to obtain an estimate.
4 | 4 | The proportion of counts in a row or column is so close to one that the inverse normal cdf cannot be computed.
4 | 6 | The polynomial used to estimate the correlation coefficient has no roots in the interval (‑1.0, 1.0). It is probable that the numerical precision is not good enough to obtain an estimate.
2. If data for X and Y are available, it is better to use the Pearson product-moment correlation coefficient (as computed by routine CORVC, for example) than to use the tetrachoric correlation coefficient.
3. The tetrachoric correlation coefficient should be considered somewhat questionable if the sample size is less than 200, if the cutpoints HX and HY are not close to the medians, or if there are multiple roots of the estimating equation in the interval (‑1.0, 1.0). Also, the tetrachoric correlation coefficient is a better estimate of the true correlation coefficient if the true coefficient is large in absolute value.
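Two of the conditions in Comment 3 can be inspected from the table of counts alone. The sketch below uses the counts of Example 1 and prints the sample size together with the proportion of observations below each cutpoint; a proportion of 0.5 corresponds to a cutpoint at the sample median. The names NTOT, PX, and PY are introduced for this illustration, and how far from 0.5 is acceptable is left to the user.

```fortran
!     Sketch only: summarizes a table of counts before TETCC is
!     called. NTOT, PX and PY are illustrative names.
      IMPLICIT NONE
      INTEGER    ICOUNT(2,2), NTOT
      REAL       PX, PY
!
      ICOUNT(1,1) = 374
      ICOUNT(1,2) = 186
      ICOUNT(2,1) = 167
      ICOUNT(2,2) = 203
      NTOT = SUM(ICOUNT)
!                                 Proportion with X < HX (row 1)
      PX = REAL(ICOUNT(1,1) + ICOUNT(1,2))/REAL(NTOT)
!                                 Proportion with Y < HY (column 1)
      PY = REAL(ICOUNT(1,1) + ICOUNT(2,1))/REAL(NTOT)
      WRITE (*,99998) NTOT, PX, PY
99998 FORMAT (' N = ', I6, '  P(X < HX) = ', F7.4, &
             '  P(Y < HY) = ', F7.4)
      IF (NTOT .LT. 200) WRITE (*,*) 'Fewer than 200 observations.'
      END
```

For the Example 1 counts, N is 930 and the two proportions are about 0.60 and 0.58. The third condition, multiple roots, can be checked only after the call to TETCC by testing whether NR exceeds 1.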
Examples
Example 1
In the first example, the data are counts. The 374 in ICOUNT(1, 1) indicates that in the raw data there were 374 pairs having both values less than some cutoff point. The 186 in ICOUNT(1, 2) indicates that there were 186 pairs in the raw data for which the first value was less than its cutoff value and the second value was greater than or equal to its cutoff value.
USE UMACH_INT
USE TETCC_INT
IMPLICIT NONE
INTEGER I, ICOUNT(2,2), IDO, NOUT, NR, NROW
REAL HX, HY, R(7), RS, X(1), Y(1)
!
CALL UMACH (2, NOUT)
ICOUNT(1,1) = 374
ICOUNT(1,2) = 186
ICOUNT(2,1) = 167
ICOUNT(2,2) = 203
IDO = 3
NROW = 0
CALL TETCC (NROW, X, Y, HX, HY, ICOUNT, NR, R, RS, IDO=IDO)
WRITE (NOUT,99998) NR, (R(I),I=1,NR)
99998 FORMAT (' Number of roots (estimates) is ', I1, /, ' ', &
'Estimate(s) = ',7F10.5)
WRITE (NOUT,99999) RS
99999 FORMAT (' The estimated standard error is ', F10.5)
END
Output
Number of roots (estimates) is 1
Estimate(s) = 0.33511
The estimated standard error is 0.05255
Example 2
In this example, artificial bivariate normal data are generated using IMSL routine RNMVN, and then the tetrachoric correlation coefficient is computed. Since the mean (and median) of each variable is 0.0, the cutpoints HX and HY are set to 0.0.
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER I, ICOUNT(2,2), IRANK, NOUT, NR, NROW
REAL COV(2,2), HX, HY, R(7), RS, RSIG(2,2), X(1000), &
XY(1000,2), Y(1000)
!
EQUIVALENCE (X, XY), (Y, XY(1,2))
!
CALL UMACH (2, NOUT)
! Generate random sample from
! bivariate normal with correlation
! of 0.5.
COV(1,1) = 1.0
COV(1,2) = 0.5
COV(2,1) = 0.5
COV(2,2) = 1.0
! Obtain the Cholesky factorization.
CALL CHFAC (COV, IRANK, RSIG)
! Initialize seed of random number
! generator.
CALL RNSET (123457)
CALL RNMVN (RSIG, XY)
!
NROW = 1000
HX = 0.0
HY = 0.0
CALL TETCC (NROW, X, Y, HX, HY, ICOUNT, NR, R, RS)
WRITE (NOUT,99997) ICOUNT
99997 FORMAT (' ICOUNT = ', 4I4)
WRITE (NOUT,99998) NR, (R(I),I=1,NR)
99998 FORMAT (' Number of roots (estimates) is ', I1, /, ' ', &
'Estimate(s) = ',7F10.5)
WRITE (NOUT,99999) RS
99999 FORMAT (' The estimated standard error is ',F10.5)
END
Output
ICOUNT = 326 163 171 340
Number of roots (estimates) is 1
Estimate(s) = 0.49824
The estimated standard error is 0.04968