TETCC

Categorizes bivariate data and computes the tetrachoric correlation coefficient.

Required Arguments

NROW — The absolute value of NROW is the number of observations currently in X and Y. (Input)
NROW may be positive, zero, or negative. A negative NROW means that the -NROW observations in X and Y are deleted from the analysis. In the usual case, in which all of the data have already been categorized into counts in ICOUNT, set NROW to 0 and IDO to 3.

X — Vector of length |NROW| containing the observations on one variable. (Input)

Y — Vector of length |NROW| containing the observations on the second variable. (Input)

HX — Constant used to categorize values of X. (Input)
See description of ICOUNT.

HY — Constant used to categorize values of Y. (Input)
See description of ICOUNT.

ICOUNT — 2 by 2 matrix containing counts. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3.)
The elements of ICOUNT are the numbers of observations satisfying the following relations:
ICOUNT(1, 1) : X(i) < HX and Y(i) < HY
ICOUNT(1, 2) : X(i) < HX and Y(i) ≥ HY
ICOUNT(2, 1) : X(i) ≥ HX and Y(i) < HY
ICOUNT(2, 2) : X(i) ≥ HX and Y(i) ≥ HY

NR — Number of real roots in the interval (-1.0, 1.0) of the seventh-degree polynomial used to estimate the correlation coefficient. (Output)

R — Vector of length 7 containing in the first NR positions estimates of the correlation coefficient. (Output)

RS — Estimate of the standard error of the estimates of the correlation coefficient(s). (Output)
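The categorization rule given for ICOUNT can be sketched in a few lines of standard Fortran. This is an illustration only and does not call TETCC; the program name, the six made-up data pairs, and the helper variables IROW and JCOL are assumptions of the sketch.

```fortran
      PROGRAM CATSKT
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 6
      INTEGER ICOUNT(2,2), I, IROW, JCOL
      REAL X(N), Y(N), HX, HY
!                                 Made-up data, for illustration only.
      X = (/ -1.2, 0.4, 0.0, 2.1, -0.3, 1.5 /)
      Y = (/ 0.7, -0.5, 0.0, 1.9, -1.1, -0.2 /)
      HX = 0.0
      HY = 0.0
!                                 Row index follows X, column index
!                                 follows Y: 1 if below the cutpoint,
!                                 2 if at or above it.
      ICOUNT = 0
      DO I = 1, N
         IROW = MERGE(1, 2, X(I) .LT. HX)
         JCOL = MERGE(1, 2, Y(I) .LT. HY)
         ICOUNT(IROW,JCOL) = ICOUNT(IROW,JCOL) + 1
      END DO
      PRINT '(A,2I4)', ' X <  HX :', ICOUNT(1,:)
      PRINT '(A,2I4)', ' X >= HX :', ICOUNT(2,:)
      END PROGRAM CATSKT
```

Note that a value exactly equal to its cutpoint falls in the second row or column.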

Optional Arguments

IDO — Processing option. (Input)
Default: IDO = 0.

 

IDO   Action
0     This is the only invocation of TETCC, and all the data are input at once in X and Y.
1     This is the first invocation of TETCC with this data, and additional calls will be made. Initialization and updating for the data in X and Y are performed.
2     This is an intermediate invocation of TETCC, and updating for the observations in X and Y is performed.
3     Updating for the observations in X and Y is performed, and the tetrachoric correlation coefficient is computed using the values in ICOUNT.

LDICOU — Leading dimension of ICOUNT exactly as specified in the dimension statement in the calling program. (Input)
Default: LDICOU = size (ICOUNT,1).
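When the data arrive in pieces, the IDO values combine into a first call, any number of intermediate calls, and a final call. The following is a minimal sketch of that sequence using the generic Fortran 90 interface documented on this page. It needs the IMSL library to link; the program name, the chunk size M, and the synthetic uniform data are assumptions made for illustration.

```fortran
      PROGRAM CHUNKS
      USE TETCC_INT
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 300, M = 100
      INTEGER ICOUNT(2,2), NR, I
      REAL XALL(N), YALL(N), HX, HY, R(7), RS
!                                 Illustrative data: uniform X, and Y
!                                 built to share a component with X.
      CALL RANDOM_NUMBER (XALL)
      CALL RANDOM_NUMBER (YALL)
      YALL = 0.5*XALL + 0.5*YALL
!                                 Both variables have median 0.5.
      HX = 0.5
      HY = 0.5
!                                 First call: initialize and update.
      CALL TETCC (M, XALL(1:M), YALL(1:M), HX, HY, ICOUNT, NR, R, &
                  RS, IDO=1)
!                                 Intermediate call: update only.
      CALL TETCC (M, XALL(M+1:2*M), YALL(M+1:2*M), HX, HY, ICOUNT, &
                  NR, R, RS, IDO=2)
!                                 Final call: update, then estimate.
      CALL TETCC (M, XALL(2*M+1:N), YALL(2*M+1:N), HX, HY, ICOUNT, &
                  NR, R, RS, IDO=3)
      PRINT *, 'ICOUNT =', ICOUNT
      PRINT *, 'Estimate(s) =', (R(I), I=1,NR)
      PRINT *, 'Standard error =', RS
      END PROGRAM CHUNKS
```

The same ICOUNT, HX, and HY must be passed on every call, since ICOUNT carries the running counts between invocations.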

FORTRAN 90 Interface

Generic: CALL TETCC (NROW, X, Y, HX, HY, ICOUNT, NR, R, RS [, …])

Specific: The specific interface names are S_TETCC and D_TETCC.

FORTRAN 77 Interface

Single: CALL TETCC (IDO, NROW, X, Y, HX, HY, ICOUNT, LDICOU, NR, R, RS)

Double: The double precision name is DTETCC.

Description

Routine TETCC computes the tetrachoric correlation coefficient for a bivariate sample, using either the sample itself or a two by two table of counts of the data. The tetrachoric correlation coefficient is taken as the solution to the seventh-degree polynomial obtained from the first seven terms of the expansion given by Kendall and Stuart (1979, page 326).
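The manual does not reproduce the expansion. In its usual textbook form, the tetrachoric series truncated after seven terms gives the following equation in r. The notation is an assumption of this sketch: n_22 is ICOUNT(2, 2), N is the total count, p_X and p_Y are the proportions of observations at or above HX and HY, φ and Φ are the standard normal density and distribution function, and He_j is the probabilists' Hermite polynomial of degree j. Whether TETCC scales the equation in exactly this way is not stated here.

```latex
\frac{n_{22}}{N} - p_X\,p_Y
  = \phi(h)\,\phi(k)\sum_{j=1}^{7}\frac{r^{j}}{j!}\,
    He_{j-1}(h)\,He_{j-1}(k),
\qquad
h = \Phi^{-1}(1 - p_X),\quad k = \Phi^{-1}(1 - p_Y)
```

Each real root of this polynomial in (-1.0, 1.0) is a candidate estimate, which is why the routine reports NR and returns up to seven values in R.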

The standard error estimate results from an approximate, asymptotic expression derived under the assumption of bivariate normality with zero correlation. The zero correlation assumption is not overly restrictive since most uses of this standard error would be in tests of zero correlation.
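To make the two preceding paragraphs concrete, here is a self-contained sketch in standard Fortran that works from a 2 by 2 table without calling TETCC. It rests on assumptions that this page does not confirm: the polynomial is the usual seven-term tetrachoric series, and the standard error is the zero-correlation expression sqrt(pX qX pY qY / N) / (φ(h) φ(k)). The program name, the helper functions SERIES and QNORM, and the single-root bisection are choices of the sketch; TETCC itself looks for all roots in the interval.

```fortran
      PROGRAM TETSKT
      IMPLICIT NONE
      INTEGER, PARAMETER :: DP = KIND(1.0D0)
      INTEGER ICOUNT(2,2), J
      REAL(DP) N, PX, PY, H, K, PHIH, PHIK, LHS, LO, HI, MID, SE
!                                 Counts from Example 1.
      ICOUNT(1,:) = (/ 374, 186 /)
      ICOUNT(2,:) = (/ 167, 203 /)
      N = REAL(SUM(ICOUNT), DP)
!                                 Proportions at or above each cutpoint.
      PX = REAL(SUM(ICOUNT(2,:)), DP)/N
      PY = REAL(SUM(ICOUNT(:,2)), DP)/N
!                                 Standardized cutpoints and densities.
      H = QNORM(1.0_DP - PX)
      K = QNORM(1.0_DP - PY)
      PHIH = EXP(-0.5_DP*H*H)/SQRT(8.0_DP*ATAN(1.0_DP))
      PHIK = EXP(-0.5_DP*K*K)/SQRT(8.0_DP*ATAN(1.0_DP))
      LHS = (REAL(ICOUNT(2,2), DP)/N - PX*PY)/(PHIH*PHIK)
!                                 Bisection for one root of
!                                 SERIES(r) = LHS; assumes a sign
!                                 change between -1 and 1.
      LO = -1.0_DP
      HI = 1.0_DP
      DO J = 1, 60
         MID = 0.5_DP*(LO + HI)
         IF ((SERIES(LO) - LHS)*(SERIES(MID) - LHS) .LE. 0.0_DP) THEN
            HI = MID
         ELSE
            LO = MID
         END IF
      END DO
!                                 Assumed zero-correlation standard error.
      SE = SQRT(PX*(1.0_DP - PX)*PY*(1.0_DP - PY)/N)/(PHIH*PHIK)
      PRINT '(A,F8.5)', ' Estimate       =', MID
      PRINT '(A,F8.5)', ' Standard error =', SE
      CONTAINS
!                                 Seven-term series: sum over j of
!                                 r**j/j! * He(j-1,H) * He(j-1,K).
      FUNCTION SERIES(R) RESULT(S)
      REAL(DP), INTENT(IN) :: R
      REAL(DP) S, COEF, A0, A1, B0, B1, T
      INTEGER J
      S = 0.0_DP
      COEF = 1.0_DP
      A0 = 0.0_DP
      A1 = 1.0_DP
      B0 = 0.0_DP
      B1 = 1.0_DP
      DO J = 1, 7
         COEF = COEF*R/REAL(J, DP)
         S = S + COEF*A1*B1
!                                 He(j) = x*He(j-1) - (j-1)*He(j-2)
         T = H*A1 - REAL(J-1, DP)*A0
         A0 = A1
         A1 = T
         T = K*B1 - REAL(J-1, DP)*B0
         B0 = B1
         B1 = T
      END DO
      END FUNCTION SERIES
!                                 Inverse normal cdf by bisection on
!                                 the ERFC-based cdf.
      FUNCTION QNORM(P) RESULT(X)
      REAL(DP), INTENT(IN) :: P
      REAL(DP) X, A, B
      INTEGER I
      A = -10.0_DP
      B = 10.0_DP
      DO I = 1, 100
         X = 0.5_DP*(A + B)
         IF (0.5_DP*ERFC(-X/SQRT(2.0_DP)) .LT. P) THEN
            A = X
         ELSE
            B = X
         END IF
      END DO
      END FUNCTION QNORM
      END PROGRAM TETSKT
```

A hand check of these formulas against the Example 1 counts agrees with the printed estimate and standard error to about three decimal places, which supports the assumed forms but does not show that TETCC computes them in this way.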

If all of the data are available, the Pearson product-moment correlation coefficient (which can be computed using routine CORVC) is a much better estimate of the population correlation coefficient than the tetrachoric correlation coefficient is. If the counts in ICOUNT are all that is available, call TETCC with IDO = 3 and NROW = 0.

Comments

1. Informational errors

Type  Code  Description
3     1     Fewer than 200 observations are used.
3     2     The polynomial used to estimate the correlation coefficient has more than one root in the interval (-1.0, 1.0). It is probable that the numerical precision is not good enough to obtain an estimate.
4     4     The proportion of counts in a row or column is so close to one that the inverse normal cdf cannot be computed.
4     6     The polynomial used to estimate the correlation coefficient has no roots in the interval (-1.0, 1.0). It is probable that the numerical precision is not good enough to obtain an estimate.

2. If data for X and Y are available, it is better to use the Pearson product-moment correlation coefficient (as computed by routine CORVC, for example) than to use the tetrachoric correlation coefficient.

3. The tetrachoric correlation coefficient should be considered somewhat questionable if the sample size is less than 200, if the cutpoints HX and HY are not close to the medians, or if there are multiple roots of the estimating equation in the interval (-1.0, 1.0). Also, the tetrachoric correlation coefficient is a better estimate of the true correlation coefficient if the true coefficient is large in absolute value.
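Two of the conditions in Comment 3 can be screened directly from the table of counts before calling TETCC. The sketch below is self-contained standard Fortran. The program name and the tolerance TOL are hypothetical: the manual gives no numerical meaning for "close to the medians", so 0.1 is only a placeholder.

```fortran
      PROGRAM SCREEN
      IMPLICIT NONE
      INTEGER ICOUNT(2,2), N
      REAL PX, PY
!                                 Hypothetical tolerance, not taken
!                                 from the manual.
      REAL, PARAMETER :: TOL = 0.1
!                                 Counts from Example 1.
      ICOUNT(1,:) = (/ 374, 186 /)
      ICOUNT(2,:) = (/ 167, 203 /)
      N = SUM(ICOUNT)
!                                 Proportions below each cutpoint; both
!                                 are near 0.5 when HX and HY sit near
!                                 the medians.
      PX = REAL(SUM(ICOUNT(1,:)))/REAL(N)
      PY = REAL(SUM(ICOUNT(:,1)))/REAL(N)
      IF (N .LT. 200) PRINT *, 'Fewer than 200 observations'
      IF (ABS(PX - 0.5) .GT. TOL) PRINT *, 'HX is far from the median of X'
      IF (ABS(PY - 0.5) .GT. TOL) PRINT *, 'HY is far from the median of Y'
      PRINT '(A,I6,2(A,F7.4))', ' N =', N, '  P(X<HX) =', PX, &
            '  P(Y<HY) =', PY
      END PROGRAM SCREEN
```

The third condition, multiple roots, can only be checked after the call, by inspecting NR.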

Examples

Example 1

In the first example, the data are counts. The 374 in ICOUNT(1, 1) indicates that in the raw data there were 374 pairs having both values less than some cutoff point. The 186 in ICOUNT(1, 2) indicates that there were 186 pairs in the raw data for which the first value was less than its cutoff value and the second value was greater than or equal to its cutoff value.

 

      USE UMACH_INT
      USE TETCC_INT

      IMPLICIT NONE
      INTEGER    I, ICOUNT(2,2), IDO, NOUT, NR, NROW
      REAL       HX, HY, R(7), RS, X(1), Y(1)
!
      CALL UMACH (2, NOUT)
!                                 The data are already categorized, so
!                                 only the counts are supplied.
      ICOUNT(1,1) = 374
      ICOUNT(1,2) = 186
      ICOUNT(2,1) = 167
      ICOUNT(2,2) = 203
!                                 NROW = 0 with IDO = 3: no observations
!                                 are added, and the estimate uses
!                                 ICOUNT as given.
      IDO  = 3
      NROW = 0
      CALL TETCC (NROW, X, Y, HX, HY, ICOUNT, NR, R, RS, IDO=IDO)
      WRITE (NOUT,99998) NR, (R(I),I=1,NR)
99998 FORMAT (' Number of roots (estimates) is ', I1, /, ' ', &
             'Estimate(s) = ',7F10.5)
      WRITE (NOUT,99999) RS
99999 FORMAT (' The estimated standard error is ', F10.5)
      END

Output

 

Number of roots (estimates) is 1
Estimate(s) = 0.33511
The estimated standard error is 0.05255

Example 2

In this example, artificial bivariate normal data are generated using IMSL routine RNMVN, and the tetrachoric correlation coefficient is then computed. Since the mean (and median) of each variable is 0.0, the cutpoints HX and HY are set to 0.0.

 

      USE IMSL_LIBRARIES

      IMPLICIT NONE
      INTEGER    I, ICOUNT(2,2), IRANK, NOUT, NR, NROW
      REAL       COV(2,2), HX, HY, R(7), RS, RSIG(2,2), X(1000), &
                 XY(1000,2), Y(1000)
!
      EQUIVALENCE (X, XY), (Y, XY(1,2))
!
      CALL UMACH (2, NOUT)
!                                 Generate random sample from
!                                 bivariate normal with correlation
!                                 of 0.5.
      COV(1,1) = 1.0
      COV(1,2) = 0.5
      COV(2,1) = 0.5
      COV(2,2) = 1.0
!                                 Obtain the Cholesky factorization.
      CALL CHFAC (COV, IRANK, RSIG)
!                                 Initialize seed of random number
!                                 generator.
      CALL RNSET (123457)
      CALL RNMVN (RSIG, XY)
!
      NROW = 1000
      HX   = 0.0
      HY   = 0.0
      CALL TETCC (NROW, X, Y, HX, HY, ICOUNT, NR, R, RS)
      WRITE (NOUT,99997) ICOUNT
99997 FORMAT (' ICOUNT = ', 4I4)
      WRITE (NOUT,99998) NR, (R(I),I=1,NR)
99998 FORMAT (' Number of roots (estimates) is ', I1, /, ' ', &
             'Estimate(s) = ',7F10.5)
      WRITE (NOUT,99999) RS
99999 FORMAT (' The estimated standard error is ',F10.5)
      END

Output

 

ICOUNT = 326 163 171 340
Number of roots (estimates) is 1
Estimate(s) = 0.49824
The estimated standard error is 0.04968