CTASC
Computes partial association statistics for log‑linear models in a multidimensional contingency table.
Required Arguments
NCLVAL — Vector of length NCLVAR containing, in its i‑th element, the number of levels or categories of the i‑th classification variable. (Input)
TABLE — Vector of length NCLVAL(1) * NCLVAL(2) * … * NCLVAL(NCLVAR) containing the entries in the cells of the table to be fit. (Input)
See Comment 3 for comments on the ordering of the elements of TABLE.
ZERO — Vector of length NCLVAL(1) * NCLVAL(2) * … * NCLVAL(NCLVAR) indicating structural zeros in TABLE. (Input)
ZERO has the same structure as TABLE. Structural zeros in the TABLE are specified by setting the corresponding element of ZERO to 0.0. All other elements of zero must be positive. If structural zeros do not exist in TABLE, TABLE and ZERO can share the same storage locations. See Comment 3 for the ordering of the elements of ZERO.
ASSOC — 2NCLVAR ‑ 1 by 4 matrix containing the partial association statistics for each effect in the model. (Output)
Col. |
Statistic |
1 |
Likelihood ratio partial association chi‑squared for testing that all parameters in the effect are zero against a model containing all interactions of the same order |
2 |
Degrees of freedom in chi‑squared in columns 1 and 4 |
3 |
p‑value for the chi‑squared statistic in column 1 |
4 |
Number of zeros (structural and sampling) in the marginal table of the effect |
The rows of ASSOC are ordered with main effects first, followed by two‑way interactions, followed by the three‑way interactions, etc., until the last row, which contains the single NCLVAR‑way interaction. Thus, if there are 3 classification variables, there would be 8 rows in ASSOC and the rows would contain the A, B, C, AB, AC, BC, and the ABC effects where A represents the first (in INDCL) classification variable, B represents the second classification variable, etc.
CHIHI — NCLVAR by 5 matrix containing chi‑squared statistics testing that all k and higher interactions are zero where k = 1, 2, …, NCLVAR. (Output)
In the following, k is the row number of the statistic where the row numbers are 1, 2, …, NCLVAR.
Col. |
Statistic |
1 |
Likelihood ratio chi‑squared statistic for testing that all interactions higher than k are zero against a model including all interactions of order k |
2 |
p‑value for the chi‑squared value in column 1 |
3 |
Degrees of freedom for chi‑squared in columns 1 and 4 |
4 |
Pearson chi‑squared corresponding to column 1 |
5 |
p‑value for the chi‑squared value in column 4 |
CHISIM — NCLVAR by 5 matrix containing chi‑squared statistics for testing that all k‑factor interactions are simultaneously zero where k = 1, …, NCLVAR. (Output)
In the following, k is the row number of the statistic where the row numbers are
1, 2, …, NCLVAR.
Col. |
Statistic |
1 |
Likelihood ratio chi‑squared statistic for testing that all k‑factor interactions are all simultaneously zero given the model in which all k‑way interactions are present |
2 |
p‑value for the chi‑squared value in column 1 |
3 |
Degrees of freedom for chi‑squared in columns 1 and 4 |
4 |
Pearson chi‑squared corresponding to column 1 |
5 |
p‑value for the chi‑squared value in column 4 |
Optional Arguments
NCLVAR — Number of classification variables. (Input)
A variable specifying a margin in the table is a classification variable. The first classification variable is named A, the second classification variable is named B, etc.
Default: NCLVAR = size (NCLVAL,1).
EPS — Convergence criterion. (Input)
Convergence is assumed when the maximum deviation between an observed and a fitted marginal total is less than EPS. EPS = 0.10 is a typical value.
Default: EPS = 0.10.
MAXIT — Maximum number of iterations. (Input)
MAXIT = 15 is a typical value. When there are structural zeros a larger value, say MAXIT = 100, should be used.
Default: MAXIT = 100.
IPRINT — Printing option. (Input)
Default: IPRINT = 0.
IPRINT |
Action |
0 |
No printing is performed. |
1 |
Printing of ASSOC, CHIHI, and CHISIM is performed. |
2 |
ASSOC, CHIHI, CHISIM, and TABLE are printed. |
LDASSO — Leading dimension of ASSOC exactly as specified in the dimension statement in the calling program. (Input)
Default: LDASSO = size (ASSOC,1).
LDCHIH — Leading dimension of CHIHI exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCHIH = size (CHIHI,1).
LDCHIS — Leading dimension of CHISIM exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCHIS = size (CHISIM,1).
FORTRAN 90 Interface
Generic: CALL CTASC (NCLVAL, TABLE, ZERO, ASSOC, CHIHI, CHISIM [, …])
Specific: The specific interface names are S_CTASC and D_CTASC.
FORTRAN 77 Interface
Single: CALL CTASC (NCLVAR, NCLVAL, TABLE, ZERO, EPS, MAXIT, IPRINT, ASSOC, LDASSO, CHIHI, LDCHIH, CHISIM, LDCHIS)
Double: The double precision name is DCTASC.
Description
Routine CTASC computes likelihood‑ratio and Pearson X2 tests of partial‑association for each effect in a hierarchical log‑linear model. Also computed are likelihood ratio and Pearson chi‑squared tests that all interactions above a given level are simultaneously zero. All of these tests are asymptotic in nature. All models are hierarchical so that all lower order interactions that may be composed from a higher order effect in the model are automatically included in the model. All models are fit via the iterative proportional fitting algorithm, which is implemented in routine PRPFT. The algorithm proceeds as follows:
1. The hierarchical model including all k‑factor interactions is fit with k = 0, …, m and m = NCLVAR. The k = 0 model corresponds to a constant probability in each cell in the table while the k = m model is the full model. For each value of k, the likelihood ratio chi‑squared statistic for testing that all interactions not included in the fitted model are all simultaneously zero (against the alternative that this is not the case) is computed as
where oi is the observed count in the i‑th cell, fi is the fitted count for the given model, and the summation is over all cells in the table. Also computed (for comparison, the two statistics are asymptotically equivalent) is the usual Pearson chi‑squared statistic,
2. Let gi = NCLVAL(i), and let
and assume that there are no structural zeros in the table. Then, the number of degrees of freedom in the chi‑squared statistic for testing that all k‑order interactions are simultaneously zero is the sum over all k‑th order interaction effects of the degrees of freedom for the effect. In the no structural zero case, the degrees of freedom for an effect may be computed as
where j indexes the factors in the effect. Denote the sum of these degrees of freedom at level k by sk, and let s0 = 1. Then, the degrees of freedom in the k‑th test is given by sk.
When more than one structural zero is present, the degrees of freedom in the chi‑squared statistics are computed by fitting a least‑squares model for the full hierarchical model in which all interactions are included. Routine RGIVN (see Chapter 2, “Regression”) is used in fitting the model. Cells with sampling (as opposed to structural) zeros are included (but only when degrees of freedom are computed) by using a cell count of 0.5. Observations corresponding to structural zeros are not included. (Note that a structural zero is a model restriction that requires that the estimated count for a cell be zero. A sampling zero occurs by chance.) The degrees of freedom for each effect are found by summing over the estimated parameters for the effect. Parameters that are linearly related to previous parameters in the model (as determined through RGIVN via input argument TOL where TOL is 100 * AMACH(4) are not estimated. When there is only one structural zero, degrees of freedom are computed as if there were no structural zeros except for the highest level interaction term, which is given one fewer degree of freedom.
Chi‑squared statistics for testing that all effects at a given level k are simultaneously zero (given a hierarchical model in which all effects above level k are absent) are computed as the difference between the chi‑squared statistics testing that all k and higher interactions are zero and that of k + 1. That is, for J = 1 and 4, and I = 1, 2, …, NCLVAR ‑ 1, then CHISIM(I, J) = CHIHI(I, J) - CHIHI(I + 1, J), and CHISIM(NCLVAR, J) = CHIHI(NCLVAR, J).
3. For each effect, a “partial association” likelihood ratio chi‑squared statistic may be used to test the hypothesis that all parameters in the effect are simultaneously zero, given a model in which all interactions at the same level (or lower) are present, and all higher level interactions are absent. The degrees of freedom for the effect are computed as in Step 2.
Comments
1. Workspace may be explicitly provided, if desired, by use of C2ASC/DC2ASC. The reference is:
CALL C2ASC (NCLVAR, NCLVAL, TABLE, ZERO, EPS, MAXIT, IPRINT, ASSOC, LDASSO, CHIHI, LDCHIH, CHISIM, LDCHIS, FITWK, NCVEF, IXEF, AMAR, INDX, WK, IWK, COVWK)
The additional arguments are as follows:
FITWK — Work vector of length 3 * NCLVAL(1) * … * NCLVAL(NCLVAR).
NCVEF — Work vector of length 2NCLVAR ‑ 1.
IXEF — Work vector of length NCLVAR * 2(NCLVAR‑1)
AMAR — Work vector of length n. In defining n, let q(k) be the sum of the sizes of all possible marginal tables with k effects. For example, q(2) is the sum over all possible two‑way interactions I and J of NCLVAL(I) * NCLVAL(J) and q(NCLVAR) is the product NCLVAL(1) * … * NCLVAL(NCLVAR). Then, n = max(q(k)), k = 1, …, NCLVAR.
INDX — Work vector of length m where m is the maximum number of interactions at any level. That is, m = max(BINOM(NCLVAR, I)), I = 1, …, NCLVAR, where BINOM(NCLVAR, I) is the binomial coefficient (see routine BINOM (IMSL MATH/LIBRARY Special Functions)).
WK — Work vector of length 3 * NCLVAL(1) * … * NCLVAL(NCLVAR) if there exists more than one structural zero in TABLE, and of length
NCLVAL(1) * … * NCLVAL(NCLVAR) otherwise.
IWK — Work vector of length 2 * NCLVAR.
COVWK — Work vector of length (NCLVAL(1) * … * NCLVAL(NCLVAR))2 if there exists more than one structural zero in TABLE. Otherwise, COVWK is not referenced and can be dimensioned of length one in the calling program. On output, COVWK contains the upper triangular matrix containing the R matrix from a QR decomposition of the matrix of regressors for the full log‑linear model.
2. Informational errors
Type |
Code |
Description |
3 |
1 |
The optimization algorithm did not converge to the desired accuracy, EPS, within MAXIT iterations. |
3 |
5 |
The label for one or more of the tables exceeds the buffer limit. |
3 |
11 |
The label for one or more effects exceeds the buffer limit. |
3. The cells of the vectors TABLE and ZERO are sequenced so that the first variable cycles from 1 to NCLVAL(1) the slowest, the second variable cycles from 1 to NCLVAL(2) the next slowest, etc., up to the NCLVAR‑th variable, which cycles from 1 to NCLVAL(NCLVAR) the fastest.
Example: For NCLVAR = 3, NCLVAL(1) = 2, NCLVAL(2) = 3, and NCLVAL(3) = 2, the cells of table X(I, J, K) are entered into TABLE(1) through TABLE(12) in the following order: X(1, 1, 1), X(1, 1, 2), X(1, 2, 1), X(1, 2, 2), X(1, 3, 1), X(1, 3, 2), X(2, 1, 1), X(2, 1, 2), X(2, 2, 1), X(2, 2, 2), X(2, 3, 1), X(2, 3, 2). The elements of FIT are similarly sequenced.
Programming Notes
1. When sampling zeros are present, the likelihood ratio test statistics may not follow the appropriate chi‑squared distribution closely. A common (but not necessarily the best) practice in this case is to add a small positive constant, often 0.5, to each cell in the table. This addition is easily accomplished via routine SADD (IMSL MATH/LIBRARY). The addition of such a constant should not affect the computed degrees of freedom.
2. When marginal totals of zero are obtained, the optimization algorithm may be slow to converge. In this case, increase the value of argument MAXIT.
Example
The following example illustrates the use of CTASC for model building in a four‑way table involving brand preference. The first three factors each have 2 levels, while the fourth factor has 3 levels. The data are originally from Lee (1977) and are printed in the output. A model with two‑way interaction effects AD, BC, and BD looks promising.
USE CTASC_INT
IMPLICIT NONE
INTEGER IPRINT, LDASSO, LDCHIH, LDCHIS, LTAB
REAL EPS
PARAMETER (EPS=0.01, IPRINT=2, LDASSO=15, LDCHIH=4, LDCHIS=4, &
LTAB=24)
!
INTEGER NCLVAL(4)
REAL ASSOC(LDASSO,4), CHIHI(LDCHIH,5), CHISIM(LDCHIS,5), &
TABLE(LTAB)
!
DATA TABLE/19, 57, 29, 63, 29, 49, 27, 53, 23, 47, 33, 66, 47, &
55, 23, 50, 24, 37, 42, 68, 43, 52, 30, 42/
DATA NCLVAL/3, 2, 2, 2/
!
CALL CTASC (NCLVAL, TABLE, TABLE, ASSOC, CHIHI, CHISIM, &
EPS=EPS, IPRINT=IPRINT)
!
END
Output
Variable Number of Levels
1 A 3
2 B 2
3 C 2
4 D 2
----------
Table 1: B = 1 C = 1
D (row) by A (column)
1 2 3
1 19.00 23.00 24.00
2 57.00 47.00 37.00
----------
Table 2: B = 1 C = 2
D (row) by A (column)
1 2 3
1 29.00 33.00 42.00
2 63.00 66.00 68.00
----------
Table 3: B = 2 C = 1
D (row) by A (column)
1 2 3
1 29.00 47.00 43.00
2 49.00 55.00 52.00
----------
Table 4: B = 2 C = 2
D (row) by A (column)
1 2 3
1 27.00 23.00 30.00
2 53.00 50.00 42.00
Partial Association Statistics
Omitted Degrees of Marginal
Effect Chi-Square Freedom P-value Zeros
A 0.50 2.0 0.7782 0.0
B 0.06 1.0 0.8010 0.0
C 1.92 1.0 0.1657 0.0
D 73.21 1.0 0.0000 0.0
A*B 0.22 2.0 0.8978 0.0
A*C 1.01 2.0 0.6050 0.0
A*D 6.10 2.0 0.0475 0.0
B*C 19.89 1.0 0.0000 0.0
B*D 3.74 1.0 0.0532 0.0
C*D 0.74 1.0 0.3898 0.0
A*B*C 4.57 2.0 0.1017 0.0
A*B*D 0.16 2.0 0.9223 0.0
A*C*D 1.38 2.0 0.5022 0.0
B*C*D 2.22 1.0 0.1361 0.0
A*B*C*D 0.74 2.0 0.6917 0.0
Chi-square statistics for testing that all k and higher interactions are
zero.
Likelihood Degrees of
k Ratio P-Value Freedom Pearson P-Value
1 118.63 0.0000 23.0 115.71 0.0000
2 42.93 0.0008 18.0 43.90 0.0006
3 9.85 0.3631 9.0 9.87 0.3611
4 0.74 0.6917 2.0 0.74 0.6915
Chi-square statistics for testing that all k-factor interactions are
simultaneously zero.
Likelihood Degrees of
k Ratio P-Value Freedom Pearson P-Value
1 75.70 0.0000 5.0 71.81 0.0000
2 33.08 0.0001 9.0 34.03 0.0001
3 9.11 0.2449 7.0 9.13 0.2433
4 0.74 0.6917 2.0 0.74 0.6915