CTLLN

FNLStat : Categorical and Discrete Data Analysis : CTLLN

CTLLN

Required Arguments

Computes model estimates and associated statistics for a hierarchical log‑linear model.

Required Arguments

NCLVAL — Vector of length NCLVAR containing, in its i‑th element, the number of levels or categories of the i‑th classification variable. (Input)

TABLE — Vector of length NCLVAL(1) * NCLVAL(2) * … * NCLVAL(NCLVAR) containing the entries in the cells of the table to be fit. (Input)
See Comment 3 for comments on the ordering of the elements of TABLE.

NVEF — Vector of length NEF containing the number of classification variables associated with each effect. (Input)

INDEF — Vector of length NVEF(1) + … + NVEF(NEF) containing, in consecutive positions, the indices of the variables that are included in each effect. (Input)
The entries in INDEF are sequenced so that the first NVEF(1) elements contain the indices of the variables in effect 1, the next NVEF(2) elements of INDEF contain the indices of the variables in effect 2, etc. See Comment 4 for an example.

FIT — Vector of length NCLVAL(1) * NCLVAL(2) * … * NCLVAL(NCLVAR) containing the model estimates of the cell frequencies. (Input/Output)
On input, FIT contains the initial estimates of the cell counts. Structural zeros in the model are specified by setting the corresponding element of FIT to 0.0. All other elements of FIT may be set to 1.0 if no other estimate of the expected cell counts is available. On output, FIT contains the fitted table. See Comment 3 for the ordering of the elements of FIT. If an element of FIT is positive but the corresponding element in TABLE is zero, then the element is called a sampling zero. Sampling zeros may effect the number of parameters that can be estimated, but they will not effect the degrees of freedom in chi‑squared tests. See the Description section.

NCOEF — Number of regression coefficients in the model. (Output)

COEF — NCOEF by 4 matrix containing the estimated coefficients and associated statistics. (Output)
Dummy variables used in fitting the log‑linear model are generated using the IDUMMY = 3 option of routine GRGLM (see Chapter 2, “Regression”). For this option, the k‑th dummy variable for classification variable I is the (0, 1) indicator variable for the k‑th level of the classification variable minus the (0, 1) indicator variable for the NCLVAL(I)-th level of the classification variable.

Column	Statistic
1	Coefficient estimate
2	Estimated standard error of the estimated coefficient
3	Asymptotic normal score for testing that the coefficient is zero
4	p‑value associated with the normal score in column 3 (two‑sided alternative).

COV — NCOEF by NCOEF covariance matrix for the estimated parameters. (Output)

RESID — NCLVAL(1) * NCLVAL(2) * … * NCLVAL(NCLVAR) by 4 matrix containing residual statistics for each cell in the table. (Output)

Column	Statistics
1	Signed square root of the contribution to chi‑squared
2	Contribution to the likelihood ratio
3	Freeman‑Tukey deviate
4	Residual difference

STAT — Vector of length 4 containing output statistics for the model. (Output)

I	STAT(I)
1	Log‑likelihood.
2	Likelihood ratio statistic for testing the fit of the model.
3	Degrees of freedom in the likelihood ratio statistic. This statistic corrects for parameters that cannot be estimated because of sampling zeros.
4	p‑value corresponding to the likelihood ratio statistic.

Optional Arguments

NCLVAR — Number of classification variables. (Input)
A variable specifying a margin in the table is a classification variable. The first classification variable is named A, the second classification variable is named B, etc.
Default: NCLVAR = size (NCLVAL,1).

NEF — Number of effects in the model. (Input)
A marginal table is implied by each effect in the model. Lower‑order effects should not be included since their inclusion is automatic in the hierarchical models fit here (e.g., do not include effects A or B if effect AB is in the model).
Default: NEF = size (NVEF,1).

EPS — Convergence criterion. (Input)
Convergence is assumed when the maximum deviation between an observed and a fitted marginal total is less than EPS. EPS = 0.10 is a typical value.
Default: EPS = 0.10.

MAXIT — Maximum number of iterations. (Input)
MAXIT = 15 is a typical value.
Default: MAXIT = 30

TOL — Tolerance used in determining linear dependence in COV. (Input)
TOL = 100.0 AMACH(4) is a common choice. See the documentation for routine AMACH in Reference Material.
Default: TOL = 1.19e‑5 for single precision and 2.d–14 for double precision.

IPRINT — Printing option. (Input)
Default: IPRINT = 0.

IPRINT	Action
0	No printing is performed.
1	TABLE, FIT, RESID, COEF, COV, and STAT are printed.

LDCOEF — Leading dimension of COEF exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCOEF = size (COEF,1).

LDCOV — Leading dimension of COV exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCOV = size (COV,1).

LDRESI — Leading dimension of RESID exactly as specified in the dimension statement in the calling program. (Input)
Default: LDRESI = size (RESID,1).

FORTRAN 90 Interface

Generic: CALL CTLLN (NCLVAL, TABLE, NVEF, INDEF, FIT, NCOEF, COEF, COV, RESID, STAT [, …])

Specific: The specific interface names are S_CTLLN and D_CTLLN.

FORTRAN 77 Interface

Single: CALL CTLLN (NCLVAR, NCLVAL, TABLE, NEF, NVEF, INDEF, EPS, MAXIT, TOL, IPRINT, FIT, NCOEF, COEF, LDCOEF, COV, LDCOV, RESID, LDRESI, STAT)

Double: The double precision name is DCTLLN.

Description

Routine CTLLN computes statistics of interest for a hierarchical model in a log‑linear analysis of a multidimensional contingency table. Among the statistics computed are the expected cell values, cell residuals, the log‑linear parameters and their estimated variances and covariances, the log‑likelihood for the model (plus a constant), and a likelihood‑ratio test of the model (versus the alternative that the cell probabilities are free to vary, subject only to the marginal constraints). In addition, CTLLN can print and label all statistics that it computes.

Routine PRPFT is used to find the maximum likelihood estimates of the expected cell counts (FIT). These expected values are then used as input to routine CTPAR in order to compute estimates of the parameters in the model and their estimated covariances.

The matrix RESID contains various residuals that may be used in analyzing the model. These residuals are discussed in detail by Bishop, Feinberg, and Holland (1975, pages 136‑137), among others. Each is computed from the cell observed (oi) and expected (fitted, fi) values according to the following methods:

1. The signed square root of the contributions to X2 are computed as

2. The contributions to the likelihood ratio (G2) are computed as 2oi log(oi/fi)

3. Freeman‑Tukey deviates are computed as

4. The residual differences are computed as oi ‑ fi

The log‑likelihood STAT(1) is computed as

where n is the number of cells in the table. The likelihood ratio statistic for testing the fit of the model is computed as

which for large samples follows a chi‑squared distribution.

The number of degrees of freedom in G2 is computed as the number of cells in the table, excluding structural zeros, minus the number of parameters that could be estimated if there were no sampling zeros. When there are either structural or sampling zeros in the model, some parameters may not be estimable because they are infinite. Parameters that cannot be estimated due to structural zeros are not counted in the number of parameters estimated when computing the degrees of freedom for X2. Parameters that cannot be estimated because of sampling zeros are counted as estimated parameters when computing the degrees of freedom for X2.

To explain the calculation of degrees of freedom, note that extended maximum likelihood estimates may be written as

where

are coefficient vectors, and ρ → ∞. Routine CTLLN estimates the finite portion of the estimates,

The infinite portion,

ensures that the fitted values for zero marginal cells corresponding to a term in the hierarchical model have estimated expectation of zero. Thus, CTLLN fits the finite portion of extended maximum likelihood estimates where the extension is to ±∞. Because the Hessian elements corresponding to infinite parameters are zero, the Hessian is computed from a reduced likelihood in which cells leading to infinite estimates have been eliminated. The user is referred to Clarkson and Jennrich (1991) for details.

Comments

1. Workspace may be explicitly provided, if desired, by use of C2LLN/DC2LLN. The reference is:

CALL C2LLN (NCLVAR, NCLVAL, TABLE, NEF, NVEF, INDEF, EPS, MAXIT, TOL, IPRINT, FIT, NCOEF, COEF, LDCOEF, COV, LDCOV, RESID, LDRESI, STAT, AMAR, INDEX, NCVEF, IXEF, IINDEF, IA, INDCL, CLVAL, REG, X, D, XMIN, XMAX, COVWK, WK, IWK)

The additional arguments are as follows.

AMAR — Vector of length equal to the sum over all effects in the model (J = 1 to NEF) of the length of the marginal table required for the effect. The length of each marginal table is computed as the product of the number of class values for each classification variable in the effect (the product of the nonzero elements of NCLVAL(INDEF(I)) where I ranges from K(J) through K(J)+ NVEF(J) ‑ 1. Here, K(1) = 1 and K(J + 1) = K(J) + NVEF(J).)

INDX — Vector of length NEF.

NCVEF — Vector of length 2NCLVAR ‑ 1.

IXEF — Vector of length NCLVAR * 2NCLVAR−1.

IINDEF — Vector of length NVEF(1) + … + NVEF(NEF).

IA — Vector of length NCLVAR.

INDCL — Vector of length NCLVAR.

CLVAL — Vector of length NCLVAL(1) + … + NCLVAL(NCLVAR).

REG — Vector of length NCOEF + 1.

X — Vector of length NCOEF if there exists both structural and sampling zeros in TABLE; otherwise, it is of length NCLVAR.

D — Vector of length NCOEF + 1.

XMIN — Vector of length NCOEF.

XMAX — Vector of length NCOEF.

COVWK — Vector of length NCOEF2 if there exists both structural and sampling zeros in TABLE. Otherwise, COVWK is not referenced and can be dimensioned of length one.

WK — Vector of length max(g, NCOEF + 1) if IPRINT = 0; otherwise, WK is of length max(g, 6m, n) where m is the maximal element in NCLVAL, n is the length of TABLE, and g equals the maximum over all effects in the model (J = 1, NEF) of the length of the marginal table required for the effect. The length of the marginal table is computed as the product of the number of class values for each classification variable in the effect (the product of the nonzero elements of NCLVAL(INDEF(I)) where I ranges from K(J) through K(J) + NVEF(J) ‑ 1, where K(1) = 1 and K(J + 1) = K(J) + NVEF(J)).

IWK — Vector of length 2 * NCLVAR + z + 1 where z is the number of structural zeros in TABLE.

2. Informational errors

Type	Code	Description
3	1	The optimization algorithm did not converge to the desired accuracy within MAXIT iterations. Some of the estimated statistics may not be accurate.
3	5	The label for one or more of the tables exceeds the buffer limit.
3	11	The label for one or more effects exceeds the buffer limit.
4	2	LDCOEF or LDCOV is less than NCOEF.

3. The cells of the vectors TABLE and ZERO are sequenced so that the first variable cycles from 1 to NCLVAL(1) the slowest, the second variable cycles from 1 to NCLVAL(2) the next slowest, etc., up to the NCLVAR‑th variable, which cycles from 1 to NCLVAL(NCLVAR) the fastest.

Example: For NCLVAR = 3, NCLVAL(1) = 2, NCLVAL(2) = 3, and NCLVAL(3) = 2, the cells of table X(I, J, K) are entered into TABLE(1) through TABLE(12) in the following order: X(1, 1, 1), X(1, 1, 2), X(1, 2, 1), X(1, 2, 2), X(1, 3, 1), X(1, 3, 2), X(2, 1, 1), X(2, 1, 2), X(2, 2, 1), X(2, 2, 2), X(2, 3, 1), X(2, 3, 2). The elements of FIT are similarly sequenced.

4. INDEF is used to describe the marginal tables to be fit. For example, if NCLVAR = 3 and the first effect is to fit the marginal table for variables 1 and 3 and the second effect is to fit the marginal table for variable 2, then: NEF = 2, NVEF(1) = 2, and NVEF(2) = 1. Since the sum of the NVEF(I) is 3, then INDEF is a vector of length 3 with values: INDEF(1) = 1, INDEF(2) = 3, and INDEF(3) = 2.

Example

The example illustrates the use of CTLLN in a simple four‑way table in which the first three factors have two levels, and the fourth factor has three levels. The data, taken from Lee (1977), involve brand preference in different situations.

USE CTLLN_INT

IMPLICIT NONE

INTEGER IPRINT, LDCOEF, LDCOV, LDRESI, LTAB, MAXIT, NCLVAR

REAL EPS

PARAMETER (EPS=0.01, IPRINT=1, LDCOEF=10, LDCOV=10, LDRESI=24, &

LTAB=24, MAXIT=10, NCLVAR=4)

INTEGER INDEF(6), NCLVAL(NCLVAR), NCOEF, NEF, NVEF(3)

REAL COEF(LDCOEF,4), COV(LDCOV,LDCOV), FIT(LTAB), &

RESID(LDRESI,4), STAT(4), TABLE(LTAB)

DATA TABLE/19, 57, 29, 63, 29, 49, 27, 53, 23, 47, 33, 66, 47, &

55, 23, 50, 24, 37, 42, 68, 43, 52, 30, 42/

DATA NEF/3/, NVEF/2, 2, 2/, INDEF/2, 4, 1, 4, 2, 3/

DATA NCLVAL/3, 2, 2, 2/, FIT/24*1.0/

CALL CTLLN (NCLVAL, TABLE, NVEF, INDEF, FIT, NCOEF, COEF, &

COV, RESID, STAT, EPS=EPS, MAXIT=MAXIT, IPRINT=IPRINT)

END

Output

Fitted Model: (B*D, A*D, B*C)

Variable Number of Levels

1 A 3

2 B 2

3 C 2

4 D 2

Model Statistics

Log-likelihood 3.7906

Likelihood ratio 11.89

Degrees of freedom 14.0

P-value 0.6154

Coefficient Statistics

Standard Asymptotic

Coefficient Error Z-statistic P-value

1 intercept 3.6827 0.0333 110.66 0.0000

2 A(1) -0.0591 0.0475 -1.24 0.2341

3 A(2) 0.0278 0.0461 0.60 0.5562

4 B -0.0166 0.0331 -0.50 0.6242

5 C -0.0434 0.0319 -1.36 0.1943

6 D -0.2783 0.0329 -8.45 0.0000

7 A*D(1) -0.1016 0.0475 -2.14 0.0506

8 A*D(2) 0.0034 0.0461 0.07 0.9414

9 B*C -0.1438 0.0319 -4.51 0.0005

10 B*D -0.0684 0.0328 -2.09 0.0558

------------------------------

Table 1: C = 1 D = 1

B = 1 by A (column)

1 2 3

Observed 19.00 23.00 24.00

Fit 19.52 23.65 26.09

Root chi-square -0.12 -0.13 -0.41

Likelihood -1.03 -1.29 -4.02

Freeman-Tukey -0.06 -0.08 -0.37

Residual -0.52 -0.65 -2.09

B = 2 by A (column)

1 2 3

Observed 29.00 47.00 43.00

Fit 30.85 37.37 41.23

Root chi-square -0.33 1.57 0.28

Likelihood -3.58 21.54 3.62

Freeman-Tukey -0.29 1.52 0.31

Residual -1.85 9.63 1.77

------------------------------

Table 2: C = 1 D = 2

B = 1 by A (column)

1 2 3

Observed 57.00 47.00 37.00

Fit 47.85 46.99 42.89

Root chi-square 1.32 0.00 -0.90

Likelihood 19.95 0.03 -10.93

Freeman-Tukey 1.29 0.04 -0.89

Residual 9.15 0.01 -5.89

B = 2 by A (column)

1 2 3

Observed 49.00 55.00 52.00

Fit 57.52 56.48 51.56

Root chi-square -1.12 -0.20 0.06

Likelihood -15.70 -2.92 0.89

Freeman-Tukey -1.13 -0.16 0.10

Residual -8.52 -1.48 0.44

------------------------------

Table 3: C = 2 D = 1

B = 1 by A (column)

1 2 3

Observed 29.00 33.00 42.00

Fit 28.39 34.40 37.94

Root chi-square 0.11 -0.24 0.66

Likelihood 1.23 -2.73 8.53

Freeman-Tukey 0.16 -0.20 0.68

Residual 0.61 -1.40 4.06

B = 2 by A (column)

1 2 3

Observed 27.00 23.00 30.00

Fit 25.24 30.58 33.73

Root chi-square 0.35 -1.37 -0.64

Likelihood 3.64 -13.10 -7.04

Freeman-Tukey 0.39 -1.41 -0.61

Residual 1.76 -7.58 -3.73

------------------------------

Table 4: C = 2 D = 2

B = 1 by A (column)

1 2 3

Observed 63.00 66.00 68.00

Fit 69.58 68.32 62.37

Root chi-square -0.79 -0.28 0.71

Likelihood -12.51 -4.57 11.75

Freeman-Tukey -0.78 -0.25 0.73

Residual -6.58 -2.32 5.63

B = 2 by A (column)

1 2 3

Observed 53.00 50.00 42.00

Fit 47.06 46.21 42.18

Root chi-square 0.87 0.56 -0.03

Likelihood 12.61 7.88 -0.36

Freeman-Tukey 0.87 0.58 0.01

Residual 5.94 3.79 -0.18

Asymptotic Coefficient Covariance

1 2 3 4 5

1 1.1076E-03 9.7132E-05 -3.5887E-05 4.3244E-05 4.3786E-05

2 2.2562E-03 -1.1408E-03 -3.4043E-11 2.6829E-11

3 2.1232E-03 2.5675E-11 -5.1643E-11

4 1.0968E-03 1.4480E-04

5 1.0146E-03

6 7 8 9 10

1 2.9815E-04 1.3065E-04 -1.6147E-05 1.4480E-04 7.6307E-05

2 1.3065E-04 7.2117E-04 -4.0976E-04 6.2343E-11 -1.0681E-11

3 -1.6147E-05 -4.0976E-04 5.7437E-04 -4.9217E-11 -2.3482E-11

4 7.6307E-05 1.2601E-11 -4.1730E-11 4.3786E-05 2.8917E-04

5 -1.4272E-11 -5.5301E-11 4.2801E-11 4.5231E-06 -4.6962E-11

6 1.0851E-03 9.7132E-05 -3.5887E-05 -4.9749E-11 3.0847E-05

7 2.2562E-03 -1.1408E-03 5.9300E-11 -1.0361E-10

8 2.1232E-03 -2.4481E-11 2.9160E-11

9 1.0146E-03 1.1201E-11

10 1.0743E-03