PRPFT

FNLStat : Categorical and Discrete Data Analysis : PRPFT

PRPFT

Required Arguments

Performs iterative proportional fitting of a contingency table using a loglinear model.

Required Arguments

NCLVAL — Vector of length NCLVAR containing, in its i‑th element, the number of levels or categories of the i‑th classification variable. (Input)

TABLE — Vector of length NCLVAL(1) * NCLVAL (2) * … * NCLVAL(NCLVAR) containing the entries in the cells of the table to be fit. (Input)
See Comment 3 for comments on the ordering of the elements of TABLE.

NVEF — Vector of length NEF that contains the number of classification variables associated with each effect. (Input)

INDEF — Vector of length NVEF(1) + …+ NVEF(NEF) that contains, in consecutive positions, the indices of the variables that are included in each effect. (Input)
The entries in INDEF are sequenced so that the first NVEF(1) elements contain the indices of the variables in effect 1, the next NVEF(2) elements of INDEF contain the indices of the variables in effect 2, etc. See Comment 4 for an example.

FIT — Vector of length NCLVAL(1) * NCLVAL(2) * … * NCLVAL(NCLVAR). (Input/Output)
On input, FIT contains the initial estimates of the cell counts. Structural zeros in the model are specified by setting the corresponding element of FIT to 0.0. All other elements of FIT must be positive. 1.0 may be used if no other estimate of the cell counts is available. See Comment 3 for the ordering of the elements of FIT. On output, FIT contains the fitted table.

Optional Arguments

NCLVAR — Number of classification variables. (Input)
Default: NCLVAR = size (NCLVAL,1).

NEF — Number of effects in the model. (Input)
A marginal table is implied by each effect in the model. Lower order effects should not be included since their inclusion is automatic (e.g., do not include effects A or B if effect AB is in the model).
Default: NEF = size (NVEF,1).

EPS — Convergence criterion. (Input)
Convergence is assumed when the maximum deviation between an observed and a fitted marginal total is less than EPS. EPS = 0.10 is a typical value.
Default: EPS = .10.

MAXIT — Maximum number of iterations. (Input)
MAXIT = 15 is a typical value.
Default: MAXIT = 30.

FORTRAN 90 Interface

Generic: CALL PRPFT (NCLVAL, TABLE, NVEF, INDEF, FIT [, …])

Specific: The specific interface names are S_PRPFT and D_PRPFT.

FORTRAN 77 Interface

Single: CALL PRPFT (NCLVAR, NCLVAL, TABLE, NEF, NVEF, INDEF, EPS, MAXIT, FIT)

Double: The double precision name is DPRPFT.

Description

Routine PRPFT uses the iterative proportional‑fitting algorithm to fit a log‑linear hierarchical model to a contingency table. Structural zeros are allowed. A hierarchical model is a factorial model in which lower‑order terms are always present. Thus, in a three‑way table with classification variable names A, B, and C, the following models are all hierarchical models.

Many other hierarchical models exist for the three‑way table. Since all hierarchical models can be completely specified by the higher‑order interactions (the lower‑order interactions will always be present), no lower‑order effects are included in model specification.

Corresponding to each hierarchical interaction is a marginal table. Iterations in PRPFT proceed by fitting marginal tables successively until the desired precision is achieved.

A structural zero is a cell in the table that, by design or otherwise, can have no observations, i.e., the count for the cell must be zero. Structural zeros are specified by setting the corresponding element in FIT to zero on input. Routine PRPFT is best suited for tables with no structural zeros and in which the initial estimates input in FIT are all 1. The user should be aware that the algorithm may take (much) longer to converge when this is not the case.

Sampling zeros are cells that are not structural zeros, but for which no count is observed. Routine PRPFT requires the absence of sampling zeros in all marginal tables that are fit. One common way method of achieving this is to add a constant, often 0.5, to each cell prior to fitting the table.

Comments

1. Workspace may be explicitly provided, if desired, by use of P2PFT/DP2PFT. The reference is:

CALL P2PFT (NCLVAR, NCLVAL, TABLE, NEF, NVEF, INDEF, EPS, MAXIT, FIT, AMAR, INDEX, WK, IWK)

The additional arguments are as follows.

AMAR — Work vector with length equal to the sum from J = 1 to NEF of the product of the nonzero elements of NCLVAL(INDEF(I)) for I = 1 to NVEF(J).

INDEX — Work vector of length NEF.

WK — Work vector with length equal to the maximum over J = 1 to NEF of the product of the nonzero elements of NCLVAL(INDEF(I)), for I = 1 to NVEF(J).

IWK — Work vector of length 2 * NCLVAR.

2. Informational errors

Type	Code	Description
3	11	The algorithm did not converge to the desired accuracy within MAXIT iterations.
4	12	A marginal total for an effect is zero. Since FIT indicates this is not a structural zero, the algorithm will not converge properly. One way to proceed is to add a constant to all cells in the table.

3. The cells of the vectors TABLE and FIT are sequenced so that the first variable cycles from 1 to NCLVAL(1), which is the slowest, the second variable cycles from 1 to NCLVAL(2), which is the next slowest, etc., up to the NCLVAR‑th variable, which cycles from 1 to NCLVAL(NCLVAR) the fastest.

Example: For NCLVAR = 3, NCLVAL(1) = 2, NCLVAL(2) = 3, and NCLVAL(3) = 2, the cells of table X(I, J, K) are entered into TABLE(1) through TABLE(12) in the following order. X(1, 1, 1), X(1, 1, 2), X(1, 2, 1), X(1, 2, 2), X(1, 3, 1), X(1, 3, 2), X(2, 1, 1), X(2, 1, 2), X(2, 2, 1), X(2, 2, 2), X(2, 3, 1), X(2, 3, 2). The elements of FIT are similarly sequenced.

4. INDEF is used to describe the marginal tables to be fit. For example, if NCLVAR = 3 and the first effect is to fit the marginal table for variables 1 and 3 and the second effect is to fit the marginal table for variable 2, then: NEF = 2, NVEF(1) = 2, and NVEF(2) = 1.

Since the sum of the NVEF(I) is 3, then INDEF is a vector of length 3 with values. INDEF (1) = 1, INDEF(2) = 3, and INDEF(3) = 2.

5. Typically, MAXIT = 5 is sufficient. If PRPFT does not converge, try using double precision, increasing MAXIT, or using the values output in FIT as input for another call to PRPFT.

Example

The following example is taken from Bishop, Feinberg, and Holland (1975, page 87). The data are originally from Bartlett (1935). This example examines the survival of plants (factor A = factor 2) at different values for time of planting (factor C = factor 3) and length of cutting (factor B = factor 1). The sample size for each level of B and C is fixed at 240.

B
		1				2
		A				A
		1	2			1	2
C	1	156	84	C	1	84	156
	2	107	133		2	31	209

The model to be fit is given by:

where mijk is the cell expected value for levels i, j, and k of factors A, B, and C, respectively.

USE PRPFT_INT

USE UMACH_INT

IMPLICIT NONE

INTEGER NCLVAR, NEF

PARAMETER (NCLVAR=3, NEF=3)

INTEGER INDEF(6), MAXIT, NCLVAL(NCLVAR), NOUT, NVEF(NEF)

REAL EPS, FIT(8), TABLE(8)

DATA NCLVAL/2, 2, 2/, NVEF/2, 2, 2/

DATA INDEF/1, 2, 1, 3, 2, 3/, EPS/0.0001/, MAXIT/15/

DATA TABLE/156, 107, 84, 31, 84, 133, 156, 209/

DATA FIT/8*1.0/

CALL PRPFT (NCLVAL, TABLE, NVEF, INDEF, FIT, EPS=EPS, MAXIT=MAXIT)

CALL UMACH (2, NOUT)

WRITE (NOUT,99999) FIT

99999 FORMAT (' FIT =', 8F7.1)

END

Output

FIT = 161.1 101.9 78.9 36.1 78.9 138.1 161.1 203.9