CTEPR
Computes Fisher’s exact test probability and a hybrid approximation to the Fisher exact test probability for a contingency table using the network algorithm.
Required Arguments
TABLE — NROW by NCOL matrix containing the contingency table. (Input)
PRE — Table p‑value. (Output)
PRE is the probability of a more extreme table, where “extreme” is in a probabilistic sense. If EXPECT ≤ 0.0, the Fisher exact probability is returned. Otherwise, a hybrid approximation to Fisher’s exact probability is computed.
Optional Arguments
NROW — The number of rows in the table. (Input)
Default: NROW = size (TABLE,1).
NCOL — The number of columns in the table. (Input)
Default: NCOL = size (TABLE,2).
LDTABL — Leading dimension of TABLE exactly as specified in the dimension statement in the calling program. (Input)
Default: LDTABL = size (TABLE,1).
EXPECT — Expected value used in the hybrid approximation to Fisher’s exact test algorithm for deciding when to use asymptotic probabilities when computing path lengths. (Input)
Default: EXPECT = 5.0.
If EXPECT ≤ 0.0, then asymptotic theory probabilities are not used and Fisher exact test probabilities are computed. Otherwise, asymptotic probabilities are used in computing path lengths whenever PERCNT or more of the cells in the table for which path lengths are to be computed have estimated expected values of EXPECT or more, with no cell having expected value less than EMIN. See the “Description” section for details. Use EXPECT = 5.0 to obtain the “Cochran” condition.
PERCNT — Percentage of remaining cells that must have estimated expected values greater than or equal to EXPECT before asymptotic probabilities can be used in computing path lengths. (Input)
Default: PERCNT = 80.0.
See argument EXPECT for details. Use PERCNT = 80.0 to obtain the “Cochran” condition.
EMIN — Minimum cell estimated expected value allowed for asymptotic chi‑squared probabilities to be used. (Input)
Default: EMIN = 1.0.
See argument EXPECT for details. Use EMIN = 1.0 to obtain the “Cochran” condition.
PRT — Probability of the observed table for fixed marginal totals. (Output)
FORTRAN 90 Interface
Generic: CALL CTEPR (TABLE, PRE [, …])
Specific: The specific interface names are S_CTEPR and D_CTEPR.
FORTRAN 77 Interface
Single: CALL CTEPR (NROW, NCOL, TABLE, LDTABL, EXPECT, PERCNT, EMIN, PRT, PRE)
Double: The double precision name is DCTEPR.
Description
Routine CTEPR computes Fisher exact probabilities, or a hybrid algorithm approximation to Fisher exact probabilities, for an r × c contingency table with fixed row and column marginals, where r = NROW is the number of rows in the table and c = NCOL is the number of columns. Let fij denote the frequency count in row i and column j of a table, and let fi∙ and f∙j denote the total frequency counts for row i and column j, respectively. Under the independence hypothesis, the (conditional) probability of the observed table for fixed row and column marginal totals is given by
$$P_f = \frac{\prod_{i=1}^{r} f_{i\cdot}! \, \prod_{j=1}^{c} f_{\cdot j}!}{f_{\cdot\cdot}! \, \prod_{i=1}^{r} \prod_{j=1}^{c} f_{ij}!}$$
where f∙∙ is the total number of counts in the table and x! denotes x factorial. When the fij are equal to the input table, so that fij = TABLE (i, j), let Po = PRT be the resulting value of Pf.
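The value Po can be checked independently of the library. The following is a minimal sketch, not the code used inside CTEPR, that evaluates the conditional probability (marginal-total factorials divided by the grand-total factorial and the cell factorials) for the table used in the Example section. It assumes a compiler that provides the Fortran 2008 LOG_GAMMA intrinsic; the program and variable names are illustrative only.

```fortran
program observed_table_probability
   ! Sketch: probability P_o of the observed table for fixed marginal
   ! totals, computed on the log scale so the factorials cannot overflow.
   ! Illustration only; this is not how CTEPR itself is implemented.
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: nrow = 3, ncol = 5
   real(dp) :: table(nrow, ncol), logp

   ! Same column-major data as in the Example section
   data table /20.0d0, 10.0d0, 20.0d0, 20.0d0, 10.0d0, 20.0d0, &
               0.0d0, 2.0d0, 0.0d0, 0.0d0, 2.0d0, 0.0d0, &
               0.0d0, 1.0d0, 0.0d0/

   ! log(x!) = log_gamma(x + 1); log_gamma is elemental, so it can be
   ! applied to the vectors of row and column totals directly.
   logp = sum(log_gamma(sum(table, dim=2) + 1.0_dp)) &   ! row totals
        + sum(log_gamma(sum(table, dim=1) + 1.0_dp)) &   ! column totals
        - log_gamma(sum(table) + 1.0_dp)               &   ! grand total
        - sum(log_gamma(table + 1.0_dp))                   ! cell counts

   write (*, '(a, es12.4)') ' P_o = ', exp(logp)
end program observed_table_probability
```

For this table the sketch should print a value of about 1.915E-05, which agrees with PRT = 0.1915E-04 in the Output section.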
In CTEPR, a more extreme table is defined in the probabilistic sense. Table X is more extreme than the input table if the conditional probability computed for table X (for the same marginal sums) is less than the conditional probability computed for the input table. Let p = PRE be the probability of a more extreme table. Then p is the sum of the conditional probabilities Pf over all tables with the same marginal totals that are more extreme than the input table.
The user should note that this definition of “more extreme” can be considered as “two‑sided” in the cell counts.
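To make the two-sided nature of this definition concrete, the sketch below enumerates every 2 × 2 table that shares the marginal totals of a small hypothetical table (3, 1 / 1, 3) and accumulates the probabilities of the less probable ones. This is plain total enumeration, not the network algorithm. The text above defines "more extreme" with a strict inequality, whereas the conventional Fisher p-value also counts tables whose probability ties with the observed one (including the observed table itself). How CTEPR treats ties is not stated here, so the sketch reports both sums, and that choice should be treated as an assumption.

```fortran
program two_by_two_enumeration
   ! Sketch: enumerate all 2 x 2 tables with the same marginal totals as
   ! a hypothetical observed table and sum the less probable ones.
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   real(dp), parameter :: tol = 1.0e-12_dp   ! guards against rounding in ties
   integer :: obs(2, 2), r1, r2, c1, a, lo, hi
   real(dp) :: p, p_obs, p_strict, p_ties

   obs = reshape([3, 1, 1, 3], [2, 2])       ! hypothetical data
   r1 = obs(1, 1) + obs(1, 2)                ! row totals
   r2 = obs(2, 1) + obs(2, 2)
   c1 = obs(1, 1) + obs(2, 1)                ! first column total

   p_obs = prob(obs(1, 1))
   lo = max(0, c1 - r2)                      ! feasible range of cell (1,1)
   hi = min(r1, c1)

   p_strict = 0.0_dp
   p_ties = 0.0_dp
   do a = lo, hi
      p = prob(a)
      if (p < p_obs - tol) p_strict = p_strict + p    ! strictly less probable
      if (p <= p_obs + tol) p_ties = p_ties + p       ! ties counted as well
   end do

   write (*, '(a, f8.4)') ' P_o                 = ', p_obs
   write (*, '(a, f8.4)') ' sum over P_f <  P_o = ', p_strict
   write (*, '(a, f8.4)') ' sum over P_f <= P_o = ', p_ties

contains

   function prob(a) result(pf)
      ! P_f for the table whose (1,1) cell is a, margins held fixed
      integer, intent(in) :: a
      real(dp) :: pf
      pf = exp(lf(r1) + lf(r2) + lf(c1) + lf(r1 + r2 - c1) - lf(r1 + r2) &
               - lf(a) - lf(r1 - a) - lf(c1 - a) - lf(r2 - c1 + a))
   end function prob

   function lf(n) result(v)
      ! log(n!) through the log-gamma intrinsic
      integer, intent(in) :: n
      real(dp) :: v
      v = log_gamma(real(n + 1, dp))
   end function lf

end program two_by_two_enumeration
```

With these counts the five feasible tables have probabilities 1/70, 16/70, 36/70, 16/70, and 1/70, so the program prints 0.2286, 0.0286, and 0.4857. Tables in both tails contribute to each sum, which is the sense in which the definition is two-sided.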
Routine CTEPR uses the hybrid network algorithm of Mehta and Patel (1983, 1986a, 1986b) with the Clarkson and Fan (1989) modifications to compute the probability of a more extreme table. The hybrid algorithm uses asymptotic probabilities for tables encountered in which PERCNT percent of the table expected values are greater than or equal to EXPECT, and all expected values are greater than or equal to EMIN. When PERCNT = 80, EXPECT = 5, and EMIN = 1, this is the “Cochran” rule. Although the hybrid network algorithm can be orders of magnitude faster than the total enumeration algorithm used in routine CTPRB, the amount of computer time required by CTEPR still increases very rapidly with the size of the table. Caution should be used whenever computer time is a consideration.
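The rule itself is simple to state in code. The sketch below applies it once, to the full table from the Example section, using expected values estimated as (row total × column total) / grand total. This is only an illustration of the test: according to the EXPECT description, CTEPR applies the condition to the cells remaining while path lengths are computed, not to the whole table at once, and the variable names here are hypothetical.

```fortran
program cochran_condition
   ! Sketch: evaluate the "Cochran" condition (EXPECT = 5, PERCNT = 80,
   ! EMIN = 1) on a complete table. Illustration only; CTEPR applies the
   ! rule to the remaining cells at each stage of the algorithm.
   implicit none
   integer, parameter :: nrow = 3, ncol = 5
   real, parameter :: expect = 5.0, percnt = 80.0, emin = 1.0
   real :: table(nrow, ncol), e(nrow, ncol)
   real :: rowtot(nrow), coltot(ncol), pct
   logical :: use_asymptotic
   integer :: i, j

   ! Same column-major data as in the Example section
   data table /20.0, 10.0, 20.0, 20.0, 10.0, 20.0, 0.0, 2.0, 0.0, &
               0.0, 2.0, 0.0, 0.0, 1.0, 0.0/

   rowtot = sum(table, dim=2)
   coltot = sum(table, dim=1)

   ! Estimated expected value of each cell under independence
   do j = 1, ncol
      do i = 1, nrow
         e(i, j) = rowtot(i)*coltot(j)/sum(table)
      end do
   end do

   ! Percentage of cells whose expected value is at least EXPECT
   pct = 100.0*real(count(e >= expect))/real(size(e))
   use_asymptotic = (pct >= percnt) .and. (minval(e) >= emin)

   write (*, '(a, f6.1)') ' percent of cells >= EXPECT: ', pct
   write (*, '(a, f8.4)') ' smallest expected value:    ', minval(e)
   write (*, '(a, l2)')   ' condition satisfied:        ', use_asymptotic
end program cochran_condition
```

For the full example table only 6 of the 15 cells (40.0 percent) have expected values of 5 or more and the smallest expected value is 0.2381, so the condition is not satisfied for the table as a whole.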
Comments
1. Workspace may be explicitly provided, if desired, by use of C2EPR/DC2EPR. The reference is:
CALL C2EPR (NROW, NCOL, TABLE, LDTABL, EXPECT, PERCNT, EMIN, PRT, PRE, FACT, ICO, IRO, KYY, IDIF, IRN, KEY, LDKEY, IPOIN, STP, LDSTP, IFRQ, DLP, DSP, TM, KEY2, IWK, RWK)
The additional arguments are as follows:
FACT — Work vector of length NTOT + 1 where NTOT is the total count in the table.
ICO — Work vector of length MX where MX = max(NROW, NCOL).
IRO — Work vector of length MX.
KYY — Work vector of length MX.
IDIF — Work vector of length MN where MN = min(NROW, NCOL).
IRN — Work vector of length MN.
KEY — Work vector of length 2 * LDKEY.
LDKEY — Leading dimension of KEY exactly as specified in the dimension statement in the calling program. (Input)
IPOIN — Work vector of length 2 * LDKEY.
STP — Work vector of length 2 * LDSTP.
LDSTP — Leading dimension of STP exactly as specified in the dimension statement in the calling program. (Input)
IFRQ — Work vector of length 6 * LDSTP.
DLP — Work vector of length 2 * LDKEY.
DSP — Work vector of length 2 * LDKEY.
TM — Work vector of length 2 * LDKEY.
KEY2 — Work vector of length 2 * LDKEY.
IWK — Work vector of length max((NROW + NCOL + 1)(5 + 2 * MX), 800 + 7 * MX).
RWK — Work vector of length max(400 + MX + 1, NROW + NCOL + 1).
The exact value of LDKEY and LDSTP required is not known in advance. Common values to try are LDKEY = 1000 and LDSTP = 30000.
2. Informational errors
Type | Code | Description
3 | 1 | All of the elements of TABLE are zero.
4 | 2 | The product of the marginal totals is greater than can be exactly represented in an integer variable so the hash table key cannot be computed. The computations cannot proceed.
4 | 3 | LDKEY is too small. To increase LDKEY when invoking CTEPR/DCTEPR, increase the total workspace used. A doubling of the total workspace is a good place to begin.
4 | 4 | LDSTP is too small. To increase LDSTP when invoking CTEPR/DCTEPR, increase the total workspace used. A doubling of the total workspace is a good place to begin.
4 | 5 | The current value for IWKIN is too small. It is not possible to give the value for IWKIN required, but you might try doubling the amount. Refer to IWKIN in the Reference Material section.
3. Routine CTEPR/DCTEPR will use all available workspace. It is not unusual for CTEPR/DCTEPR to require 200,000 floating‑point units of workspace.
4. When C2EPR/DC2EPR is called by CTEPR/DCTEPR, LDSTP = 30 * LDKEY.
5. Although not a restriction, it is not generally practical to call this routine with large tables that are not sparse and in which the hybrid approximation to Fisher’s exact test (see the Description section) has little effect. For example, although it is feasible to compute exact probabilities for the table
computing exact probabilities for a similar table that has been enlarged by the addition of an extra row (or column) may not be feasible.
Example
In this example, CTEPR is used to compute the hybrid approximation to the Fisher exact probability for a 3 × 5 contingency table using the Cochran condition. Because of the large initial counts and the input arguments EXPECT = 5, PERCNT = 80, and EMIN = 1, the hybrid algorithm significantly reduces the computational effort in this example. The input table is given as
20 20 0 0 0
10 10 2 2 1
20 20 0 0 0
USE UMACH_INT
USE CTEPR_INT
IMPLICIT NONE
INTEGER LDTABL, NCOL
PARAMETER (NCOL=5, LDTABL=3)
!
INTEGER NOUT
REAL PRE, PRT, TABLE(LDTABL,NCOL)
!
DATA TABLE/20.0, 10.0, 20.0, 20.0, 10.0, 20.0, 0.0, 2.0, 0.0, &
0.0, 2.0, 0.0, 0.0, 1.0, 0.0/
!
CALL UMACH (2, NOUT)
!
CALL CTEPR (TABLE, PRE, PRT=PRT)
!
WRITE (NOUT,99999) PRT, PRE
!
99999 FORMAT (' PRT = ', E12.4, ' PRE = ', F8.4)
!
END
Output
PRT = 0.1915E-04 PRE = 0.0601
For comparison, the usual asymptotic chi‑squared p‑value (which may be computed with routine CTCHI, not CTEPR) is 0.0323, and the Fisher exact probability (which may be computed through CTEPR by setting EXPECT = 0.0) is 0.0598. The exact computation requires approximately ten times more computer time than the hybrid method. The Fisher exact probability and the usual asymptotic chi‑squared probability will often be quite different. When it may be used, the hybrid algorithm can lead to significant savings in computer time.
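The exact computation mentioned above can be requested by passing EXPECT as a keyword argument, in the same way the example passes PRT. The following variant of the example is a sketch that assumes the generic FORTRAN 90 interface and the optional argument names listed under Optional Arguments. It requires the library to link and has not been run here.

```fortran
      USE UMACH_INT
      USE CTEPR_INT
      IMPLICIT NONE
      INTEGER    LDTABL, NCOL
      PARAMETER  (NCOL=5, LDTABL=3)
!
      INTEGER    NOUT
      REAL       PRE, PRT, TABLE(LDTABL,NCOL)
!
      DATA TABLE/20.0, 10.0, 20.0, 20.0, 10.0, 20.0, 0.0, 2.0, 0.0, &
           0.0, 2.0, 0.0, 0.0, 1.0, 0.0/
!
      CALL UMACH (2, NOUT)
!                                 EXPECT <= 0.0 disables the asymptotic
!                                 probabilities, so PRE is the Fisher
!                                 exact probability
      CALL CTEPR (TABLE, PRE, EXPECT=0.0, PRT=PRT)
!
      WRITE (NOUT,99999) PRT, PRE
!
99999 FORMAT (' PRT = ', E12.4, ' PRE = ', F8.4)
!
      END
```

According to the comparison above, this call should report PRE = 0.0598 for the example table, at roughly ten times the computing cost of the hybrid call.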