exactEnumeration¶

Computes exact probabilities in a two-way contingency table using the total enumeration method.

Synopsis¶

exactEnumeration(table)

Required Arguments¶

float table[[]] (Input): Array of size nRows × nColumns containing the observed counts in the contingency table.

Return Value¶

The p-value for independence of rows and columns. The p-value represents the probability of a more extreme table where “extreme” is taken in the Neyman-Pearson sense. The p-value is “two-sided”.

Optional Arguments¶

probTable (Output): Probability of the observed table occurring, given that the null hypothesis of independent rows and columns is true.
pValue (Output): The p-value for independence of rows and columns. The p-value is the probability of a table at least as extreme as the observed one, where “extreme” is taken in the Neyman-Pearson sense: a table is more extreme if its probability (for fixed marginals) is less than or equal to probTable. The p-value is “two-sided”. It is also the function's return value (see “Return Value”).
checkNumericalError (Output): Sum of the probabilities of all tables with the same marginal totals. checkNumericalError should have a value of 1.0. Deviation from 1.0 indicates numerical error.

Description¶

Function exactEnumeration computes exact probabilities for an r × c contingency table with fixed row and column marginals (a marginal is the total count in a row or column), where r = nRows and c = nColumns. Let $f_{ij}$ denote the count in row i and column j of a table, and let $f_{i\bullet}$ and $f_{\bullet j}$ denote the row and column marginals. Under the hypothesis of independence, the (conditional) probability of the observed table, given its fixed marginals, is

$P_f = \frac{\prod\limits_{i=1}^{r} f_{i\bullet}! \prod\limits_{j=1}^{c} f_{\bullet j}!} {f_{\bullet\bullet}! \prod\limits_{i=1}^{r} \prod\limits_{j=1}^{c} f_{ij}!}$

where $f_{\bullet\bullet }$ is the total number of counts in the table. $P_f$ corresponds to output argument probTable.
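As an illustration of the formula, the following is a minimal sketch in plain Python that evaluates $P_f$ for a table given as a list of rows. The function name `table_probability` is hypothetical and is not part of PyIMSL; exact rational arithmetic is used here only to keep the arithmetic transparent and says nothing about how exactEnumeration is implemented internally.

```python
from fractions import Fraction
from math import factorial, prod


def table_probability(table):
    """Conditional probability of `table` given its row and column marginals.

    Hypothetical helper for illustration only; not a PyIMSL function.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)

    # Numerator: product of the factorials of all row and column marginals
    numerator = prod(factorial(n) for n in row_sums + col_sums)
    # Denominator: factorial of the grand total times the factorial of every cell
    denominator = factorial(total) * prod(
        factorial(count) for row in table for count in row
    )
    return Fraction(numerator, denominator)


observed = [[8, 12], [8, 2]]
p_f = table_probability(observed)
print("P_f = %.6f" % float(p_f))
```

For the 2 × 2 table used in the example on this page, this evaluates to roughly 0.0390.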

A table X is defined as “more extreme” than the observed table, in the probabilistic sense, if the conditional probability computed for table X (for the same marginal sums) is less than or equal to the conditional probability computed for the observed table. Note that this definition can be considered “two-sided” in the cell counts.
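The total enumeration idea can be sketched as follows: generate every table of non-negative counts with the observed marginals, compute each table's conditional probability, and sum the probabilities that do not exceed that of the observed table (the less-than-or-equal comparison stated for pValue under Optional Arguments). This is an illustrative sketch only; all function names are hypothetical, the recursion is a naive one, and it is not a description of the algorithm PyIMSL actually uses.

```python
from fractions import Fraction
from math import factorial, prod


def table_probability(table):
    """Conditional probability of `table` given its marginals (illustrative)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    numerator = prod(factorial(n) for n in row_sums + col_sums)
    denominator = factorial(sum(row_sums)) * prod(
        factorial(x) for row in table for x in row
    )
    return Fraction(numerator, denominator)


def compositions(total, caps):
    """Yield every row summing to `total` whose entry j is at most caps[j]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in compositions(total - first, caps[1:]):
            yield (first,) + rest


def tables_with_marginals(row_sums, col_sums):
    """Yield every non-negative integer table with the given marginals."""
    if len(row_sums) == 1:
        # The last row is forced: it must use up the remaining column totals.
        yield (tuple(col_sums),)
        return
    for row in compositions(row_sums[0], col_sums):
        remaining = [c - x for c, x in zip(col_sums, row)]
        for rest in tables_with_marginals(row_sums[1:], remaining):
            yield (row,) + rest


def enumerate_p_value(table):
    """Return (p-value, probability of observed table, sum over all tables)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    p_observed = table_probability(table)
    p_value = Fraction(0)
    check = Fraction(0)
    for candidate in tables_with_marginals(row_sums, col_sums):
        p = table_probability(candidate)
        check += p
        if p <= p_observed:  # "at least as extreme" as the observed table
            p_value += p
    return p_value, p_observed, check


p_value, p_observed, check = enumerate_p_value([[8, 12], [8, 2]])
print("p-value = %.4f, probTable = %.4f, check = %s"
      % (float(p_value), float(p_observed), check))
```

The three returned quantities play the roles of pValue, probTable, and checkNumericalError. Because the sketch uses exact fractions, the check sum is exactly 1; a floating-point implementation would be expected to show a small deviation instead.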

Because exactEnumeration uses total enumeration to compute the probability of a more extreme table, the amount of computer time required increases very rapidly with the size of the table. Tables with a large total count $f_{\bullet\bullet }$ or a large value of r × c should not be analyzed using exactEnumeration. In such cases, try using exactNetwork.

Example¶

In this example, the exact conditional probability for the 2 × 2 contingency table

$\begin{split}\begin{bmatrix} 8 & 12 \\ 8 & 2 \\ \end{bmatrix}\end{split}$

is computed.

from __future__ import print_function
from numpy import array
from pyimsl.stat.exactEnumeration import exactEnumeration

table = array([[8, 12], [8, 2]])

p = exactEnumeration(table)

print("p-value = %9.4f" % p)

Output¶

p-value =    0.0577
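The printed value can be checked by hand. In a 2 × 2 table with fixed marginals, the count a in the first cell determines the whole table, and its conditional distribution is hypergeometric. The sketch below (standard library only; the variable names are illustrative) lists the possible values of a for row totals 20 and 10 and column totals 16 and 14, and sums the probabilities that are less than or equal to that of the observed table, a = 8.

```python
from math import comb

row_sums = (20, 10)
col_sums = (16, 14)
total = sum(row_sums)
observed_a = 8  # count in row 1, column 1 of the observed table

# Range of the first cell that keeps every cell non-negative
lo = max(0, row_sums[0] - col_sums[1])
hi = min(row_sums[0], col_sums[0])

# Unnormalized hypergeometric weights; dividing by `denominator` gives probabilities
weights = {
    a: comb(col_sums[0], a) * comb(col_sums[1], row_sums[0] - a)
    for a in range(lo, hi + 1)
}
denominator = comb(total, row_sums[0])

# Tables whose probability does not exceed that of the observed table
extreme = [a for a in weights if weights[a] <= weights[observed_a]]
p_value = sum(weights[a] for a in extreme) / denominator

print("values of a counted as extreme:", extreme)
print("p-value = %9.4f" % p_value)
```

The tables with a = 6, 7, 8 in one tail and a = 14, 15, 16 in the other contribute to the sum, which is why the p-value is described as “two-sided”.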