hypothesisTest

Performs tests for a multivariate general linear hypothesis \(H\beta U=G\) given the hypothesis sums of squares and crossproducts matrix \(S_H\).

Synopsis

hypothesisTest (regressionInfo, dfh, scph)

Required Arguments

structure regressionInfo (Input)
A structure containing information about the regression fit. See function regression.
float dfh (Input)
Degrees of freedom for the sums of squares and crossproducts matrix.
float scph[[]] (Input)
Array of size nu by nu containing \(S_H\), the sums of squares and crossproducts attributable to the hypothesis.

Return Value

The p-value corresponding to the Wilks’ lambda test.

Optional Arguments

u, float[] (Input)

Argument u contains the nDependent by nu U matrix for the test \(H_p\beta U=G_p\), where nu is the number of linear combinations of the dependent variables to be considered. The value nu must be greater than 0 and less than or equal to nDependent.

Default: nu = nDependent and u is the identity matrix

wilkLambda, value, pValue (Output)
Wilks’ lambda and its p-value.
royMaxRoot, value, pValue (Output)
Roy’s maximum root criterion and p-value.
hotellingTrace, value, pValue (Output)
Hotelling’s trace and p-value.
pillaiTrace, value, pValue (Output)
Pillai’s trace and p-value.

Description

Function hypothesisTest computes test statistics and p-values for the general linear hypothesis \(H\beta U=G\) for the multivariate general linear model.

The hypothesis sum of squares and crossproducts matrix input in scph is

\[S_H = \left(H\hat{\beta}U - G\right)^T \left(C^TDC\right)^- \left(H\hat{\beta}U-G\right)\]

where C is a solution to \(R^TC=H^T\), \(\left(C^TDC\right)^-\) denotes the generalized inverse of \(C^TDC\), and D is a diagonal matrix with diagonal elements

\[\begin{split}d_{ii} = \begin{cases} 1 & \text{if } r_{ii} > 0 \\ 0 & \text{otherwise} \\ \end{cases}\end{split}\]

For a detailed discussion, see Linear Dependence and the R Matrix in the Usage Notes.
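As a concrete illustration of this construction, the following NumPy sketch forms \(S_H\) from \(H\), \(\hat{\beta}\), \(U\), \(G\), and the \(R\) matrix of the fit, assuming \(R\) is nonsingular. The names h, beta_hat, u, g, and r are illustrative only; in practice \(S_H\) is obtained from function hypothesisScph.

import numpy as np


def hypothesis_sscp(h, beta_hat, u, g, r):
    # Illustrative construction of S_H = (H beta U - G)^T (C^T D C)^- (H beta U - G),
    # assuming the R matrix from the regression fit is nonsingular.
    c = np.linalg.solve(r.T, h.T)                  # C solves R^T C = H^T
    d = np.diag((np.diag(r) > 0).astype(float))    # d_ii = 1 if r_ii > 0, else 0
    middle = np.linalg.pinv(c.T @ d @ c)           # generalized inverse of C^T D C
    diff = h @ beta_hat @ u - g                    # H beta-hat U - G
    return diff.T @ middle @ diff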

The error sum of squares and crossproducts matrix for the model \(Y=X\beta+\varepsilon\) is

\[\left(Y - X \hat{\beta}\right)^T \left(Y - X \hat{\beta}\right)\]

which is input in regressionInfo. The error sum of squares and crossproducts matrix for the hypothesis \(H\beta U=G\) computed by hypothesisTest is

\[S_E = U^T \left(Y - X\hat{\beta}\right)^T \left(Y - X \hat{\beta}\right) U\]
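A minimal NumPy sketch of this computation (y, x, beta_hat, and u are illustrative names, not part of the pyimsl API):

import numpy as np


def error_sscp(y, x, beta_hat, u):
    # S_E = U^T (Y - X beta-hat)^T (Y - X beta-hat) U
    resid = y - x @ beta_hat            # n by nDependent residual matrix
    return u.T @ (resid.T @ resid) @ u  # nu by nu error SSCP for the hypothesis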

Let p equal the order of the matrices \(S_E\) and \(S_H\), i.e.,

\[\begin{split}p = \begin{cases} \mathrm{nu} & \text{if } \mathrm{nu} > 0 \\ \mathit{nDependent} & \text{otherwise} \end{cases}\end{split}\]

Let q (stored in dfh) be the degrees of freedom for the hypothesis. Let \(\upsilon\) (input in regressionInfo) be the degrees of freedom for error. Function hypothesisTest computes three test statistics based on eigenvalues \(\lambda_i\) (\(i=1,2,\ldots,p\)) of the generalized eigenvalue problem \(S_Hx=\lambda S_Ex\). These test statistics are as follows:

Wilks’ lambda

\[\mathit{\Lambda} = \frac{\det\left(S_E\right)}{\det\left(S_H + S_E\right)} = \prod_{i=1}^{p} \frac{1}{1 + \lambda_i}\]

The associated p-value is based on an approximation discussed by Rao (1973, p. 556). The statistic

\[F = \frac{ms - pq/2+1}{pq} \frac{1 - \mathit{\Lambda}^{1/s}}{\mathit{\Lambda}^{1/s}}\]

has an approximate F distribution with pq and \(ms-pq/2+1\) numerator and denominator degrees of freedom, respectively, where

\[\begin{split}s = \begin{cases} 1 & \text{if } p = 1 \text{ or } q = 1 \\ \sqrt{\frac{p^2 q^2 - 4}{p^2 + q^2 - 5}} & \text{otherwise} \end{cases}\end{split}\]

and

\[m = \upsilon - \frac{p-q+1}{2}\]

The F test is exact if \(\min(p,q)\leq 2\) (Kshirsagar, 1972, Theorem 4, p. 299−300).
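The following sketch shows how this approximation could be evaluated with scipy.stats; it is an illustration of the formulas above, not the library’s internal code. Here lam holds the eigenvalues \(\lambda_i\), q is dfh, and v is the error degrees of freedom.

import numpy as np
from scipy.stats import f


def wilks_lambda_pvalue(lam, p, q, v):
    # Rao's F approximation for Wilks' lambda (illustrative sketch)
    wilks = np.prod(1.0 / (1.0 + np.asarray(lam)))
    if p == 1 or q == 1:
        s = 1.0
    else:
        s = np.sqrt((p * p * q * q - 4.0) / (p * p + q * q - 5.0))
    m = v - (p - q + 1.0) / 2.0
    df1 = p * q
    df2 = m * s - p * q / 2.0 + 1.0
    F = (df2 / df1) * (1.0 - wilks**(1.0 / s)) / wilks**(1.0 / s)
    return wilks, f.sf(F, df1, df2)     # sf gives P(F >= observed value)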

Roy’s maximum root

\[c = \max_{i} \lambda_i\]

where c is output as value. The p-value is based on the approximation

\[F = \frac{\upsilon + q - s}{s} c\]

where \(s=\max(p,q)\). This statistic has an approximate F distribution with s and \(\upsilon+q-s\) numerator and denominator degrees of freedom, respectively. The F test is exact if \(s=1\); in that case, the p-value is also exact. In general, the value output in pValue is a lower bound on the actual p-value.
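A corresponding sketch for Roy’s criterion, under the same conventions and assumptions as the previous snippet:

import numpy as np
from scipy.stats import f


def roy_max_root_pvalue(lam, p, q, v):
    # Upper-bound F approximation for Roy's maximum root (illustrative sketch)
    c = np.max(lam)
    s = max(p, q)
    F = (v + q - s) / s * c
    # Exact when s == 1; otherwise this p-value is a lower bound
    return c, f.sf(F, s, v + q - s)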

Hotelling’s trace

\[U = tr\left(S_H S_E^{-1}\right) = \sum_{i=1}^{p} \lambda_i\]

U is output as value. The p-value is based on the approximation of McKeon (1974) that supersedes the approximation of Hughes and Saw (1972). McKeon’s approximation is also discussed by Seber (1984, p. 39). For

\[b = 4 + \frac{pq+2}{\frac{(\upsilon+q-p-1)(\upsilon-1)}{(\upsilon-p-3)(\upsilon-p)} - 1}\]

the p-value is based on the result that

\[F = \frac{b(\upsilon-p-1)}{(b-2)pq}U\]

has an approximate F distribution with pq and b degrees of freedom. The test is exact if \(\min(p,q)=1\). For \(\upsilon\leq p+1\), the approximation is not valid, and pValue is set to NaN.
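A sketch of McKeon’s approximation, again using scipy.stats for the F distribution and valid only when the denominators above are nonzero:

import numpy as np
from scipy.stats import f


def hotelling_trace_pvalue(lam, p, q, v):
    # McKeon's F approximation for Hotelling's trace (illustrative sketch)
    U = np.sum(lam)
    if v <= p + 1:
        return U, float("nan")          # approximation not valid
    B = (v + q - p - 1.0) * (v - 1.0) / ((v - p - 3.0) * (v - p))
    b = 4.0 + (p * q + 2.0) / (B - 1.0)
    F = b * (v - p - 1.0) / ((b - 2.0) * p * q) * U
    return U, f.sf(F, p * q, b)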

These three test statistics are valid when \(S_E\) is positive definite. A necessary condition for \(S_E\) to be positive definite is \(\upsilon\geq p\). If \(S_E\) is not positive definite, a warning error message is issued, and both value and pValue are set to NaN.

Because the requirement \(\upsilon\geq p\) can be a serious drawback, hypothesisTest computes a fourth test statistic based on eigenvalues \(\theta_i\) (\(i=1,2,\ldots,p\)) of the generalized eigenvalue problem \(S_Hw=\theta(S_H+S_E)w\). This test statistic requires the less restrictive assumption that \(S_H+S_E\) is positive definite. A necessary condition for \(S_H+S_E\) to be positive definite is \(\upsilon+q\geq p\). If \(S_E\) is positive definite, hypothesisTest avoids computing the generalized eigenvalue problem from scratch. In this case, the eigenvalues \(\theta_i\) are obtained from the \(\lambda_i\) by

\[\theta_i = \frac{\lambda_i}{1 + \lambda_i}\]
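For instance, with the single nonzero root reported in Example 2 below (q = 1, so \(S_H\) has rank one), this conversion reproduces the Pillai’s trace value directly:

import numpy as np

lam = np.array([316.6])       # eigenvalue of S_H x = lambda S_E x (see Example 2)
theta = lam / (1.0 + lam)     # theta_i = lambda_i / (1 + lambda_i)
print(theta.sum())            # approximately 0.996851, Pillai's trace in Example 2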

The fourth test statistic is as follows:

Pillai’s trace

\[V = tr \left[S_H\left(S_H + S_E\right)^{-1}\right] = \sum_{i=1}^{p} \theta_i\]

V is output as value. The p-value is based on an approximation discussed by Pillai (1985). The statistic

\[F = \frac{2n+s+1}{2m+s+1} \frac{V}{s-V}\]

has an approximate F distribution with \(s(2m+s+1)\) and \(s(2n+s+1)\) numerator and denominator degrees of freedom, respectively, where

\[s = \min(p, q)\]
\[m = \tfrac{1}{2}\left(|p - q| - 1\right)\]
\[n = \tfrac{1}{2}\left(\upsilon - p - 1\right)\]

The F test is exact if \(\min(p,q)=1\).
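A sketch of this approximation, following the same conventions as the earlier snippets (theta holds the eigenvalues \(\theta_i\)):

import numpy as np
from scipy.stats import f


def pillai_trace_pvalue(theta, p, q, v):
    # Approximate F test for Pillai's trace (illustrative sketch)
    V = np.sum(theta)                   # theta_i = lambda_i / (1 + lambda_i)
    s = min(p, q)
    m = 0.5 * (abs(p - q) - 1.0)
    n = 0.5 * (v - p - 1.0)
    F = (2.0 * n + s + 1.0) / (2.0 * m + s + 1.0) * V / (s - V)
    df1 = s * (2.0 * m + s + 1.0)
    df2 = s * (2.0 * n + s + 1.0)
    return V, f.sf(F, df1, df2)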

Examples

Example 1

The data for this example are from Maindonald (1984, pp. 203-204). A multivariate regression model containing two dependent variables and three independent variables is fit using function regression, and the results are stored in the structure regressionInfo. The sums of squares and crossproducts matrix, scph, is then computed with a call to hypothesisScph for the test that the third independent variable is in the model (determined by specification of h). Finally, function hypothesisTest is called to compute the p-value for the test statistic (Wilks’ lambda).

from __future__ import print_function
from numpy import *
from pyimsl.stat.hypothesisScph import hypothesisScph
from pyimsl.stat.regression import regression
from pyimsl.stat.hypothesisTest import hypothesisTest


x = array([
    [7.0, 5.0, 6.0],
    [2.0, -1.0, 6.0],
    [7.0, 3.0, 5.0],
    [-3.0, 1.0, 4.0],
    [2.0, -1.0, 0.0],
    [2.0, 1.0, 7.0],
    [-3.0, -1.0, 3.0],
    [2.0, 1.0, 1.0],
    [2.0, 1.0, 4.0]])
y = array([
    [7.0, 1.0],
    [-5.0, 4.0],
    [6.0, 10.0],
    [5.0, 5.0],
    [5.0, -2.0],
    [-2.0, 4.0],
    [0.0, -6.0],
    [8.0, 2.0],
    [3.0, 0.0]])
n_observations = 9
n_independent = 3
n_dependent = 2
nh = 1
h = array([[0, 0, 0, 1]])
info = []
dfh = []

# Fit the multivariate regression and save the fit information
coefficients = regression(x, y,
                          nDependent=n_dependent,
                          regressionInfo=info)

# Sums of squares and crossproducts for the hypothesis that the
# third independent variable is not in the model (specified by h)
scph = hypothesisScph(info[0], h, dfh)

# p-value for Wilks' lambda
p_value = hypothesisTest(info[0], dfh[0], scph)

print("P-value = %10.6f" % p_value)

Output

P-value =   0.000010

Example 2

This example is the same as the first example, but more statistics are computed. Also, the U matrix, u, is explicitly specified as the identity matrix (which is the same as the default configuration of U).

from __future__ import print_function
from numpy import *
from pyimsl.stat.hypothesisScph import hypothesisScph
from pyimsl.stat.hypothesisTest import hypothesisTest
from pyimsl.stat.regression import regression


x = array([
    [7.0, 5.0, 6.0],
    [2.0, -1.0, 6.0],
    [7.0, 3.0, 5.0],
    [-3.0, 1.0, 4.0],
    [2.0, -1.0, 0.0],
    [2.0, 1.0, 7.0],
    [-3.0, -1.0, 3.0],
    [2.0, 1.0, 1.0],
    [2.0, 1.0, 4.0]])
y = array([
    [7.0, 1.0],
    [-5.0, 4.0],
    [6.0, 10.0],
    [5.0, 5.0],
    [5.0, -2.0],
    [-2.0, 4.0],
    [0.0, -6.0],
    [8.0, 2.0],
    [3.0, 0.0]])
n_observations = 9
n_independent = 3
n_dependent = 2
nh = 1
h = array([[0, 0, 0, 1]])
nu = 2
u = [[1, 0], [0, 1]]
info = []
dfh = []

# Fit the multivariate regression and save the fit information
coefficients = regression(x, y,
                          nDependent=n_dependent,
                          regressionInfo=info)

# Sums of squares and crossproducts for the hypothesis on the
# third independent variable (specified by h)
scph = hypothesisScph(info[0], h, dfh)

# Request all four test statistics and their p-values
wilk_lambda = {}
roy_max_root = {}
hotelling_trace = {}
pillai_trace = {}
p_value = hypothesisTest(info[0], dfh[0], scph,
                         u=u, wilkLambda=wilk_lambda,
                         royMaxRoot=roy_max_root,
                         hotellingTrace=hotelling_trace,
                         pillaiTrace=pillai_trace)

v1 = wilk_lambda["value"]
p1 = wilk_lambda["pValue"]
v2 = roy_max_root["value"]
p2 = roy_max_root["pValue"]
v3 = hotelling_trace["value"]
p3 = hotelling_trace["pValue"]
v4 = pillai_trace["value"]
p4 = pillai_trace["pValue"]
print("Wilk      value = %10.6f   p-value = %10.6f" % (v1, p1))
print("Roy       value = %10.6f   p-value = %10.6f" % (v2, p2))
print("Hotelling value = %10.6f   p-value = %10.6f" % (v3, p3))
print("Pillai    value = %10.6f   p-value = %10.6f" % (v4, p4))

Output

Wilk      value =   0.003149   p-value =   0.000010
Roy       value = 316.600000   p-value =   0.000010
Hotelling value = 316.600000   p-value =   0.000010
Pillai    value =   0.996851   p-value =   0.000010

Warning Errors

IMSLS_SINGULAR_1 “u”ᵀ“scpe”“u” is singular. Only Pillai’s trace can be computed. Other statistics are set to NaN.

Fatal Errors

IMSLS_NO_STAT_1 “scpe” + “scph” is singular. No tests can be computed.
IMSLS_NO_STAT_2 No statistics can be computed. Iterations for eigenvalues for the generalized eigenvalue problem “scph”x = (lambda)(“scph”+“scpe”)x failed to converge.
IMSLS_NO_STAT_3 No statistics can be computed. Iterations for eigenvalues for the generalized eigenvalue problem “scph”x = (lambda)(“scph”+“u”ᵀ“scpe”“u”)x failed to converge.
IMSLS_SINGULAR_2 “u”ᵀ“scpe”“u” + “scph” is singular. No tests can be computed.
IMSLS_SINGULAR_TRI_MATRIX The input triangular matrix is singular. The index of the first zero diagonal element is equal to #.