hypothesisTest¶
Performs tests for a multivariate general linear hypothesis \(H\beta U=G\) given the hypothesis sums of squares and crossproducts matrix \(S_H\).
Synopsis¶
hypothesisTest (regressionInfo, dfh, scph)
Required Argument¶
- structure
regressionInfo(Input) - A structure containing information about the regression fit. See function regression.
- float
dfh(Input) - Degrees of freedom for the sums of squares and crossproducts matrix.
- float
scph[[]](Input) - Array of size
nubynucontaining \(S_H\), the sums of squares and crossproducts attributable to the hypothesis.
Return Value¶
The p-value corresponding to Wilks’ lambda test.
Optional Arguments¶
u, float[](Input)Argument
ucontains the nDependent bynuU matrix for the test \(H_p\beta U=G_p\); wherenuis the number of linear combinations of the dependent variables to be considered. The valuenumust be greater than 0 and less than or equal to nDependent.Default:
nu= nDependent anduis the identity matrixwilkLambda,value,pValue(Output)- Wilk’s lamda and p-value.
royMaxRoot,value,pValue(Output)- Roy’s maximum root criterion and p-value.
hotellingTrace,value,pValue(Output)- Hotelling’s trace and p-value.
pillaiTrace,value,pValue(Output)- Pillai’s trace and p-value.
Description¶
Function hypothesisTest computes test statistics and p-values for the
general linear hypothesis \(H\beta U=G\) for the multivariate general
linear model.
The hypothesis sum of squares and crossproducts matrix input in scph is
where C is a solution to \(R^TC=H,(C^TDC)^-\) denotes the generalized inverse of \(C^TDC\), and D is a diagonal matrix with diagonal elements
For a detailed discussion, see Linear Dependence and the R Matrix in the Usage Notes.
The error sum of squares and crossproducts matrix for the model \(Y=X\beta+\varepsilon\) is
which is input in regressionInfo. The error sum of squares and
crossproducts matrix for the hypothesis \(H\beta U=G\) computed by
hypothesisTest is
Let p equal the order of the matrices \(S_E\) and \(S_H\), i.e.,
Let q (stored in dfh) be the degrees of freedom for the hypothesis. Let
v (input in regressionInfo) be the degrees of freedom for error.
Function hypothesisTest computed three test statistics based on
eigenvalues \(\lambda_i\) (\(i=1,2,\ldots,p\)) of the generalized
eigenvalue problem \(S_Hx=\lambda S_Ex\). These test statistics are as
follows:
Wilk’s lambda
The associated p-value is based on an approximation discussed by Rao (1973, p. 556). The statistic
has an approximate F distribution with pq and \(ms-pq/2+1\) numerator and denominator degrees of freedom, respectively, where
and
The F test is exact if \(\min(p,q)\leq 2\) (Kshirsagar, 1972, Theorem 4, p. 299−300).
Roy’s maximum root
where c is output as value. The p-value is based on the
approximation
where \(s=\max(p,q)\) has an approximate F distribution with s and
\(ν + q − s\) numerator and denominator degrees of freedom, respectively.
The F test is exact if \(s=1\); the p-value is also exact. In
general, the value output in pValue is lower bound on the actual
p-value.
Hotelling’s trace
U is output as value. The p-value is based on the approximation of
McKeon (1974) that supersedes the approximation of Hughes and Saw (1972).
McKeon’s approximation is also discussed by Seber (1984, p. 39). For
the p-value is based on the result that
has an approximate F distribution with pq and b degrees of freedom. The
test is exact if \(\min(p,q)=1\). For \(v\leq p+1\), the
approximation is not valid, and pValue is set to NaN.
These three test statistics are valid when \(S_E\) is positive definite.
A necessary condition for \(S_E\) to be positive definite is
\(\nu\geq p\). If \(S_E\) is not positive definite, a warning error
message is issued, and both value and pValue are set to NaN.
Because the requirement ν ≥ p can be a serious drawback, hypothesisTest
computes a fourth test statistic based on eigenvalues \(\theta_i\)
(\(i= 1,2,\ldots,p\)) of the generalized eigenvalue problem
\(S_Hw=\theta(S_H+ S_E)w\). This test statistic requires a less
restrictive assumption— \(S_H+S_E\) is positive definite. A necessary
condition for \(S_H +S_E\) to be positive definite is \(\nu+q\geq
p\). If \(S_E\) is positive definite, hypothesisTest avoids the
computation of the generalized eigenvalue problem from scratch. In this case,
the eigenvalues \(\theta_i\) are obtained from \(\lambda_i\) by
The fourth test statistic is as follows:
Pillai’s trace
V is output as value. The p-value is based on an approximation
discussed by Pillai (1985). The statistic
has an approximate F distribution with \(s(2m+s+1)\) and \(s(2n+s+1)\) numerator and denominator degrees of freedom, respectively, where
The F test is exact if \(\min(p,q)=1\).
Examples¶
Example 1¶
The data for this example are from Maindonald (1984, p. 203−204). A
multivariate regression model containing two dependent variables and three
independent variables is fit using function
regression and the results stored in the structure
regressionInfo. The sum of squares and crossproducts matrix, scph,
is then computed with a call to hypothesisScph for
the test that the third independent variable is in the model (determined by
specification of h). Finally, function hypothesisTest is called to
compute the p-value for the test statistic (Wilk’s lambda).
from __future__ import print_function
from numpy import *
from pyimsl.stat.hypothesisScph import hypothesisScph
from pyimsl.stat.regression import regression
from pyimsl.stat.hypothesisTest import hypothesisTest
x = array([
[7.0, 5.0, 6.0],
[2.0, -1.0, 6.0],
[7.0, 3.0, 5.0],
[-3.0, 1.0, 4.0],
[2.0, -1.0, 0.0],
[2.0, 1.0, 7.0],
[-3.0, -1.0, 3.0],
[2.0, 1.0, 1.0],
[2.0, 1.0, 4.0]])
y = array([
[7.0, 1.0],
[-5.0, 4.0],
[6.0, 10.0],
[5.0, 5.0],
[5.0, -2.0],
[-2.0, 4.0],
[0.0, -6.0],
[8.0, 2.0],
[3.0, 0.0]])
n_observations = 9
n_independent = 3
n_dependent = 2
nh = 1
h = array([[0, 0, 0, 1]])
info = []
dfh = []
coefficients = regression(x, y,
nDependent=n_dependent,
regressionInfo=info)
scph = hypothesisScph(info[0], h, dfh)
p_value = hypothesisTest(info[0], dfh[0], scph)
print("P-value = %10.6f" % p_value)
Output¶
P-value = 0.000010
Example 2¶
This example is the same as the first example, but more statistics are
computed. Also, the U matrix, u, is explicitly specified as the
identity matrix (which is the same default configuration of U).
from __future__ import print_function
from numpy import *
from pyimsl.stat.hypothesisScph import hypothesisScph
from pyimsl.stat.hypothesisTest import hypothesisTest
from pyimsl.stat.regression import regression
x = array([
[7.0, 5.0, 6.0],
[2.0, -1.0, 6.0],
[7.0, 3.0, 5.0],
[-3.0, 1.0, 4.0],
[2.0, -1.0, 0.0],
[2.0, 1.0, 7.0],
[-3.0, -1.0, 3.0],
[2.0, 1.0, 1.0],
[2.0, 1.0, 4.0]])
y = array([
[7.0, 1.0],
[-5.0, 4.0],
[6.0, 10.0],
[5.0, 5.0],
[5.0, -2.0],
[-2.0, 4.0],
[0.0, -6.0],
[8.0, 2.0],
[3.0, 0.0]])
n_observations = 9
n_independent = 3
n_dependent = 2
nh = 1
h = array([[0, 0, 0, 1]])
nu = 2
u = [[1, 0], [0, 1]]
info = []
dfh = []
coefficients = regression(x, y,
nDependent=n_dependent,
regressionInfo=info)
scph = hypothesisScph(info[0], h, dfh)
wilk_lambda = {}
roy_max_root = {}
hotelling_trace = {}
pillai_trace = {}
p_value = hypothesisTest(info[0], dfh[0], scph,
u=u, wilkLambda=wilk_lambda,
royMaxRoot=roy_max_root,
hotellingTrace=hotelling_trace,
pillaiTrace=pillai_trace)
v1 = wilk_lambda["value"]
p1 = wilk_lambda["pValue"]
v2 = roy_max_root["value"]
p2 = roy_max_root["pValue"]
v3 = hotelling_trace["value"]
p3 = hotelling_trace["pValue"]
v4 = pillai_trace["value"]
p4 = pillai_trace["pValue"]
print("Wilk value = %10.6f p-value = %10.6f" % (v1, p1))
print("Roy value = %10.6f p-value = %10.6f" % (v2, p2))
print("Hotelling value = %10.6f p-value = %10.6f" % (v3, p3))
print("Pillai value = %10.6f p-value = %10.6f" % (v4, p4))
Output¶
Wilk value = 0.003149 p-value = 0.000010
Roy value = 316.600000 p-value = 0.000010
Hotelling value = 316.600000 p-value = 0.000010
Pillai value = 0.996851 p-value = 0.000010
Warning Errors¶
IMSLS_SINGULAR_1 |
“u”“scpe”“u” is singular. Only Pillai’s trace can be computed. Other statistics are set to NaN. |
Fatal Errors¶
IMSLS_NO_STAT_1 |
“scpe” + “scph” is singular. No tests can be computed. |
IMSLS_NO_STAT_2 |
No statistics can be computed. Iterations for eigenvalues for the generalized eigenvalue problem “scph”x = (lambda)(“scph”+“scpe”)x failed to converge. |
IMSLS_NO_STAT_3 |
No statistics can be computed. Iterations for eigenvalues for the generalized eigenvalue problem “scph” x = (lambda)(“scph”+“u”“scpe”“u”)x failed to converge. |
IMSLS_SINGULAR_2 |
“u”“scpe”“u” + “scph” is singular. No tests can be computed. |
IMSLS_SINGULAR_TRI_MATRIX |
The input triangular matrix is singular. The index of the first zero diagonal element is equal to #. |