anovaBalanced¶
Analyzes a balanced complete experimental design for a fixed, random, or mixed model.
Synopsis¶
anovaBalanced (nLevels, y, nRandom, indexRandomFactor, nFactorsPerEffect, indexFactorPerEffect)
Required Arguments¶
- int
nLevels[]
(Input) - Array of length
nFactors
containing the number of levels for each of the factors. - float
y[]
(Input) - Array of length
nLevels[0]
×nLevels[1]
× … ×nLevels[nFactors
‑1]
containing the responses.y[]
must not contain NaN (not a number) for any of its elements, i.e., missing values are not allowed. - int
nRandom
(Input) - For positive
nRandom
,|nRandom|
is the number of random factors. For negativenRandom
,|nRandom|
is the number of random effects (sources of variation). - int
indexRandomFactor[]
(Input) - Index array of length
|nRandom|
containing either the factor numbers to be considered random (fornRandom
positive) or containing the effect numbers to be considered random (fornRandom
negative). IfnRandom
= 0,indexRandomFactor
is not referenced. - int
nFactorsPerEffect[]
(Input) - Array of length
nModelEffects
containing the number of factors associated with each effect in the model. - int
indexFactorPerEffect[]
(Input) - Index vector of length
nFactorsPerEffect[0]
+nFactorsPerEffect[1]
+ … +nFactorsPerEffect[nModelEffects-1]
. The firstnFactorsPerEffect[0]
elements give the factor numbers in the first effect. Thenext nFactorsPerEffect[1]
elements give the factor numbers in the second effect. The lastnFactorsPerEffect
[nModelEffects-1]
elements give the factor numbers in the last effect. Main effects must appear before their interactions. In general, an effect E cannot appear after an effect F if all of the indices for E appear also in F.
Return Value¶
The p-value for the F-statistic.
Optional Arguments¶
anovaTable
(Output)An array of size 15 containing the analysis of variance table. The analysis of variance statistics are as follows:
Element Analysis of Variance Statistics 0 Degrees of freedom for the model. 1 Degrees of freedom for error. 2 Total (corrected) degrees of freedom. 3 Sum of squares for the model. 4 Sum of squares for error. 5 Total (corrected) sum of squares. 6 Model mean square. 7 Error mean square. 8 Overall F-statistic. 9 p-value. 10 \(R^2\) (in percent) 11 Adjusted \(R^2\) (in percent). 12 Estimate of the standard deviation. 13 Overall mean of Y
.14 Coefficient of variation (in percent). Note that the p‑value is returned as 0.0 when the value is so small that all significant digits have been lost.
model
int (Input)Model Option
model
Meaning 0 Searle model. 1 Scheffe model . For the Scheffe model, effects corresponding to interactions of fixed and random factors have their sum over the subscripts corresponding to fixed factors equal to zero. Also, the variance of a random interaction effect involving some fixed factors has a multiplier for the associated variance component that involves the number of levels in the fixed factors. The Searle model has no summation restrictions on the random interaction effects and has a multiplier of one for each variance component. The default is model = 0.
confidence
, float (Input)Confidence level for two-sided interval estimates on the variance components, in percent.
confidence
percent confidence intervals are computed, hence,confidence
must be in the interval \(\left[0.0,100.0\right)\).confidence
often will be90.0
,95.0
, or99.0
.For one-sided intervals with confidence level α, α in the interval \(\left[50.0, 100.0\right)\),
set confidence
=
\(100.0-2.0\times 100.0-\alpha)\).Default:
confidence =
95.0.varianceComponents
(Output)An array,
varianceComponents
.varianceComponents
is an(nModelEffects
+ 1) by 9 array containing statistics relating to the particular variance components or effects in the model and the error. Rows ofvarianceComponents
correspond to thenModelEffects
effects plus error.Column Description 0 Degrees of freedom. 1 Sum of squares. 2 Mean squares. 3 F –statistic. 4 p-value for F test. 5 Variance component estimate. 6 Percent of variance of variance explained by variance component. 7 Lower endpoint for a confidence interval on the variance component. 8 Upper endpoint for a confidence interval on the variance component. Elements 5 through 8 contain NaN (not a number) if the effect is fixed, i.e., if there is no variance component to be estimated. If the variance component estimate is negative, columns 7 and 8 contain NaN. Note that the p-value for the F test is returned as 0.0 when the value is so small that all significant digits have been lost.
ems
(Output)- An array of length
(nModelEffects + 1)
×(nModelEffects + 2)/2
containing expected mean square coefficients. Suppose the effects are A, B, and AB. The ordering of the coefficients inems
is as follows:
Error | AB | B | A | |
---|---|---|---|---|
A | ems [0] |
ems [1] |
ems [2] |
ems [3] |
B | ems [4] |
ems [5] |
ems [6] |
|
AB | ems [7] |
ems [8] |
||
Error | ems [9] |
yMeans
(Output)- An array of length (
nLevels
[0] + 1) × (nLevels
[1] + 1) ×...
× (nLevels
[nFactors
-1] + 1) containing the subgroup means. Suppose the factors are A, B, and C. The ordering of the means is grand mean, A means, B means, C means, AB means, AC means, BC means, and ABC means.
Description¶
Function anovaBalanced
analyzes a balanced complete experimental design
for a fixed, random, or mixed model. The analysis includes an analysis of
variance table, and computation of subgroup means and variance component
estimates. A choice of two parameterizations of the variance components for
the model can be made.
Scheffé (1959, pages 274-289) discusses the parameterization for model
=
1. For example, consider the following model equation with fixed factor A
and random factor B:
The fixed effects \(\alpha_i\)’s are subject to the restriction
the \(b_j\)’s are random effects identically and independently distributed
\(c_{ij}\) are interaction effects each distributed
and are subject to the restrictions
and the \(e_{ijk}\)’s are errors identically and independently distributed \(N(0,\sigma^2)\). In general, interactions of fixed and random factors have sums over subscripts corresponding to fixed factors equal to zero. Also in general, the variance of a random interaction effect is the associated variance component times a product of ratios for each fixed factor in the random interaction term. Each ratio depends on the number of levels in the fixed factor. In the earlier example, the random interaction AB has the ratio \((a-1)/a\) as a multiplier of
and
In a three-way crossed classification model, an ABC interaction effect with A fixed, B random, and C fixed would have variance
Searle (1971, pages 400−401) discusses the parameterization for model
=
0. This parameterization does not have the summation restrictions on the
effects corresponding to interactions of fixed and random factors. Also, the
variance of each random interaction term is the associated variance
component, i.e., without the multiplier. This parameterization is also used
with unbalanced data, which is one reason for its popularity with balanced
data also. In the earlier example,
Searle (1971, pages 400−404) compares these two parameterizations. Hocking (1973) considers these different parameterizations and concludes they are equivalent because they yield the same variance-covariance structure for the responses. Differences in covariances for individual terms, differences in expected mean square coefficients and differences in F tests are just a consequence of the definition of the individual terms in the model and are not caused by any fundamental differences in the models. For the earlier two-way model, Hocking states that the relations between the two parameterizations of the variance components are
where
are the variance components in the parameterization with model
= 0.
The computations for degrees of freedom and sums of squares are the same
regardless of the option specified by model
. anovaBalanced
first
computes degrees of freedom and sum of squares for a full factorial design.
Degrees of freedom for effects in the factorial design that are missing from
the specified model are pooled into the model effect containing the fewest
subscripts but still containing the factorial effect. If no such model
effect exists, the factorial effect is pooled into error. If more than one
such effect exists, a terminal error message is issued indicating a
misspecified model.
The analysis of variance method is used for estimating the variance components. This method solves a linear system in which the mean squares are set to the expected mean squares. A problem that Hocking (1985, pages 324−330) discusses is that this method can yield a negative variance component estimate. Hocking suggests a diagnostic procedure for locating the cause of the negative estimate. It may be necessary to re-examine the assumptions of the model.
The percentage of variation explained by each random effect is computed
(output in varianceComponents
column 6) as the variance of the
associated random effect divided by the variance of y. The two
parameterizations can lead to different values because of the different
definitions of the individual terms in the model. For example, the
percentage associated with the AB interaction term in the earlier two-way
mixed model is computed for model
= 1 using the formula
while for the parameterization model
= 0, the percentage is computed
using the formula
In each case, the variance components are replaced by their estimates
(stored in varianceComponents
column 5).
Confidence intervals on the variance components are computed using the method discussed by Graybill (1976, Theorem 15.3.5, page 624, and Note 4, page 620).
Example¶
An analysis of a generalized randomized block design is performed using data discussed by Kirk (1982, Table 6.10-1, pages 293−297). The model is
where \(y_{ijk}\) is the response for the k-th experimental unit in block j with treatment i; the \(\alpha_i\)’s are the treatment effects and are subject to the restriction
the \(b_j\)’s are block effects identically and independently distributed
\(c_{ij}\) are interaction effects each distributed
and are subject to the restrictions
and the \(e_{ijk}\)’s are errors, identically and independently distributed \(N(0,\sigma^2)\). The interaction effects are assumed to be distributed independently of the errors.
The data are given in the following table:
Block | ||||
Treatment | 1 | 2 | 3 | 4 |
1 | 3, 6 | 3, 1 | 2, 2 | 3, 2 |
2 | 4, 5 | 4, 2 | 3, 4 | 3, 3 |
3 | 7, 8 | 7, 5 | 6, 5 | 6, 6 |
4 | 7, 8 | 9, 10 | 10, 9 | 8, 11 |
from numpy import *
from pyimsl.stat.anovaBalanced import anovaBalanced
from pyimsl.stat.writeMatrix import writeMatrix
pvalue = -99.
n_levels = [4, 4, 2]
indrf = [2, 3]
nfef = [1, 1, 2]
indef = [1, 2, 1, 2]
y = [3., 6., 3., 1., 2., 2., 3., 2., 4., 5., 4.,
2., 3., 4., 3., 3., 7., 8., 7., 5., 6., 5.,
6., 6., 7., 8., 9., 10., 10., 9., 8., 11.]
aov_labels = ["degrees of freedom for model",
"degrees of freedom for error",
"total (corrected) degrees of freedom",
"sum of squares for model",
"sum of squares for error",
"total (corrected) sum of squares",
"model mean square",
"error mean square",
"F-statistic",
"p-value",
"R-squared (in percent)",
"adjusted R-squared (in percent)",
"est. standard deviation of within error",
"overall mean of y",
"coefficient of variation (in percent)"]
ems_labels = ["Effect A and Error",
"Effect A and Effect B",
"Effect A and Effect AB",
"Effect A and Effect B",
"Effect A and Effect A",
"Effect B and Error",
"Effect B and Effect AB",
"Effect B and Effect B",
"Effect AB and Error",
"Effect AB and Effect AB",
"Error and Error"]
means_labels = ["Grand mean",
" A means 1",
" A means 2",
" A means 3",
" A means 4",
" B means 1",
" B means 2",
" B means 3",
" B means 4",
"AB means 1 1",
"AB means 1 2",
"AB means 1 3",
"AB means 1 4",
"AB means 2 1",
"AB means 2 2",
"AB means 2 3",
"AB means 2 4",
"AB means 3 1",
"AB means 3 2",
"AB means 3 3",
"AB means 3 4",
"AB means 4 1",
"AB means 4 2",
"AB means 4 3",
"AB means 4 4"]
components_col_labels = ["", "Degrees of freedom",
"Sum of squares",
"Mean squares",
"F-statistic",
"p-value for F test",
"Variance component estimate",
"%var explained by var. comp.",
"confidence min for var. comp.",
"confidence max for var. comp."]
components_row_labels = ["A", "B", "AB", "Error"]
ems = []
variance_components = []
aov = []
y_means = []
pvalue = anovaBalanced(n_levels, y, 2, indrf,
nfef, indef, model=1, ems=ems,
varianceComponents=variance_components,
yMeans=y_means, anovaTable=aov)
writeMatrix("* * * Analysis of Variance * * *", aov,
rowLabels=aov_labels, writeFormat="%10.5f", column=True)
writeMatrix("* * * Expected Mean Square Coefficients * * *",
ems, rowLabels=ems_labels, writeFormat="%10.5f", column=True)
writeMatrix("* * Analysis of Variance / Variance Components * *",
variance_components,
rowLabels=components_row_labels, colLabels=components_col_labels,
writeFormat="%10.5f")
writeMatrix("means", y_means, rowLabels=means_labels,
writeFormat="%6.2f", column=True)
Output¶
* * * Analysis of Variance * * *
degrees of freedom for model 15.00000
degrees of freedom for error 16.00000
total (corrected) degrees of freedom 31.00000
sum of squares for model 216.50000
sum of squares for error 19.00000
total (corrected) sum of squares 235.50000
model mean square 14.43333
error mean square 1.18750
F-statistic 12.15439
p-value 0.00000
R-squared (in percent) 91.93206
adjusted R-squared (in percent) 84.36837
est. standard deviation of within error 1.08972
overall mean of y 5.37500
coefficient of variation (in percent) 20.27395
* * * Expected Mean Square Coefficients * * *
Effect A and Error 1.00000
Effect A and Effect B 2.00000
Effect A and Effect AB 0.00000
Effect A and Effect B 8.00000
Effect A and Effect A 1.00000
Effect B and Error 0.00000
Effect B and Effect AB 8.00000
Effect B and Effect B 1.00000
Effect AB and Error 2.00000
Effect AB and Effect AB 1.00000
* * Analysis of Variance / Variance Components * *
Degrees of Sum of squares Mean squares F-statistic p-value for
freedom F test
A 3.00000 194.50000 64.83333 32.87324 0.00004
B 3.00000 4.25000 1.41667 1.19298 0.34396
AB 9.00000 17.75000 1.97222 1.66082 0.18016
Error 16.00000 19.00000 1.18750 .......... ..........
Variance compon %var explained confidence confidence
ent estimate by var. comp. min for var. max for var.
comp. comp.
A .......... .......... .......... ..........
B 0.02865 1.89655 0.00000 2.31682
AB 0.39236 19.48276 0.00000 2.75803
Error 1.18750 78.62069 0.65869 2.75057
means
Grand mean 5.38
A means 1 2.75
A means 2 3.50
A means 3 6.25
A means 4 9.00
B means 1 6.00
B means 2 5.12
B means 3 5.12
B means 4 5.25
AB means 1 1 4.50
AB means 1 2 2.00
AB means 1 3 2.00
AB means 1 4 2.50
AB means 2 1 4.50
AB means 2 2 3.00
AB means 2 3 3.50
AB means 2 4 3.00
AB means 3 1 7.50
AB means 3 2 6.00
AB means 3 3 5.50
AB means 3 4 6.00
AB means 4 1 7.50
AB means 4 2 9.50
AB means 4 3 9.50
AB means 4 4 9.50