anovaFactorial¶
Analyzes a balanced factorial design with fixed effects.
Synopsis¶
anovaFactorial (nLevels, y)
Required Arguments¶
- int
nLevels
(Input) - Array of length
nSubscripts
containing the number of levels for each of the factors for the firstnSubscripts
− 1 elements.nLevels
[nSubscripts
− 1] is the number of observations per cell. - float
y[]
(Input) - Array of length
nLevels
[0]*nLevels
[1] × … *nLevels
[nSubscripts
− 1] containing the responses. Argumenty
must not contain NaN for any of its elements; i.e., missing values are not allowed.
Return Value¶
The p-value for the overall F test.
Optional Arguments¶
modelOrder
, int (Input)- Number of factors to be included in the highest-way interaction in the
model. Argument
modelOrder
must be in the interval [1,nSubscripts
− 1]. For example, amodelOrder
of 1 indicates that a main effect model will be analyzed, and amodelOrder
of 2 indicates that two-way interactions will be included in the model. Default:modelOrder
=nSubscripts
− 1.
pureError
(Input)
or
poolInteractions
(Input)pureError
, the default option, indicates factornSubscripts
is error. Its main effect and all its interaction effects are pooled into the error with the other (modelOrder
+ 1)-way and higher-way interactions.poolInteractions
indicates factornSubscripts
is not error. Only (modelOrder
+ 1)-way and higher-way interactions are included in the error.anovaTable
(Output)An array of size 15 containing the analysis of variance table. The analysis of variance statistics are given as follows:
Element Analysis of Variance Statistics 0 Degrees of freedom for the model. 1 Degrees of freedom for error. 2 Total (corrected) degrees of freedom. 3 Sum of squares for the model. 4 Sum of squares for error. 5 Total (corrected) sum of squares. 6 Model mean square. 7 Error mean square. 8 Overall F-statistic. 9 p-value. 10 \(R^2\) (in percent). 11 Adjusted \(R^2\) (in percent). 12 Estimate of the standard deviation. 13 Overall mean of y. 14 Coefficient of variation (in percent). Note that the p‑value is returned as 0.0 when the value is so small that all significant digits have been lost.
testEffects
(Output)An
NEF
× 4 array containing a matrix containing statistics relating to the sums of squares for the effects in the model. Here,\[\mathrm{NEF} = \binom{n}{1} + \binom{n}{2} + \ldots + \binom{n}{\min(n,|\mathrm{modelOrder}|)}\]where n is given by
nSubscripts
ifpoolInteractions
is specified; otherwise,nSubscripts
− 1.Suppose the factors are A, B, C, and error. With
modelOrder
= 3, rows 0 throughNEF
− 1 would correspond to A, B, C, AB, AC, BC, and ABC, respectively. The columns oftestEffects
are as follows:Column Description 0 Degrees of freedom. 1 Sum of squares. 2 F-statistic. 3 p-value. Note that the p‑value is returned as 0.0 when the value is so small that all significant digits have been lost.
means
(Output)An array of length (
nLevels
[0] + 1) × (nLevels
[1] + 1) × … × (nLevels
[n − 1] + 1) containing the subgroup means.See argument
testEffects
for a definition of n. If the factors are A, B, C, and error, the ordering of the means is grand mean, A means, B means, C means, AB means, AC means, BC means, and ABC means.
Description¶
Function anovaFactorial
performs an analysis for an n-way
classification design with balanced data. For balanced data, there must be
an equal number of responses in each cell of the n-way layout. The effects
are assumed to be fixed effects. The model is an extension of the two-way
model to include n factors. The interactions (two-way, three-way, up to
n-way) can be included in the model, or some of the higher-way
interactions can be pooled into error. The argument modelOrder
specifies
the number of factors to be included in the highest-way interaction. For
example, if three-way and higher-way interactions are to be pooled into
error, set modelOrder
= 2. (By default, modelOrder
= nSubscripts
− 1 with the last subscript being the error subscript.) Argument
pureError
indicates there are repeated responses within the n-way
cell; poolInteractions
indicates otherwise.
Function anovaFactorial
requires the responses as input into a single
vector y in lexicographical order, so that the response subscript
associated with the first factor varies least rapidly, followed by the
subscript associated with the second factor, and so forth. Hemmerle (1967,
Chapter 5) discusses the computational method.
Examples¶
Example 1¶
A two-way analysis of variance is performed with balanced data discussed by Snedecor and Cochran (1967, Table 12.5.1, p. 347). The responses are the weight gains (in grams) of rats that were fed diets varying in the source (A) and level (B) of protein. The model is
where
for i = 1, 2. The first responses in each cell in the two-way layout are given in the following table:
Protein Source (A) | |||
Protein Level (B) | Beef | Cereal | Pork |
High | 73, 102, 118, 104, 81, 107, 100, 87, 117, 111 | 98, 74, 56, 111, 95, 88, 82, 77, 86, 92 | 94, 79, 96, 98, 102, 102, 108, 91, 120, 105 |
Low | 90, 76, 90, 64, 86, 51, 72, 90, 95, 78 | 107, 95, 97, 80, 98, 74, 74, 67, 89, 58 | 49, 82, 73, 86, 81, 97, 106, 70, 61, 82 |
from __future__ import print_function
from numpy import *
from pyimsl.stat.anovaFactorial import anovaFactorial
n_levels = [3, 2, 10]
y = [73., 102., 118., 104., 81., 107., 100., 87., 117., 111.,
90., 76., 90., 64., 86., 51., 72., 90., 95., 78.,
98., 74., 56., 111., 95., 88., 82., 77., 86., 92.,
107., 95., 97., 80., 98., 74., 74., 67., 89., 58.,
94., 79., 96., 98., 102., 102., 108., 91., 120., 105.,
49., 82., 73., 86., 81., 97., 106., 70., 61., 82.]
p_value = anovaFactorial(n_levels, y)
print("P-value = %10.6f" % p_value)
Output¶
P-value = 0.002299
Example 2¶
In this example, the same model and data is fit as in the initial example, but optional arguments are used for a more complete analysis.
from __future__ import print_function
from numpy import *
from pyimsl.stat.anovaFactorial import anovaFactorial
from pyimsl.stat.writeMatrix import writeMatrix
n_subscripts = 3
n_levels = [3, 2, 10]
y = [73., 102., 118., 104., 81., 107., 100., 87., 117., 111.,
90., 76., 90., 64., 86., 51., 72., 90., 95., 78.,
98., 74., 56., 111., 95., 88., 82., 77., 86., 92.,
107., 95., 97., 80., 98., 74., 74., 67., 89., 58.,
94., 79., 96., 98., 102., 102., 108., 91., 120., 105.,
49., 82., 73., 86., 81., 97., 106., 70., 61., 82.]
labels = ["degrees of freedom for the model",
"degrees of freedom for error",
"total (corrected) degrees of freedom",
"sum of squares for the model",
"sum of squares for error",
"total (corrected) sum of squares",
"model mean square", "error mean square",
"F-statistic", "p-value",
"R-squared (in percent)", "Adjusted R-squared (in percent)",
"est. standard deviation of the model error",
"overall mean of y",
"coefficient of variation (in percent)"]
test_row_labels = ["A", "B", "A*B"]
test_col_labels = ["Source", "DF", "Sum of\nSquares",
"Mean\nSquare", "Prob. of\nLarger F"]
mean_row_labels = ["grand mean",
"A1", "A2", "A3",
"B1", "B2",
"A1*B1", "A1*B2", "A2*B1",
"A2*B2", "A3*B1", "A3*B2"]
# Perform analysis
anova_table = []
test_effects = []
means = []
p_value = anovaFactorial(n_levels, y,
anovaTable=anova_table,
testEffects=test_effects,
means=means)
# Print results
print("P-value = %10.6f" % p_value)
writeMatrix("* * * Analysis of Variance * * *",
anova_table, rowLabels=labels, writeFormat="%11.4f", column=True)
writeMatrix("* * * Variation Due to the Model * * *",
test_effects, rowLabels=test_row_labels,
colLabels=test_col_labels, writeFormat="%11.4f")
writeMatrix("* * * Subgroup Means * * *", means,
rowLabels=mean_row_labels,
writeFormat="%11.4f", column=True)
tmpLen = 1
for i in range(0, len(n_levels) - 1):
tmpLen = tmpLen * (n_levels[i] + 1)
Output¶
P-value = 0.002299
* * * Analysis of Variance * * *
degrees of freedom for the model 5.0000
degrees of freedom for error 54.0000
total (corrected) degrees of freedom 59.0000
sum of squares for the model 4612.9333
sum of squares for error 11586.0000
total (corrected) sum of squares 16198.9333
model mean square 922.5867
error mean square 214.5556
F-statistic 4.3000
p-value 0.0023
R-squared (in percent) 28.4768
Adjusted R-squared (in percent) 21.8543
est. standard deviation of the model error 14.6477
overall mean of y 87.8667
coefficient of variation (in percent) 16.6704
* * * Variation Due to the Model * * *
Source DF Sum of Mean Prob. of
Squares Square Larger F
A 2.0000 266.5333 0.6211 0.5411
B 1.0000 3168.2667 14.7666 0.0003
A*B 2.0000 1178.1333 2.7455 0.0732
* * * Subgroup Means * * *
grand mean 87.8667
A1 89.6000
A2 84.9000
A3 89.1000
B1 95.1333
B2 80.6000
A1*B1 100.0000
A1*B2 79.2000
A2*B1 85.9000
A2*B2 83.9000
A3*B1 99.5000
A3*B2 78.7000
Example 3¶
This example performs a three-way analysis of variance using data discussed by Peter W.M. John (1971, pp. 91−92). The responses are weights (in grams) of roots of carrots grown with varying amounts of applied nitrogen (A), potassium (B), and phosphorus (C). Each cell of the three-way layout has one response. Note that the ABC interactions sum of squares, which is 186, is given incorrectly by Peter W.M. John (1971, Table 5.2.) The three-way layout is given in the following table:
\(A_0\) | \(A_1\) | \(A_2\) | |||||||
\(B_0\) | \(B_1\) | \(B_2\) | \(B_0\) | \(B_1\) | \(B_2\) | \(B_0\) | \(B_1\) | \(B_2\) | |
\(C_0\) | 88.76 | 91.41 | 97.85 | 94.83 | 100.49 | 99.75 | 99.90 | 100.23 | 104.51 |
\(C_1\) | 87.45 | 98.27 | 95.85 | 84.57 | 97.20 | 112.30 | 92.98 | 107.77 | 110.94 |
\(C_2\) | 86.01 | 104.20 | 90.09 | 81.06 | 120.80 | 108.77 | 94.72 | 118.39 | 102.87 |
from __future__ import print_function
from numpy import *
from pyimsl.stat.anovaFactorial import anovaFactorial
from pyimsl.stat.writeMatrix import writeMatrix
n_levels = [3, 3, 3]
y = [88.76, 87.45, 86.01, 91.41, 98.27, 104.2, 97.85, 95.85,
90.09, 94.83, 84.57, 81.06, 100.49, 97.2, 120.8, 99.75,
112.3, 108.77, 99.9, 92.98, 94.72, 100.23, 107.77, 118.39,
104.51, 110.94, 102.87]
labels = ["degrees of freedom for the model",
"degrees of freedom for error",
"total (corrected) degrees of freedom",
"sum of squares for the model",
"sum of squares for error",
"total (corrected) sum of squares",
"model mean square",
"error mean square",
"F-statistic", "p-value",
"R-squared (in percent)",
"Adjusted R-squared (in percent)",
"est. standard deviation of the model error",
"overall mean of y",
"coefficient of variation (in percent)"]
test_row_labels = ["A", "B", "C", "A*B", "A*C", "B*C"]
test_col_labels = ["Source", "DF", "Sum of\nSquares",
"Mean\nSquare", "Prob. of\nLarger F"]
# Perform analysis
anova_table = []
test_effects = []
p_value = anovaFactorial(n_levels, y,
anovaTable=anova_table,
testEffects=test_effects,
poolInteractions=True)
# Print results
print("P-value = %10.6f" % p_value)
writeMatrix("* * * Analysis of Variance * * *",
anova_table, rowLabels=labels, writeFormat="%11.4f", column=True)
writeMatrix("* * * Variation Due to the Model * * *",
test_effects, rowLabels=test_row_labels,
colLabels=test_col_labels, writeFormat="%11.4f")
Output¶
P-value = 0.008299
* * * Analysis of Variance * * *
degrees of freedom for the model 18.0000
degrees of freedom for error 8.0000
total (corrected) degrees of freedom 26.0000
sum of squares for the model 2395.7290
sum of squares for error 185.7763
total (corrected) sum of squares 2581.5052
model mean square 133.0961
error mean square 23.2220
F-statistic 5.7315
p-value 0.0083
R-squared (in percent) 92.8036
Adjusted R-squared (in percent) 76.6116
est. standard deviation of the model error 4.8189
overall mean of y 98.9619
coefficient of variation (in percent) 4.8695
* * * Variation Due to the Model * * *
Source DF Sum of Mean Prob. of
Squares Square Larger F
A 2.0000 488.3675 10.5152 0.0058
B 2.0000 1090.6564 23.4832 0.0004
C 2.0000 49.1485 1.0582 0.3911
A*B 4.0000 142.5853 1.5350 0.2804
A*C 4.0000 32.3474 0.3482 0.8383
B*C 4.0000 592.6238 6.3800 0.0131