anovaNested¶
Analyzes a completely nested random model with possibly unequal numbers in the subgroups.
Synopsis¶
anovaNested (nFactors, equalOption, nLevels, y)
Required Arguments¶
- int
nFactors
(Input) - Number of factors (number of subscripts) in the model, including error.
- int
equalOption
(Input) - Equal numbers option.
equalOption | Description |
---|---|
0 | Unequal numbers in the subgroups. |
1 | Equal numbers in the subgroups. |
- int
nLevels[]
(Input) Array with the number of levels for each factor.
If
equalOption
= 1,nLevels
is of lengthnFactors
and contains the number of levels for each of the factors.Example: Suppose there are 3 factors, A, B, and C. A has two levels (A1, A2), B has 3 levels at each level of A, and C has 2 levels at each level of B.
nLevels
= {2,3,2} and the number of observations isnObs
= 2 × 3 × 2 = 12.A levels B levels C levels y indices
1 1 1 0 2 1 2 1 2 2 3 3 1 4 2 5 2 1 1 6 2 7 2 1 8 2 9 3 1 10 2 11 nLevels
2 3 2 nObs
2× 3× 2 =12 equalOption
= 1 exampleIf
equalOption
= 0,nLevels
contains the number of levels of each factor at each level of the factor in which it is nested.Example: Suppose there are 3 factors, A, B, and C, with C nested in B and B nested in A. A has two levels (A1, A2), B has up to 3 levels, and C has up to 2 levels. In the
equalOption
= 0 case, the function needs to know explicitly how the number of levels varies throughout. As specified in the table, A has two levels (nLevels[
0]
= 2), B has 3 levels in level 1 of A (nLevels[
1]
= 3) and 2 levels in level 2 of A (nLevels[
2]
= 2). Similarly, factor C has 2 levels in the A1‑B1 and A1‑B2 combinations (nLevels[
3]
= 2,nLevels[4]
= 2), but only 1 level in the A1‑B3 combination (nLevels[5]
= 1).nLevels
= {2,3,2,2,2,1,1,2} and the number of observations is the sum of the number of levels in the last factor, C,nObs
= 2 + 2 + 1 + 1 + 2 = 8:
A levels | B levels | C levels | y
indices |
|
---|---|---|---|---|
1 | 1 | 1 | 0 | |
2 | 1 | |||
2 | 1 | 2 | ||
2 | 3 | |||
3 | 1 | 4 | ||
2 | 1 | 1 | 5 | |
2 | 1 | 6 | ||
2 | 7 | |||
nLevels |
2 | 3, 2 | 2 , 2, 1, 1, 2 | |
nObs |
2 + 2 + 1 + 1 + 2 | = 8 | ||
equalOption
= 0 example |
- float
y[]
(Input) - Array of length
nObs
containing the responses.
Return Value¶
The p-value for the F‑statistic, anovaTable[9]
.
Optional Arguments¶
anovaTable
(Output)An array of size 15 containing the analysis of variance table. The analysis of variance statistics are as follows:
Element Analysis of Variance Statistics 0 Degrees of freedom for the model. 1 Degrees of freedom for error. 2 Total (corrected) degrees of freedom. 3 Sum of squares for the model. 4 Sum of squares for error. 5 Total (corrected) sum of squares. 6 Model mean square. 7 Error mean square. 8 Overall F-statistic. 9 p-value. 10 \(R^2\) (in percent) 11 Adjusted \(R^2\) (in percent). 12 Estimate of the standard deviation. 13 Overall mean of y. 14 Coefficient of variation (in percent). Note that the p‑value is returned as 0.0 when the value is so small that all significant digits have been lost.
confidence
, float (Input)Confidence level for two-sided interval estimates on the variance components, in percent.
confidence
percent confidence intervals are computed, hence,confidence
must be in the interval[
0.0, 100.0).confidence
often will be 90.0, 95.0, or 99.0. For one-sided intervals with confidence levelONECL
,ONECL
in the interval[
50.0, 100.0), setconfidence
= 100.0 - 2.0 × (100.0 -ONECL
).Default: confidence = 95.0.
varianceComponents
(Output)An array.
varianceComponents
is annFactors
by 9 matrix containing statistics relating to the particular variance components in the model. Rows ofvarianceComponents
correspond to thenFactors
factors. Columns ofvarianceComponents
are as follows:Column Description 0 Degrees of freedom. 1 Sum of squares. 2 Mean squares. 3 F–statistic. 4 p-value for F test. 5 Variance component estimate. 6 Percent of variance of variance explained by variance component. 7 Lower endpoint for a confidence interval on the variance component. 8 Upper endpoint for a confidence interval on the variance component. A test for the error variance equal to zero cannot be performed.
varianceComponents
[(nFactors
-1)9+3]
andvarianceComponents
[(nFactors
-1)9+4]
are set to NaN (not a number). Note that the p-value for the F test is returned as 0.0 when the value is so small that all significant digits have been lost.ems
(Output)- An array of length
nFactors
(nFactors
+1) / 2 with expected mean square coefficients. yMeans
(Output)An array containing the subgroup means.
Equal options Length of yMeans
0 1 + sum of values in nLevels
for the first (nFactors-1
) factors1 1 + nLevels[
0]
+nLevels[
0]
nLevels[
1]
+ … +nLevels[
0]
nLevels[
1]
…nLevels
[nFactors
– 2]
.If the factors are labeled A, B, C, and error, the ordering of the means is grand mean, A means, AB means, and then ABC means.
Description¶
Function anovaNested
analyzes a nested random model with equal or
unequal numbers in the subgroups. The analysis includes an analysis of
variance table and computation of subgroup means and variance component
estimates. Anderson and Bancroft (1952, pages
325-330) discuss the methodology. The analysis of variance method is used
for estimating the variance components. This method solves a linear system
in which the mean squares are set to the expected mean squares. A problem
that Hocking (1985, pages 324-330) discusses
is that this method can yield negative variance component estimates. Hocking
suggests a diagnostic procedure for locating the cause of a negative
estimate. It may be necessary to reexamine the assumptions of the model.
Examples¶
Example 1¶
An analysis of a three-factor nested random model with equal numbers in the subgroups is performed using data discussed by Snedecor and Cochran (1967, Table 10.16.1, pages 285−288). The responses are calcium concentrations (in percent, dry basis) as measured in the leaves of turnip greens. Four plants are taken at random, then three leaves are randomly selected from each plant.
Finally, from each selected leaf two samples are taken to determine calcium concentration. The model is
where \(y_{ijk}\) is the calcium concentration for the k-th sample of the j-th leaf of the i-th plant, the \(\alpha_i\)’s are the plant effects and are taken to be independently distributed
the \(\beta_{ij}\)’s are leaf effects each independently distributed
and the \(\varepsilon_{ijk}\)’s are errors each independently distributed \(N(0,\sigma^2)\). The effects are all assumed to be independently distributed. The data are given in the following table:
Plant | Leaf | Samples | |
1 | 1 2 3 |
3.28 3.52 2.88 |
3.09 3.48 2.80 |
2 | 1 2 3 |
2.46 1.87 2.19 |
2.44 1.92 2.19 |
3 | 1 2 3 |
2.77 3.74 2.55 |
2.66 3.44 2.55 |
4 | 1 2 3 |
3.78 4.07 3.31 |
3.87 4.12 3.31 |
from __future__ import print_function
from numpy import *
from pyimsl.stat.anovaNested import anovaNested
from pyimsl.stat.writeMatrix import writeMatrix
y = [3.28, 3.09, 3.52, 3.48, 2.88, 2.80, 2.46, 2.44, 1.87,
1.92, 2.19, 2.19, 2.77, 2.66, 3.74, 3.44, 2.55, 2.55, 3.78,
3.87, 4.07, 4.12, 3.31, 3.31]
n_levels = [4, 3, 2]
aov_labels = ["degrees of freedom for model",
"degrees of freedom for error",
"total (corrected) degrees of freedom",
"sum of squares for model",
"sum of squares for error",
"total (corrected) sum of squares",
"model mean square",
"error mean square",
"F-statistic",
"p-value",
"R-squared (in percent)",
"adjusted R-squared (in percent)",
"est. standard deviation of within error",
"overall mean of y",
"coefficient of variation (in percent)"]
ems_labels = ["Effect A and Error",
"Effect A and Effect B",
"Effect A and Effect A",
"Effect B and Error",
"Effect B and Effect B",
"Error and Error"]
means_labels = ["Grand mean",
" A means 1",
" A means 2",
" A means 3",
" A means 4",
"AB means 1 1",
"AB means 1 2",
"AB means 1 3",
"AB means 2 1",
"AB means 2 2",
"AB means 2 3",
"AB means 3 1",
"AB means 3 2",
"AB means 3 3",
"AB means 4 1",
"AB means 4 2",
"AB means 4 3"]
components_col_labels = ["",
"Degrees of freedom",
"Sum of squares",
"Mean squares",
"F-statistic",
"p-value for F test",
"Variance component estimate",
"%var explained by var. comp.",
"confidence min for var. comp.",
"confidence max for var. comp."]
components_row_labels = ["A", "B", "Error"]
anovaTable = []
yMeans = []
varc = []
ems = []
pvalue = anovaNested(3, 1, n_levels, y,
anovaTable=anovaTable,
yMeans=yMeans,
varianceComponents=varc,
ems=ems)
print("pvalue = ", pvalue)
writeMatrix("* * * Analysis of Variance * * *", anovaTable,
rowLabels=aov_labels, writeFormat="%10.5f", column=True)
writeMatrix("* * * Expected Mean Square Coefficients * * *",
ems, rowLabels=ems_labels, writeFormat="%6.2f", column=True)
writeMatrix("* * * Means * * *", yMeans,
rowLabels=means_labels, writeFormat="%6.2f", column=True)
writeMatrix("* * Analysis of Variance / Variance Components * *",
varc, rowLabels=components_row_labels, colLabels=components_col_labels,
writeFormat="%10.5f")
Output¶
pvalue = 6.767042312122191e-11
* * * Analysis of Variance * * *
degrees of freedom for model 11.00000
degrees of freedom for error 12.00000
total (corrected) degrees of freedom 23.00000
sum of squares for model 10.19055
sum of squares for error 0.07985
total (corrected) sum of squares 10.27040
model mean square 0.92641
error mean square 0.00665
F-statistic 139.22303
p-value 0.00000
R-squared (in percent) 99.22252
adjusted R-squared (in percent) 98.50984
est. standard deviation of within error 0.08157
overall mean of y 3.01208
coefficient of variation (in percent) 2.70819
* * * Expected Mean Square Coefficients * * *
Effect A and Error 1.00
Effect A and Effect B 2.00
Effect A and Effect A 6.00
Effect B and Error 1.00
Effect B and Effect B 2.00
Error and Error 1.00
* * * Means * * *
Grand mean 3.01
A means 1 3.17
A means 2 2.18
A means 3 2.95
A means 4 3.74
AB means 1 1 3.18
AB means 1 2 3.50
AB means 1 3 2.84
AB means 2 1 2.45
AB means 2 2 1.90
AB means 2 3 2.19
AB means 3 1 2.71
AB means 3 2 3.59
AB means 3 3 2.55
AB means 4 1 3.83
AB means 4 2 4.10
AB means 4 3 3.31
* * Analysis of Variance / Variance Components * *
Degrees of Sum of squares Mean squares F-statistic p-value for
freedom F test
A 3.00000 7.56035 2.52012 7.66517 0.00973
B 8.00000 2.63020 0.32878 49.40889 0.00000
Error 12.00000 0.07985 0.00665 .......... ..........
Variance compon %var explained confidence confidence
ent estimate by var. comp. min for var. max for var.
comp. comp.
A 0.36522 68.53019 0.03955 5.78674
B 0.16106 30.22123 0.06967 0.60042
Error 0.00665 1.24858 0.00342 0.01813
Example 2¶
An analysis of a three-factor nested random model with unequal numbers in the subgroups is performed. The data are given in the following table:
A | B | C | ||
1 | 1 2 |
23.0 31.0 |
19.0 37.0 |
|
2 | 1 2 |
33.0 29.0 |
29.0 | |
3 | 1 | 36.0 | 29.0 | 33.0 |
4 | 1 2 3 4 5 6 7 8 9 |
11.0 23.0 33.0 23.0 26.0 39.0 20.0 24.0 36.0 |
21.0 18.0 |
|
5 | 1 | 25.0 | 33.0 | |
6 | 1 2 3 4 5 6 7 8 9 10 |
28.0 25.0 32.0 41.0 35.0 16.0 30.0 40.0 32.0 44.0 |
31.0 42.0 36.0 |
from numpy import *
from pyimsl.stat.anovaNested import anovaNested
from pyimsl.stat.writeMatrix import writeMatrix
y = [23.0, 19.0, 31.0, 37.0,
33.0, 29.0, 29.0,
36.0, 29.0, 33.0,
11.0, 21.0,
23.0, 18.0,
33.0, 23.0, 26.0, 39.0, 20.0, 24.0, 36.0,
25.0, 33.0,
28.0, 31.0,
25.0, 42.0,
32.0, 36.0,
41.0, 35.0, 16.0, 30.0, 40.0, 32.0, 44.0]
nl = [6, # Factor A
2, 2, 1, 9, 1, 10, # Factor B
2, 2, # Factor C
2, 1,
3,
2, 2, 1, 1, 1, 1, 1, 1, 1,
2,
2, 2, 2, 1, 1, 1, 1, 1, 1, 1]
aov_labels = ["degrees of freedom for model",
"degrees of freedom for error",
"total (corrected) degrees of freedom",
"sum of squares for model",
"sum of squares for error",
"total (corrected) sum of squares",
"model mean square",
"error mean square",
"F-statistic",
"p-value",
"R-squared (in percent)",
"adjusted R-squared (in percent)",
"est. standard deviation of within error",
"overall mean of y",
"coefficient of variation (in percent)"]
ems_labels = ["Effect A and Error",
"Effect A and Effect B",
"Effect A and Effect A",
"Effect B and Error",
"Effect B and Effect B",
"Error and Error"]
means_labels = [
"Grand mean", " A means 1", " A means 2",
" A means 3", " A means 4", " A means 5",
" A means 6", "AB means 1 1", "AB means 1 2",
"AB means 2 1", "AB means 2 2", "AB means 3 1",
"AB means 4 1", "AB means 4 2", "AB means 4 3",
"AB means 4 4", "AB means 4 5", "AB means 4 6",
"AB means 4 7", "AB means 4 8", "AB means 4 9",
"AB means 5 1", "AB means 6 1", "AB means 6 2",
"AB means 6 3", "AB means 6 4", "AB means 6 5",
"AB means 6 6", "AB means 6 7", "AB means 6 8",
"AB means 6 9", "AB means 6 10"]
components_labels = [
"degrees of freedom for A", "sum of squares for A",
"mean square of A", "F-statistic for A", "p-value for A",
"Estimate of A", "Percent Variation Explained by A",
"95% Confidence Interval Lower Limit for A",
"95% Confidence Interval Upper Limit for A",
"degrees of freedom for B", "sum of squares for B",
"mean square of B", "F-statistic for B", "p-value for B",
"Estimate of B", "Percent Variation Explained by B",
"95% Confidence Interval Lower Limit for B",
"95% Confidence Interval Upper Limit for B",
"degrees of freedom for Error", "sum of squares for Error",
"mean square of Error", "F-statistic for Error",
"p-value for Error", "Estimate of Error",
"Percent Explained by Error",
"95% Confidence Interval Lower Limit for Error",
"95% Confidence Interval Upper Limit for Error"]
anovaTable = []
yMeans = []
varc = []
ems = []
pvalue = anovaNested(3, 0, nl, y,
anovaTable=anovaTable,
yMeans=yMeans,
varianceComponents=varc,
ems=ems)
writeMatrix("* * * Analysis of Variance * * *", anovaTable,
rowLabels=aov_labels, writeFormat="%10.5f", column=True)
writeMatrix("* * * Expected Mean Square Coefficients * * *",
ems, rowLabels=ems_labels, writeFormat="%6.2f", column=True)
writeMatrix("* * * Means * * *", yMeans,
rowLabels=means_labels, writeFormat="%6.2f", column=True)
pvarc = reshape(varc, (27))
writeMatrix("* * Analysis of Variance / Variance Components * *",
pvarc, rowLabels=components_labels, transpose=True,
writeFormat="%10.5f")
Output¶
* * * Analysis of Variance * * *
degrees of freedom for model 24.00000
degrees of freedom for error 11.00000
total (corrected) degrees of freedom 35.00000
sum of squares for model 1810.80556
sum of squares for error 310.16667
total (corrected) sum of squares 2120.97222
model mean square 75.45023
error mean square 28.19697
F-statistic 2.67583
p-value 0.04587
R-squared (in percent) 85.37620
adjusted R-squared (in percent) 53.46974
est. standard deviation of within error 5.31008
overall mean of y 29.52778
coefficient of variation (in percent) 17.98334
* * * Expected Mean Square Coefficients * * *
Effect A and Error 1.00
Effect A and Effect B 1.97
Effect A and Effect A 5.38
Effect B and Error 1.00
Effect B and Effect B 1.29
Error and Error 1.00
* * * Means * * *
Grand mean 29.53
A means 1 27.50
A means 2 30.33
A means 3 32.67
A means 4 24.91
A means 5 29.00
A means 6 33.23
AB means 1 1 21.00
AB means 1 2 34.00
AB means 2 1 31.00
AB means 2 2 29.00
AB means 3 1 32.67
AB means 4 1 16.00
AB means 4 2 20.50
AB means 4 3 33.00
AB means 4 4 23.00
AB means 4 5 26.00
AB means 4 6 39.00
AB means 4 7 20.00
AB means 4 8 24.00
AB means 4 9 36.00
AB means 5 1 29.00
AB means 6 1 29.50
AB means 6 2 33.50
AB means 6 3 34.00
AB means 6 4 41.00
AB means 6 5 35.00
AB means 6 6 16.00
AB means 6 7 30.00
AB means 6 8 40.00
AB means 6 9 32.00
AB means 6 10 44.00
* * Analysis of Variance / Variance Components * *
1
degrees of freedom for A 5.00000
sum of squares for A 461.42211
mean square of A 92.28442
F-statistic for A 0.98770
p-value for A 0.46007
Estimate of A -0.21372
Percent Variation Explained by A ..........
95% Confidence Interval Lower Limit for A ..........
95% Confidence Interval Upper Limit for A ..........
degrees of freedom for B 19.00000
sum of squares for B 1349.38345
mean square of B 71.02018
F-statistic for B 2.51872
p-value for B 0.05966
Estimate of B 33.19878
Percent Variation Explained by B 54.07342
95% Confidence Interval Lower Limit for B 0.00000
95% Confidence Interval Upper Limit for B 100.58638
degrees of freedom for Error 11.00000
sum of squares for Error 310.16667
mean square of Error 28.19697
F-statistic for Error ..........
p-value for Error ..........
Estimate of Error 28.19697
Percent Explained by Error 45.92658
95% Confidence Interval Lower Limit for Error 14.14991
95% Confidence Interval Upper Limit for Error 81.28594