anovaNested¶

Analyzes a completely nested random model with possibly unequal numbers in the subgroups.

Synopsis¶

anovaNested (nFactors, equalOption, nLevels, y)

Required Arguments¶

int nFactors (Input): Number of factors (number of subscripts) in the model, including error.
int equalOption (Input): Equal numbers option.

equalOption	Description
0	Unequal numbers in the subgroups.
1	Equal numbers in the subgroups.

int nLevels[] (Input)

Array with the number of levels for each factor.

If equalOption = 1, nLevels is of length nFactors and contains the number of levels for each of the factors.

Example: Suppose there are 3 factors, A, B, and C. A has two levels (A1, A2), B has 3 levels at each level of A, and C has 2 levels at each level of B. nLevels = {2,3,2} and the number of observations is nObs = 2 × 3 × 2 = 12.

	A levels	B levels	C levels	`y indices`
	1	1	1	0
			2	1
		2	1	2
			2	3
		3	1	4
			2	5
	2	1	1	6
			2	7
		2	1	8
			2	9
		3	1	10
			2	11

`nLevels`	2	3	2
`nObs`	2×	3×	2	=12
	`equalOption` = 1 example

If equalOption = 0, nLevels contains the number of levels of each factor at each level of the factor in which it is nested.

Example: Suppose there are 3 factors, A, B, and C, with C nested in B and B nested in A. A has two levels (A1, A2), B has up to 3 levels, and C has up to 2 levels. In the equalOption = 0 case, the function needs to know explicitly how the number of levels varies throughout. As specified in the table, A has two levels (nLevels[0] = 2), B has 3 levels in level 1 of A (nLevels[1] = 3) and 2 levels in level 2 of A (nLevels[2] = 2). Similarly, factor C has 2 levels in the A1‑B1 and A1‑B2 combinations (nLevels[3] = 2, nLevels[4] = 2), but only 1 level in the A1‑B3 combination (nLevels[5] = 1). nLevels = {2,3,2,2,2,1,1,2} and the number of observations is the sum of the number of levels in the last factor, C, nObs = 2 + 2 + 1 + 1 + 2 = 8:

	A levels	B levels	C levels	`y indices`
	1	1	1	0
			2	1
		2	1	2
			2	3
		3	1	4
	2	1	1	5
		2	1	6
			2	7

`nLevels`	2	3, 2	2 , 2, 1, 1, 2
`nObs`			2 + 2 + 1 + 1 + 2	= 8
	`equalOption` = 0 example

float y[] (Input): Array of length nObs containing the responses.

Return Value¶

The p-value for the F‑statistic, anovaTable[9].

Optional Arguments¶

anovaTable (Output)

An array of size 15 containing the analysis of variance table. The analysis of variance statistics are as follows:

Element	Analysis of Variance Statistics
0	Degrees of freedom for the model.
1	Degrees of freedom for error.
2	Total (corrected) degrees of freedom.
3	Sum of squares for the model.
4	Sum of squares for error.
5	Total (corrected) sum of squares.
6	Model mean square.
7	Error mean square.
8	Overall F-statistic.
9	p-value.
10	$R^2$ (in percent)
11	Adjusted $R^2$ (in percent).
12	Estimate of the standard deviation.
13	Overall mean of y.
14	Coefficient of variation (in percent).

Note that the p‑value is returned as 0.0 when the value is so small that all significant digits have been lost.

confidence, float (Input)

Confidence level for two-sided interval estimates on the variance components, in percent. confidence percent confidence intervals are computed, hence, confidence must be in the interval [0.0, 100.0). confidence often will be 90.0, 95.0, or 99.0. For one-sided intervals with confidence level ONECL, ONECL in the interval [50.0, 100.0), set confidence = 100.0 - 2.0 × (100.0 - ONECL).

Default: confidence = 95.0.

varianceComponents (Output)

An array. varianceComponents is an nFactors by 9 matrix containing statistics relating to the particular variance components in the model. Rows of varianceComponents correspond to the nFactors factors. Columns of varianceComponents are as follows:

Column	Description
0	Degrees of freedom.
1	Sum of squares.
2	Mean squares.
3	F–statistic.
4	p-value for F test.
5	Variance component estimate.
6	Percent of variance of variance explained by variance component.
7	Lower endpoint for a confidence interval on the variance component.
8	Upper endpoint for a confidence interval on the variance component.

A test for the error variance equal to zero cannot be performed. varianceComponents [(nFactors-1)9+3] and varianceComponents [(nFactors-1)9+4] are set to NaN (not a number). Note that the p-value for the F test is returned as 0.0 when the value is so small that all significant digits have been lost.

ems (Output)

An array of length nFactors (nFactors +1) / 2 with expected mean square coefficients.

yMeans (Output)

An array containing the subgroup means.

Equal options	Length of `yMeans`
0	1 + sum of values in `nLevels` for the first (`nFactors-1`) factors
1	1 + `nLevels[`0`]` + `nLevels[`0`]` `nLevels[`1`]` + … + `nLevels[`0`]` `nLevels[`1`]` … `nLevels`[`nFactors` – 2`]`.

If the factors are labeled A, B, C, and error, the ordering of the means is grand mean, A means, AB means, and then ABC means.

Description¶

Function anovaNested analyzes a nested random model with equal or unequal numbers in the subgroups. The analysis includes an analysis of variance table and computation of subgroup means and variance component estimates. Anderson and Bancroft (1952, pages 325-330) discuss the methodology. The analysis of variance method is used for estimating the variance components. This method solves a linear system in which the mean squares are set to the expected mean squares. A problem that Hocking (1985, pages 324-330) discusses is that this method can yield negative variance component estimates. Hocking suggests a diagnostic procedure for locating the cause of a negative estimate. It may be necessary to reexamine the assumptions of the model.

Examples¶

Example 1¶

An analysis of a three-factor nested random model with equal numbers in the subgroups is performed using data discussed by Snedecor and Cochran (1967, Table 10.16.1, pages 285−288). The responses are calcium concentrations (in percent, dry basis) as measured in the leaves of turnip greens. Four plants are taken at random, then three leaves are randomly selected from each plant.

Finally, from each selected leaf two samples are taken to determine calcium concentration. The model is

$y_{ijk} = μ + α_i + β_{ij} + e_{ijk} i = 1, 2, 3, 4; j = 1, 2, 3; k = 1, 2$

where $y_{ijk}$ is the calcium concentration for the k-th sample of the j-th leaf of the i-th plant, the $\alpha_i$ ’s are the plant effects and are taken to be independently distributed

$N\left(0, \sigma^2\right)$

the $\beta_{ij}$ ’s are leaf effects each independently distributed

$N\left(0, \sigma_{\beta}^2\right)$

and the $\varepsilon_{ijk}$ ’s are errors each independently distributed $N(0,\sigma^2)$ . The effects are all assumed to be independently distributed. The data are given in the following table:

Plant

Leaf

Samples

1

2

3

3.28

3.52

2.88

3.09

3.48

2.80

2

1

2

3

2.46

1.87

2.19

2.44

1.92

2.19

3

1

2

3

2.77

3.74

2.55

2.66

3.44

2.55

4

1

2

3

3.78

4.07

3.31

3.87

4.12

3.31

from __future__ import print_function
from numpy import *
from pyimsl.stat.anovaNested import anovaNested
from pyimsl.stat.writeMatrix import writeMatrix

y = [3.28, 3.09, 3.52, 3.48, 2.88, 2.80, 2.46, 2.44, 1.87,
     1.92, 2.19, 2.19, 2.77, 2.66, 3.74, 3.44, 2.55, 2.55, 3.78,
     3.87, 4.07, 4.12, 3.31, 3.31]
n_levels = [4, 3, 2]
aov_labels = ["degrees of freedom for model",
              "degrees of freedom for error",
              "total (corrected) degrees of freedom",
              "sum of squares for model",
              "sum of squares for error",
              "total (corrected) sum of squares",
              "model mean square",
              "error mean square",
              "F-statistic",
              "p-value",
              "R-squared (in percent)",
              "adjusted R-squared (in percent)",
              "est. standard deviation of within error",
              "overall mean of y",
              "coefficient of variation (in percent)"]
ems_labels = ["Effect A and Error",
              "Effect A and Effect B",
              "Effect A and Effect A",
              "Effect B and Error",
              "Effect B and Effect B",
              "Error and Error"]
means_labels = ["Grand mean",
                " A means 1",
                " A means 2",
                " A means 3",
                " A means 4",
                "AB means 1 1",
                "AB means 1 2",
                "AB means 1 3",
                "AB means 2 1",
                "AB means 2 2",
                "AB means 2 3",
                "AB means 3 1",
                "AB means 3 2",
                "AB means 3 3",
                "AB means 4 1",
                "AB means 4 2",
                "AB means 4 3"]
components_col_labels = ["",
                         "Degrees of freedom",
                         "Sum of squares",
                         "Mean squares",
                         "F-statistic",
                         "p-value for F test",
                         "Variance component estimate",
                         "%var explained by var. comp.",
                         "confidence min for var. comp.",
                         "confidence max for var. comp."]
components_row_labels = ["A", "B", "Error"]
anovaTable = []
yMeans = []
varc = []
ems = []

pvalue = anovaNested(3, 1, n_levels, y,
                     anovaTable=anovaTable,
                     yMeans=yMeans,
                     varianceComponents=varc,
                     ems=ems)

print("pvalue = ", pvalue)
writeMatrix("* * * Analysis of Variance * * *", anovaTable,
            rowLabels=aov_labels, writeFormat="%10.5f", column=True)
writeMatrix("* * * Expected Mean Square Coefficients * * *",
            ems, rowLabels=ems_labels, writeFormat="%6.2f", column=True)
writeMatrix("* * * Means * * *", yMeans,
            rowLabels=means_labels, writeFormat="%6.2f", column=True)
writeMatrix("* * Analysis of Variance / Variance Components * *",
            varc, rowLabels=components_row_labels, colLabels=components_col_labels,
            writeFormat="%10.5f")

Output¶

pvalue =  6.767042312122191e-11
 
         * * * Analysis of Variance * * *
degrees of freedom for model               11.00000
degrees of freedom for error               12.00000
total (corrected) degrees of freedom       23.00000
sum of squares for model                   10.19055
sum of squares for error                    0.07985
total (corrected) sum of squares           10.27040
model mean square                           0.92641
error mean square                           0.00665
F-statistic                               139.22303
p-value                                     0.00000
R-squared (in percent)                     99.22252
adjusted R-squared (in percent)            98.50984
est. standard deviation of within error     0.08157
overall mean of y                           3.01208
coefficient of variation (in percent)       2.70819
 
* * * Expected Mean Square Coefficients * * *
        Effect A and Error       1.00
        Effect A and Effect B    2.00
        Effect A and Effect A    6.00
        Effect B and Error       1.00
        Effect B and Effect B    2.00
        Error and Error          1.00
 
  * * * Means * * *
Grand mean      3.01
 A means 1      3.17
 A means 2      2.18
 A means 3      2.95
 A means 4      3.74
AB means 1 1    3.18
AB means 1 2    3.50
AB means 1 3    2.84
AB means 2 1    2.45
AB means 2 2    1.90
AB means 2 3    2.19
AB means 3 1    2.71
AB means 3 2    3.59
AB means 3 3    2.55
AB means 4 1    3.83
AB means 4 2    4.10
AB means 4 3    3.31
 
           * * Analysis of Variance / Variance Components * *
       Degrees of  Sum of squares  Mean squares  F-statistic  p-value for
          freedom                                                  F test
A         3.00000         7.56035       2.52012      7.66517      0.00973
B         8.00000         2.63020       0.32878     49.40889      0.00000
Error    12.00000         0.07985       0.00665   ..........   ..........
 
       Variance compon  %var explained    confidence    confidence
          ent estimate   by var. comp.  min for var.  max for var.
                                               comp.         comp.
A              0.36522        68.53019       0.03955       5.78674
B              0.16106        30.22123       0.06967       0.60042
Error          0.00665         1.24858       0.00342       0.01813

Example 2¶

An analysis of a three-factor nested random model with unequal numbers in the subgroups is performed. The data are given in the following table:

A	B	C
1	1 2	23.0 31.0	19.0 37.0
2	1 2	33.0 29.0	29.0
3	1	36.0	29.0	33.0
4	1 2 3 4 5 6 7 8 9	11.0 23.0 33.0 23.0 26.0 39.0 20.0 24.0 36.0	21.0 18.0
5	1	25.0	33.0
6	1 2 3 4 5 6 7 8 9 10	28.0 25.0 32.0 41.0 35.0 16.0 30.0 40.0 32.0 44.0	31.0 42.0 36.0

from numpy import *
from pyimsl.stat.anovaNested import anovaNested
from pyimsl.stat.writeMatrix import writeMatrix

y = [23.0, 19.0, 31.0, 37.0,
     33.0, 29.0, 29.0,
     36.0, 29.0, 33.0,
     11.0, 21.0,
     23.0, 18.0,
     33.0, 23.0, 26.0, 39.0, 20.0, 24.0, 36.0,
     25.0, 33.0,
     28.0, 31.0,
     25.0, 42.0,
     32.0, 36.0,
     41.0, 35.0, 16.0, 30.0, 40.0, 32.0, 44.0]

nl = [6,                  # Factor A
      2, 2, 1, 9, 1, 10,  # Factor B
      2, 2,               # Factor C
      2, 1,
      3,
      2, 2, 1, 1, 1, 1, 1, 1, 1,
      2,
      2, 2, 2, 1, 1, 1, 1, 1, 1, 1]

aov_labels = ["degrees of freedom for model",
              "degrees of freedom for error",
              "total (corrected) degrees of freedom",
              "sum of squares for model",
              "sum of squares for error",
              "total (corrected) sum of squares",
              "model mean square",
              "error mean square",
              "F-statistic",
              "p-value",
              "R-squared (in percent)",
              "adjusted R-squared (in percent)",
              "est. standard deviation of within error",
              "overall mean of y",
              "coefficient of variation (in percent)"]
ems_labels = ["Effect A and Error",
              "Effect A and Effect B",
              "Effect A and Effect A",
              "Effect B and Error",
              "Effect B and Effect B",
              "Error and Error"]
means_labels = [
    "Grand mean", " A means 1", " A means 2",
    " A means 3", " A means 4", " A means 5",
    " A means 6", "AB means 1 1", "AB means 1 2",
    "AB means 2 1", "AB means 2 2", "AB means 3 1",
    "AB means 4 1", "AB means 4 2", "AB means 4 3",
    "AB means 4 4", "AB means 4 5", "AB means 4 6",
    "AB means 4 7", "AB means 4 8", "AB means 4 9",
    "AB means 5 1", "AB means 6 1", "AB means 6 2",
    "AB means 6 3", "AB means 6 4", "AB means 6 5",
    "AB means 6 6", "AB means 6 7", "AB means 6 8",
    "AB means 6 9", "AB means 6 10"]
components_labels = [
    "degrees of freedom for A", "sum of squares for A",
    "mean square of A", "F-statistic for A", "p-value for A",
    "Estimate of A", "Percent Variation Explained by A",
    "95% Confidence Interval Lower Limit for A",
    "95% Confidence Interval Upper Limit for A",
    "degrees of freedom for B", "sum of squares for B",
    "mean square of B", "F-statistic for B", "p-value for B",
    "Estimate of B", "Percent Variation Explained by B",
    "95% Confidence Interval Lower Limit for B",
    "95% Confidence Interval Upper Limit for B",
    "degrees of freedom for Error", "sum of squares for Error",
    "mean square of Error", "F-statistic for Error",
    "p-value for Error", "Estimate of Error",
    "Percent Explained by Error",
    "95% Confidence Interval Lower Limit for Error",
    "95% Confidence Interval Upper Limit for Error"]

anovaTable = []
yMeans = []
varc = []
ems = []
pvalue = anovaNested(3, 0, nl, y,
                     anovaTable=anovaTable,
                     yMeans=yMeans,
                     varianceComponents=varc,
                     ems=ems)

writeMatrix("* * * Analysis of Variance * * *", anovaTable,
            rowLabels=aov_labels, writeFormat="%10.5f", column=True)
writeMatrix("* * * Expected Mean Square Coefficients * * *",
            ems, rowLabels=ems_labels, writeFormat="%6.2f", column=True)
writeMatrix("* * * Means * * *", yMeans,
            rowLabels=means_labels, writeFormat="%6.2f", column=True)
pvarc = reshape(varc, (27))
writeMatrix("* * Analysis of Variance / Variance Components * *",
            pvarc, rowLabels=components_labels, transpose=True,
            writeFormat="%10.5f")

Output¶

 
         * * * Analysis of Variance * * *
degrees of freedom for model               24.00000
degrees of freedom for error               11.00000
total (corrected) degrees of freedom       35.00000
sum of squares for model                 1810.80556
sum of squares for error                  310.16667
total (corrected) sum of squares         2120.97222
model mean square                          75.45023
error mean square                          28.19697
F-statistic                                 2.67583
p-value                                     0.04587
R-squared (in percent)                     85.37620
adjusted R-squared (in percent)            53.46974
est. standard deviation of within error     5.31008
overall mean of y                          29.52778
coefficient of variation (in percent)      17.98334
 
* * * Expected Mean Square Coefficients * * *
        Effect A and Error       1.00
        Effect A and Effect B    1.97
        Effect A and Effect A    5.38
        Effect B and Error       1.00
        Effect B and Effect B    1.29
        Error and Error          1.00
 
  * * * Means * * *
Grand mean      29.53
 A means 1      27.50
 A means 2      30.33
 A means 3      32.67
 A means 4      24.91
 A means 5      29.00
 A means 6      33.23
AB means 1 1    21.00
AB means 1 2    34.00
AB means 2 1    31.00
AB means 2 2    29.00
AB means 3 1    32.67
AB means 4 1    16.00
AB means 4 2    20.50
AB means 4 3    33.00
AB means 4 4    23.00
AB means 4 5    26.00
AB means 4 6    39.00
AB means 4 7    20.00
AB means 4 8    24.00
AB means 4 9    36.00
AB means 5 1    29.00
AB means 6 1    29.50
AB means 6 2    33.50
AB means 6 3    34.00
AB means 6 4    41.00
AB means 6 5    35.00
AB means 6 6    16.00
AB means 6 7    30.00
AB means 6 8    40.00
AB means 6 9    32.00
AB means 6 10   44.00
 
   * * Analysis of Variance / Variance Components * *
                                                        1
degrees of freedom for A                          5.00000
sum of squares for A                            461.42211
mean square of A                                 92.28442
F-statistic for A                                 0.98770
p-value for A                                     0.46007
Estimate of A                                    -0.21372
Percent Variation Explained by A               ..........
95% Confidence Interval Lower Limit for A      ..........
95% Confidence Interval Upper Limit for A      ..........
degrees of freedom for B                         19.00000
sum of squares for B                           1349.38345
mean square of B                                 71.02018
F-statistic for B                                 2.51872
p-value for B                                     0.05966
Estimate of B                                    33.19878
Percent Variation Explained by B                 54.07342
95% Confidence Interval Lower Limit for B         0.00000
95% Confidence Interval Upper Limit for B       100.58638
degrees of freedom for Error                     11.00000
sum of squares for Error                        310.16667
mean square of Error                             28.19697
F-statistic for Error                          ..........
p-value for Error                              ..........
Estimate of Error                                28.19697
Percent Explained by Error                       45.92658
95% Confidence Interval Lower Limit for Error    14.14991
95% Confidence Interval Upper Limit for Error    81.28594