anovaOneway

Analyzes a one-way classification model.

Synopsis

anovaOneway (n, y)

Required Arguments

int n[] (Input)
Array of length nGroups containing the number of responses for each group.
float y[] (Input)
Array of length n [0] + n [1] + … + n [nGroups − 1] containing the responses for each group.

Return Value

The p-value for the F-statistic.

Optional Arguments

anovaTable (Output)

An array of size 15 containing the analysis of variance table. The analysis of variance statistics are as follows:

Element Analysis of Variance Statistics
0 Degrees of freedom for the model.
1 Degrees of freedom for error.
2 Total (corrected) degrees of freedom.
3 Sum of squares for the model.
4 Sum of squares for error.
5 Total (corrected) sum of squares.
6 Model mean square.
7 Error mean square.
8 Overall F-statistic.
9 p-value.
10 \(R^2\) (in percent).
11 Adjusted \(R^2\) (in percent).
12 Estimate of the standard deviation.
13 Overall mean of y.
14 Coefficient of variation (in percent).

Note that the p‑value is returned as 0.0 when the value is so small that all significant digits have been lost.

groupMeans (Output)
An array of length nGroups containing the group means.
groupStdDevs (Output)
An array of length nGroups containing the group standard deviations.
groupCounts (Output)
An array of length nGroups containing the number of nonmissing observations for the groups.
confidence, float (Input)

Confidence level for the simultaneous interval estimation.

If tukey is specified, confidence must be in the range [90.0, 99.0). Otherwise, confidence is in the range [0.0, 100.0).

Default: confidence = 95.0

tukey (Output)

or

dunnSidak (Output)

or

bonferroni (Output)

or

scheffe (Output)

or

oneAtATime (Output)

Function anovaOneway computes the confidence intervals on all pairwise differences of means using any one of six methods: Tukey, Tukey-Kramer, Dunn-Šidák, Bonferroni, Scheffé, or Fisher’s LSD (One-at-a-Time). If tukey is specified, the Tukey confidence intervals are calculated if the group sizes are equal; otherwise, the Tukey-Kramer confidence intervals are calculated.

On return, these keywords return an array of size

\[\binom{\mathrm{nGroups}}{2} \times 5\]

containing the statistics relating to the difference of means.

Column Description
0 Group number for the i-th mean.
1 Group number for the j-th mean.
2 Difference of means (i-th mean) − (j-th mean).
3 Lower confidence limit for the difference.
4 Upper confidence limit for the difference.

Description

Function anovaOneway performs an analysis of variance of responses from a one-way classification design. The model is

\[y_{ij} = \mu_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, k;\ j = 1, 2, \ldots, n_i\]

where the observed value \(y_{ij}\) constitutes the j-th response in the i-th group, \(\mu_i\) denotes the population mean for the i-th group, and the \(\varepsilon_{ij}\) arguments are errors that are identically and independently distributed normal with mean 0 and variance \(\sigma^2\). Function anovaOneway requires the \(y_{ij}\) observed responses as input into a single vector y with responses in each group occupying contiguous locations. The analysis of variance table is computed along with the group sample means and standard deviations. A discussion of formulas and interpretations for the one-way analysis of variance problem appears in most elementary statistics texts, e.g., Snedecor and Cochran (1967, Chapter 10).
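The anovaTable entries follow the standard one-way ANOVA formulas. As a rough cross-check (an independent sketch using NumPy and SciPy, not the library's implementation), the principal quantities for the data of Example 1 below can be reproduced directly:

```python
# Independent cross-check of the anovaTable quantities (a sketch,
# not the library's implementation), using NumPy and SciPy.
import numpy as np
from scipy import stats

def anova_table(groups):
    """Return (df_model, df_error, ss_model, ss_error, F, p) for a
    list of 1-D response arrays, one per group."""
    y = np.concatenate(groups)
    k = len(groups)
    grand_mean = y.mean()
    ss_model = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_model, df_error = k - 1, y.size - k
    F = (ss_model / df_model) / (ss_error / df_error)
    p = stats.f.sf(F, df_model, df_error)  # upper tail of the F distribution
    return df_model, df_error, ss_model, ss_error, F, p

# Plant-weight data from Example 1 (Searle 1971), n = [3, 2, 1]
groups = [np.array([101., 105., 94.]), np.array([84., 88.]), np.array([32.])]
print(anova_table(groups))
```

The returned values match the corresponding rows of the Example 2 output (sums of squares 3480 and 70, F = 74.57, p ≈ 0.0028).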

Function anovaOneway computes simultaneous confidence intervals on all

\[k^* = \frac{k(k-1)}{2}\]

pairwise comparisons of k means \(\mu_1,\mu_2,\ldots,\mu_k\) in the one-way analysis of variance model. Any of several methods can be chosen. A good review of these methods is given by Stoline (1981). The methods are also discussed in many elementary statistics texts, e.g., Kirk (1982, pp. 114−127).

Let \(s^2\) be the estimated variance of a single observation. Let v be the degrees of freedom associated with \(s^2\). Let

\[\alpha = 1 - \frac{\mathtt{confidence}}{100.0}\]

The methods are summarized as follows:

Tukey method: The Tukey method gives the narrowest simultaneous confidence intervals for all pairwise differences of means \(\mu_i-\mu_j\) in balanced (\(n_1=n_2=\ldots=n_k=n\)) one-way designs. The method is exact and uses the Studentized range distribution. The formula for the difference \(\mu_i-\mu_j\) is given by

\[\overline{y}_i - \overline{y}_j \pm q_{1-\alpha;k,v} \sqrt{\frac{s^2}{n}}\]

where \(q_{1-\alpha;k,v}\) is the \((1-\alpha)100\) percentage point of the Studentized range distribution with parameters k and v.
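Assuming SciPy (1.7 or later) is available, the Tukey half-width above can be reproduced with its Studentized range distribution; a sketch using the balanced data of Example 3 below:

```python
# Sketch: Tukey half-width for the balanced case, using SciPy's
# Studentized range distribution (scipy >= 1.7; not the library's own code).
import math
from scipy.stats import studentized_range

k, v, n = 5, 15, 4        # groups, error df, common group size (Example 3)
s2 = 150.75 / 15          # error mean square from the Example 3 output
alpha = 1 - 99.0 / 100.0  # confidence = 99.0
q = studentized_range.ppf(1 - alpha, k, v)
half_width = q * math.sqrt(s2 / n)
print(half_width)         # each Tukey interval is (diff - hw, diff + hw)
```

The value (about 8.8) agrees with the Example 3 intervals, e.g. 0.75 ± 8.80 giving (−8.05, 9.55).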

Tukey-Kramer method: The Tukey-Kramer method is an approximate extension of the Tukey method for the unbalanced case. (The method simplifies to the Tukey method for the balanced case.) The method always produces confidence intervals narrower than the Dunn-Šidák and Bonferroni methods. Hayter (1984) proved that the method is conservative, i.e., the method guarantees a confidence coverage of at least \((1−\alpha)100\) percent. Hayter’s proof gave further support to earlier recommendations for its use (Stoline 1981). (Methods that are currently better are restricted to special cases and only offer improvement in severely unbalanced cases; see, for example, Spurrier and Isham 1985.) The formula for the difference \(\mu_i-\mu_j\) is given by the following:

\[\overline{y}_i - \overline{y}_j \pm q_{1-\alpha;k,v} \sqrt{\frac{s^2}{2n_i} + \frac{s^2}{2n_j}}\]

Dunn-Šidák method: The Dunn-Šidák method is a conservative method. The method gives wider intervals than the Tukey-Kramer method. (For large v and small α and k, the difference is only slight.) The method is slightly better than the Bonferroni method and is based on an improved Bonferroni (multiplicative) inequality (Miller 1980, pp. 101, 254−255). The method uses the t distribution (see function tInverseCdf, Chapter 11, Probability Distribution Functions and Inverses). The formula for the difference \(\mu_i-\mu_j\) is given by

\[\overline{y}_i - \overline{y}_j \pm t_{\tfrac{1}{2} + \tfrac{1}{2}(1-\alpha)^{1/k^*};\,v} \sqrt{\frac{s^2}{n_i} + \frac{s^2}{n_j}}\]

where \(t_{f;v}\) is the 100f percentage point of the t distribution with v degrees of freedom.

Bonferroni method: The Bonferroni method is a conservative method based on the Bonferroni (additive) inequality (Miller, p. 8). The method uses the t distribution. The formula for the difference \(\mu_i-\mu_j\) is given by the following:

\[\overline{y}_i - \overline{y}_j \pm t_{1-\frac{\alpha}{2k^*};\,v} \sqrt{\frac{s^2}{n_i} + \frac{s^2}{n_j}}\]
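The two t-based corrections can be compared numerically; a sketch with scipy.stats.t (an assumption here; the library computes these internally), using the Example 3 quantities, shows that Dunn-Šidák intervals are never wider than Bonferroni intervals:

```python
# Sketch comparing the two t-based corrections (not the library's code):
# Dunn-Sidak should give intervals no wider than Bonferroni.
import math
from scipy.stats import t

k, v = 5, 15              # groups and error df (as in Example 3)
ni = nj = 4               # group sizes
s2 = 150.75 / 15          # error mean square from the Example 3 output
alpha = 0.01
kstar = k * (k - 1) // 2  # number of pairwise comparisons

se = math.sqrt(s2 / ni + s2 / nj)
hw_sidak = t.ppf(0.5 + 0.5 * (1 - alpha) ** (1 / kstar), v) * se
hw_bonferroni = t.ppf(1 - alpha / (2 * kstar), v) * se
print(hw_sidak, hw_bonferroni)
```

For these values the two half-widths differ only slightly, and both exceed the Tukey half-width, as the text predicts.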

Scheffé method: The Scheffé method is an overly conservative method for simultaneous confidence intervals on pairwise difference of means. The method is applicable for simultaneous confidence intervals on all contrasts, i.e., all linear combinations

\[\sum_{i=1}^{k} c_i \mu_i\]

where the following is true:

\[\sum_{i=1}^{k} c_i = 0\]

This method can be recommended here only if a large number of confidence intervals on contrasts in addition to the pairwise differences of means are to be constructed. The method uses the F distribution (see function fInverseCdf, Chapter 11, Probability Distribution Functions and Inverses). The formula for the difference \(\mu_i-\mu_j\) is given by

\[\overline{y}_i - \overline{y}_j \pm \sqrt{(k-1) F_{1-\alpha;k-1,v} \left(\frac{s^2}{n_i} + \frac{s^2}{n_j}\right)}\]

where \(F_{1-\alpha;k-1,v}\) is the \((1−\alpha)100\) percentage point of the F distribution with k − 1 and v degrees of freedom.
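A sketch of the Scheffé half-width via scipy.stats.f (an assumption, not the library's code), again with the Example 3 quantities:

```python
# Sketch: Scheffe half-width for a pairwise difference, via the F
# distribution (an independent check, not the library's code).
import math
from scipy.stats import f

k, v = 5, 15      # groups and error df (as in Example 3)
ni = nj = 4       # group sizes
s2 = 150.75 / 15  # error mean square from the Example 3 output
alpha = 0.01
hw_scheffe = math.sqrt((k - 1) * f.ppf(1 - alpha, k - 1, v)
                       * (s2 / ni + s2 / nj))
print(hw_scheffe)
```

As expected for pairwise differences, this half-width is wider than the Tukey half-width for the same data, reflecting the method's conservatism.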

One-at-a-Time t method (Fisher’s LSD): The One-at-a-Time t method is appropriate for constructing a single confidence interval. The confidence percentage input is appropriate for one interval at a time. The method has been used widely in conjunction with the overall test of the null hypothesis \(\mu_1=\mu_2=\ldots=\mu_k\) by the use of the F statistic. Fisher’s LSD (least significant difference) test is a two-stage test that proceeds to make pairwise comparisons of means only if the overall F test is significant. Milliken and Johnson (1984, p. 31) recommend LSD comparisons after a significant F only if the number of comparisons is small and the comparisons were planned prior to the analysis. If many unplanned comparisons are made, they recommend Scheffé’s method. If the F test is insignificant, a few planned comparisons for differences in means can still be performed by using either the Tukey, Tukey-Kramer, Dunn-Šidák, or Bonferroni methods. Because the F test is insignificant, Scheffé’s method does not yield any significant differences. The formula for the difference \(\mu_i-\mu_j\) is given by the following:

\[\overline{y}_i - \overline{y}_j \pm t_{1-\tfrac{\alpha}{2};\,v} \sqrt{\frac{s^2}{n_i} + \frac{s^2}{n_j}}\]
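The two-stage character of Fisher's LSD can be sketched directly (scipy.stats is an assumption here, and the Example 1 quantities are used for illustration):

```python
# Sketch of the two-stage LSD procedure: a pairwise t interval is formed
# only when the overall F test is significant (not the library's code).
import math
from scipy.stats import t, f

alpha = 0.05
df_model, df_error = 2, 3              # Example 1 data (Searle 1971)
ms_model, ms_error = 1740.0, 70.0 / 3.0
F = ms_model / ms_error
if f.sf(F, df_model, df_error) < alpha:        # stage 1: overall F test
    ni, nj = 3, 2                              # normal vs. off-type groups
    se = math.sqrt(ms_error / ni + ms_error / nj)
    hw = t.ppf(1 - alpha / 2, df_error) * se   # stage 2: single t interval
    print("LSD half-width:", hw)
```

Here the overall test is significant (p ≈ 0.0028), so the stage-2 interval is constructed.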

Examples

Example 1

This example computes a one-way analysis of variance for data discussed by Searle (1971, Table 5.1, pp. 165−179). The responses are plant weights for six plants of three different types—three normal, two off-types, and one aberrant. The responses are given by type of plant in the following table:

Normal  Off-Type  Aberrant
101     84        32
105     88
94

from __future__ import print_function
from numpy import *
from pyimsl.stat.anovaOneway import anovaOneway

n = [3, 2, 1]
y = [101., 105., 94., 84., 88., 32.]
p_value = anovaOneway(n, y)
print("p-value = %6.4f" % p_value)

Output

p-value = 0.0028

Example 2

The data used in this example is the same as that used in the initial example. Here, the anovaTable is printed.

from numpy import *
from pyimsl.stat.anovaOneway import anovaOneway
from pyimsl.stat.writeMatrix import writeMatrix

n = [3, 2, 1]
y = [101., 105., 94., 84., 88., 32.]
labels = ["degrees of freedom for among groups",
          "degrees of freedom for within groups",
          "total (corrected) degrees of freedom",
          "sum of squares for among groups",
          "sum of squares for within groups",
          "total (corrected) sum of squares",
          "among mean square",
          "within mean square", "F-statistic",
          "p-value", "R-squared (in percent)",
          "adjusted R-squared (in percent)",
          "est. standard deviation of within error",
          "overall mean of y",
          "coefficient of variation (in percent)"]

# Perform analysis
anovaTable = []
p_value = anovaOneway(n, y, anovaTable=anovaTable)

# Print results
writeMatrix("* * * Analysis of Variance * * *\n",
            anovaTable, rowLabels=labels, writeFormat="%9.2f", column=True)

Output

 
         * * * Analysis of Variance * * *

degrees of freedom for among groups           2.00
degrees of freedom for within groups          3.00
total (corrected) degrees of freedom          5.00
sum of squares for among groups            3480.00
sum of squares for within groups             70.00
total (corrected) sum of squares           3550.00
among mean square                          1740.00
within mean square                           23.33
F-statistic                                  74.57
p-value                                       0.00
R-squared (in percent)                       98.03
adjusted R-squared (in percent)              96.71
est. standard deviation of within error       4.83
overall mean of y                            84.00
coefficient of variation (in percent)         5.75

Example 3

Simultaneous confidence intervals are generated for the following measurements of cold-cranking power for five models of automobile batteries. Nelson (1989, pp. 232−241) provided the data and approach.

Model 1 Model 2 Model 3 Model 4 Model 5
41 42 27 48 28
43 43 26 45 32
42 46 28 51 37
46 38 27 46 25

The Tukey method is chosen for the analysis of pairwise comparisons, with a confidence level of 99 percent. The means and their confidence limits are output.

from numpy import *
from pyimsl.stat.anovaOneway import anovaOneway
from pyimsl.stat.permuteMatrix import permuteMatrix, PERMUTE_COLUMNS
from pyimsl.stat.writeMatrix import writeMatrix

n_groups = 5
n = [4, 4, 4, 4, 4]
permute = [2, 3, 4, 0, 1]
y = [41.0, 43.0, 42.0, 46.0, 42.0,
     43.0, 46.0, 38.0, 27.0, 26.0,
     28.0, 27.0, 48.0, 45.0, 51.0,
     46.0, 28.0, 32.0, 37.0, 25.0]
labels = ["degrees of freedom for among groups",
          "degrees of freedom for within groups",
          "total (corrected) degrees of freedom",
          "sum of squares for among groups",
          "sum of squares for within groups",
          "total (corrected) sum of squares",
          "among mean square",
          "within mean square", "F-statistic",
          "p-value", "R-squared (in percent)",
          "adjusted R-squared (in percent)",
          "est. standard deviation of within error",
          "overall mean of y",
          "coefficient of variation (in percent)"]
mean_row_labels = ["first and second",
                   "first and third",
                   "first and fourth",
                   "first and fifth",
                   "second and third",
                   "second and fourth",
                   "second and fifth",
                   "third and fourth",
                   "third and fifth",
                   "fourth and fifth"]
mean_col_labels = ["Means",
                   "Difference of means",
                   "Lower limit",
                   "Upper limit", "", ""]

# Perform analysis
anovaTable = []
confidence = 99.0
ci_diff_means = []
p_value = anovaOneway(n, y, anovaTable=anovaTable,
                      confidence=confidence, tukey=ci_diff_means)

# Print anova table
writeMatrix("* * * Analysis of Variance * * *\n", anovaTable,
            rowLabels=labels, writeFormat="%9.2f", column=True)

# Permute ci_diff_means for printing
tmp_diff_means = permuteMatrix(ci_diff_means, permute,
                               PERMUTE_COLUMNS)

# Print ci_diff_means
writeMatrix("* * * Differences in Means * * *\n",
            tmp_diff_means[::, 0:3],
            rowLabels=mean_row_labels,
            colLabels=mean_col_labels,
            writeFormat="%9.2f")

Output

 
         * * * Analysis of Variance * * *

degrees of freedom for among groups           4.00
degrees of freedom for within groups         15.00
total (corrected) degrees of freedom         19.00
sum of squares for among groups            1242.20
sum of squares for within groups            150.75
total (corrected) sum of squares           1392.95
among mean square                           310.55
within mean square                           10.05
F-statistic                                  30.90
p-value                                       0.00
R-squared (in percent)                       89.18
adjusted R-squared (in percent)              86.29
est. standard deviation of within error       3.17
overall mean of y                            38.05
coefficient of variation (in percent)         8.33
 
           * * * Differences in Means * * *

Means              Difference  Lower limit  Upper limit
                     of means                          
first and second         0.75        -8.05         9.55
first and third         16.00         7.20        24.80
first and fourth        -4.50       -13.30         4.30
first and fifth         12.50         3.70        21.30
second and third        15.25         6.45        24.05
second and fourth       -5.25       -14.05         3.55
second and fifth        11.75         2.95        20.55
third and fourth       -20.50       -29.30       -11.70
third and fifth         -3.50       -12.30         5.30
fourth and fifth        17.00         8.20        25.80