homogeneity
Conducts Bartlett’s and Levene’s tests of the homogeneity of variance assumption in analysis of variance.
Synopsis
homogeneity (nTreatment, treatment, y)
Required Arguments
- int nTreatment (Input) - Number of treatments. nTreatment must be greater than one.
- int treatment[] (Input) - An array of length n containing the treatment identifiers for each observation in y. Each level of the treatment must be assigned a different integer. homogeneity verifies that the number of unique treatment identifiers is equal to nTreatment.
- float y[] (Input) - An array of length n containing the experimental observations and any missing values. Missing values can be included in this array, although they are ignored in the analysis. They are indicated by placing a NaN (not a number) in y. The NaN value can be set using the function machine(6).
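The input rules above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper name drop_missing is hypothetical, float("nan") stands in for the value returned by machine(6), and counting the unique identifiers over the full treatment array (rather than only over non-missing observations) is an assumption.

```python
import math


def drop_missing(treatment, y):
    """Hypothetical helper: remove observations whose y value is NaN.

    Returns the filtered treatment list, the filtered y list, and the
    number of missing values that were skipped.
    """
    kept = [(label, value) for label, value in zip(treatment, y)
            if not math.isnan(value)]
    n_missing = len(y) - len(kept)
    return [label for label, _ in kept], [value for _, value in kept], n_missing


n_treatment = 3
treatment = [1, 2, 3, 1, 2, 3]
y = [30.0, 40.0, float("nan"), 20.5, 26.9, 21.4]  # one missing observation

# Checks mirroring the documented requirements on the inputs.
if n_treatment <= 1:
    raise ValueError("nTreatment must be greater than one")
if len(treatment) != len(y):
    raise ValueError("treatment and y must both have length n")
if len(set(treatment)) != n_treatment:
    raise ValueError("number of unique treatment identifiers must equal nTreatment")

kept_treatment, kept_y, n_missing = drop_missing(treatment, y)
print("n_missing =", n_missing)
print("observations used:", list(zip(kept_treatment, kept_y)))
```

Because NaN compares unequal to everything, including itself, the sketch uses math.isnan rather than an equality test to detect missing values.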
Return Value
An array of length 2 containing the p-values for Bartlett’s and Levene’s tests, in that order.
Optional Arguments
levenesMean (Input) or levenesMedian (Input) - Calculates Levene’s test using either the treatment means or medians. levenesMean indicates that Levene’s test is calculated using the mean, and levenesMedian indicates that it is calculated using the median.
Default: levenesMean
nMissing (Output) - Number of missing values, if any, found in y. Missing values are denoted with a NaN (not a number) value in y. In these analyses, any missing values are ignored.
cv (Output) - The coefficient of variation computed using the grand mean and pooled within-treatment standard deviation.
grandMean (Output) - Mean of all the data across every treatment.
treatmentMeans (Output) - An array of length nTreatment containing the treatment means.
residuals (Output) - An array of length n containing the residuals for non-missing observations. The ordering of the values in this array corresponds to the ordering of values in y, with treatments identified by the values in treatment.
studentizedResiduals (Output) - An array of length n containing the studentized residuals for non-missing observations. The ordering of the values in this array corresponds to the ordering of values in y, with treatments identified by the values in treatment.
stdDevs (Output) - An array of length nTreatment containing the treatment standard deviations.
bartletts (Output) - Test statistic for Bartlett’s test.
levenes (Output) - Test statistic for Levene’s test.
Description
Traditional analysis of variance assumes that variances within treatments are equal. This is referred to as homogeneity of variance. The function homogeneity conducts both Bartlett’s and Levene’s tests for this assumption:
\[H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_t^2\]
versus
\[H_a: \sigma_i^2 \neq \sigma_j^2\]
for at least one pair (\(i\neq j\)), where t = nTreatment.
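Bartlett’s test, Bartlett (1937), uses the test statistic:
\[\chi^2 = \frac{(N-t)\ln(S_p^2) - \sum_{i=1}^{t}(n_i-1)\ln(S_i^2)}{1+\frac{1}{3(t-1)}\left[\sum_{i=1}^{t}\frac{1}{n_i-1}-\frac{1}{N-t}\right]}\]
where
\[N=\sum_{i=1}^{t} n_i, \qquad S_p^2 = \frac{\sum_{i=1}^{t}(n_i-1)S_i^2}{N-t}\]
and \(S_i^2\) is the variance of the \(n_i\) non-missing observations in the i‑th treatment. \(S_p^2\) is referred to as the pooled variance, and it is also known as the error mean square from a one-way analysis of variance.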
If the usual assumptions associated with the analysis of variance are valid, then Bartlett’s test statistic is a chi-squared random variable with degrees of freedom equal to \(t-1\).
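As an illustration of the statistic described above, the following sketch evaluates the standard Bartlett formula in pure Python on the data from the Example section. It is not the library's code; the function name bartlett_statistic is hypothetical, and the sketch only computes the statistic, not its chi-squared p-value.

```python
import math
from collections import defaultdict
from statistics import variance

treatment = [1, 2, 3, 4, 5, 6, 7, 8] * 3
y = [30.0, 40.0, 38.9, 38.2, 41.8, 52.2, 54.8, 58.2,
     20.5, 26.9, 21.4, 25.1, 26.4, 36.7, 28.9, 35.9,
     21.0, 25.4, 24.0, 23.3, 34.4, 41.0, 33.0, 34.9]


def bartlett_statistic(treatment, y):
    """Hypothetical helper: Bartlett's statistic, skipping NaN observations."""
    groups = defaultdict(list)
    for label, value in zip(treatment, y):
        if not math.isnan(value):
            groups[label].append(value)

    t = len(groups)                                   # number of treatments
    sizes = [len(g) for g in groups.values()]         # n_i
    variances = [variance(g) for g in groups.values()]  # S_i^2 (n_i - 1 divisor)
    n_total = sum(sizes)                              # N

    # Pooled variance S_p^2 with N - t degrees of freedom.
    pooled = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / (n_total - t)

    numerator = (n_total - t) * math.log(pooled) - sum(
        (n - 1) * math.log(s2) for n, s2 in zip(sizes, variances))
    correction = 1.0 + (sum(1.0 / (n - 1) for n in sizes)
                        - 1.0 / (n_total - t)) / (3.0 * (t - 1))
    return numerator / correction


stat = bartlett_statistic(treatment, y)
print("Bartlett statistic = %.3f" % stat)
```

Up to rounding, the value should agree with the Bartlett test statistic listed in the Output section (2.257), which is then referred to a chi-squared distribution with \(t-1=7\) degrees of freedom.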
The original Levene’s test, Levene (1960) and Snedecor & Cochran (1967), uses a different test statistic, \(F_0\), equal to:
\[F_0 = \frac{\sum_{i=1}^{t} n_i\left(\overline{z}_{i.}-\overline{z}_{..}\right)^2/(t-1)}{\sum_{i=1}^{t}\sum_{j=1}^{n_i}\left(z_{ij}-\overline{z}_{i.}\right)^2/(N-t)}\]
where
\[z_{ij} = \left|x_{ij}-\overline{x}_{i.}\right|,\]
\(x_{ij}\) is the j‑th observation from the i‑th treatment, \(\overline{x}_{i.}\) is the mean for the i‑th treatment, and \(\overline{z}_{i.}\) and \(\overline{z}_{..}\) are the treatment and overall means of the \(z_{ij}\). Conover, Johnson, and Johnson (1981) compared over 50 similar tests for homogeneity and concluded that one of the best tests was Levene’s test when the treatment mean, \(\overline{x}_{i.}\), is replaced with the treatment median, \(\tilde{x}_{i.}\). This version of Levene’s test can be requested by setting levenesMedian. In either case, Levene’s test statistic is treated as an F random variable with numerator degrees of freedom equal to \(t-1\) and denominator degrees of freedom equal to \(N-t\).
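The two variants of Levene’s statistic can be sketched in pure Python as a one-way analysis of variance on the absolute deviations from each treatment’s center. This is an illustrative sketch rather than the library's code; the function name levene_statistic and its center parameter are hypothetical, and the data are those of the Example section.

```python
import math
from collections import defaultdict
from statistics import mean, median

treatment = [1, 2, 3, 4, 5, 6, 7, 8] * 3
y = [30.0, 40.0, 38.9, 38.2, 41.8, 52.2, 54.8, 58.2,
     20.5, 26.9, 21.4, 25.1, 26.4, 36.7, 28.9, 35.9,
     21.0, 25.4, 24.0, 23.3, 34.4, 41.0, 33.0, 34.9]


def levene_statistic(treatment, y, center=median):
    """Hypothetical helper: Levene's F statistic, skipping NaN observations.

    center=mean gives the original test; center=median gives the
    median-based variant.
    """
    groups = defaultdict(list)
    for label, value in zip(treatment, y):
        if not math.isnan(value):
            groups[label].append(value)

    # Absolute deviations z_ij from each treatment's center.
    z = {label: [abs(v - center(vals)) for v in vals]
         for label, vals in groups.items()}

    t = len(z)
    n_total = sum(len(vals) for vals in z.values())
    z_bar = {label: mean(vals) for label, vals in z.items()}
    z_grand = sum(sum(vals) for vals in z.values()) / n_total

    # Between-treatment and within-treatment mean squares of the z_ij.
    between = sum(len(z[label]) * (z_bar[label] - z_grand) ** 2
                  for label in z) / (t - 1)
    within = sum((zij - z_bar[label]) ** 2
                 for label in z for zij in z[label]) / (n_total - t)
    return between / within


f0_median = levene_statistic(treatment, y, center=median)
f0_mean = levene_statistic(treatment, y, center=mean)
print("Levene (median) = %.3f" % f0_median)
print("Levene (mean)   = %.3f" % f0_mean)
```

The median-based value should agree, up to rounding, with the Levene test statistic in the Output section (0.135), since the example requests levenesMedian.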
The residual for the j‑th observation within the i‑th treatment, \(e_{ij}\), returned from residuals is unstandardized, i.e. \(e_{ij}=x_{ij}-\overline{x}_{i.}\). For investigating problems of homogeneity of variance, the studentized residuals returned by studentizedResiduals are recommended since they are standardized by the standard deviation of the residual. The formula for calculating the studentized residual is:
\[SE_{ij} = \frac{e_{ij}}{\sqrt{S_p^2\left(1-\frac{1}{n_i}\right)}}\]
where the coefficient of variation, returned from cv, is also calculated using the pooled variance and the grand mean:
\[CV = \frac{100\cdot\sqrt{S_p^2}}{\overline{x}_{..}}\]
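The quantities in this paragraph can be sketched in pure Python on the data from the Example section. The sketch assumes there are no missing values, expresses the coefficient of variation as a percentage (100 times the pooled standard deviation divided by the grand mean), and assumes each residual is studentized by \(\sqrt{S_p^2(1-1/n_i)}\); all variable names are illustrative and this is not the library's code.

```python
import math
from collections import defaultdict
from statistics import mean

treatment = [1, 2, 3, 4, 5, 6, 7, 8] * 3
y = [30.0, 40.0, 38.9, 38.2, 41.8, 52.2, 54.8, 58.2,
     20.5, 26.9, 21.4, 25.1, 26.4, 36.7, 28.9, 35.9,
     21.0, 25.4, 24.0, 23.3, 34.4, 41.0, 33.0, 34.9]

groups = defaultdict(list)
for label, value in zip(treatment, y):
    groups[label].append(value)

t = len(groups)            # number of treatments
n_total = len(y)           # N, assuming no missing values
means = {label: mean(vals) for label, vals in groups.items()}

# Unstandardized residuals e_ij, in the same order as y.
residuals = [value - means[label] for label, value in zip(treatment, y)]

# Pooled variance S_p^2: error sum of squares over N - t degrees of freedom.
pooled = sum(e ** 2 for e in residuals) / (n_total - t)

grand_mean = mean(y)
cv = 100.0 * math.sqrt(pooled) / grand_mean

# Assumed studentization: divide by the estimated std. deviation of e_ij.
studentized = [e / math.sqrt(pooled * (1.0 - 1.0 / len(groups[label])))
               for label, e in zip(treatment, residuals)]

print("Grand mean = %.3f" % grand_mean)
print("cv         = %.3f" % cv)
print("first studentized residual = %.3f" % studentized[0])
```

The grand mean and cv should agree, up to rounding, with the values in the Output section (33.871 and 28.378).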
Example
This example applies Bartlett’s and Levene’s tests to verify the homogeneity assumption for a one-way analysis of variance. There are eight treatments, each with 3 replicates for a total of 24 observations. The estimated treatment standard deviations range from 5.35 to 13.92.
In this case, neither test is statistically significant at a stated significance level of 0.05: Bartlett’s test gives \(p=0.944\), and Levene’s test, computed here using the treatment medians, gives \(p=0.994\).
from __future__ import print_function
from pyimsl.stat.homogeneity import homogeneity
from pyimsl.stat.writeMatrix import writeMatrix

n_treatment = 8
# Treatment identifier for each observation in y (3 replicates per treatment)
treatment = [1, 2, 3, 4, 5, 6, 7, 8,
             1, 2, 3, 4, 5, 6, 7, 8,
             1, 2, 3, 4, 5, 6, 7, 8]
y = [30.0, 40.0, 38.9, 38.2, 41.8, 52.2, 54.8, 58.2,
     20.5, 26.9, 21.4, 25.1, 26.4, 36.7, 28.9, 35.9,
     21.0, 25.4, 24.0, 23.3, 34.4, 41.0, 33.0, 34.9]
# Empty lists that receive the optional output values
bartletts = []
levenes = []
grand_mean = []
cv = []
treatment_means = []
std_devs = []
n_missing = []
# p[0] is Bartlett's p-value; p[1] is Levene's p-value (median-based here)
p = homogeneity(n_treatment, treatment, y,
                bartletts=bartletts, levenes=levenes,
                levenesMedian=True,
                nMissing=n_missing, grandMean=grand_mean,
                cv=cv, treatmentMeans=treatment_means,
                stdDevs=std_devs)
print(" *** Bartlett\'s Test ***\n")
print("Bartlett`s p-value =%10.3f" % p[0])
print("Bartlett`s test statistic =%10.3f" % bartletts[0])
print("\n *** Levene\'s Test ***\n")
print("Levene`s p-value =%10.3f" % p[1])
print("Levene`s test statistic =%10.3f" % levenes[0])
writeMatrix("Treatment means", treatment_means, column=True)
writeMatrix("Treatment std devs", std_devs, column=True)
print("\nGrand mean =%10.3f" % grand_mean[0])
print("cv =%10.3f" % cv[0])
print("n_missing = ", n_missing[0])
Output
*** Bartlett's Test ***
Bartlett`s p-value = 0.944
Bartlett`s test statistic = 2.257
*** Levene's Test ***
Levene`s p-value = 0.994
Levene`s test statistic = 0.135
Grand mean = 33.871
cv = 28.378
n_missing = 0
Treatment means
1 23.83
2 30.77
3 28.10
4 28.87
5 34.20
6 43.30
7 38.90
8 43.00
Treatment std devs
1 5.35
2 8.03
3 9.44
4 8.13
5 7.70
6 8.00
7 13.92
8 13.17