public class ANOVA extends Object implements Serializable, Cloneable
Modifier and Type | Field and Description |
---|---|
static int | BONFERRONI: The Bonferroni method |
static int | DUNN_SIDAK: The Dunn-Sidak method |
static int | ONE_AT_A_TIME: The One-at-a-Time (Fisher's LSD) method |
static int | SCHEFFE: The Scheffe method |
static int | TUKEY: The Tukey method |
static int | TUKEY_KRAMER: The Tukey-Kramer method |
Constructor and Description |
---|
ANOVA(double[][] y): Analyzes a one-way classification model. |
ANOVA(double dfr, double ssr, double dfe, double sse, double gmean): Constructs an analysis of variance table and related statistics. |
Modifier and Type | Method and Description |
---|---|
double | getAdjustedRSquared(): Returns the adjusted R-squared (in percent). |
double[] | getArray(): Returns the ANOVA values as an array. |
double | getCoefficientOfVariation(): Returns the coefficient of variation (in percent). |
double[] | getConfidenceInterval(double conLevel, int i, int j, int compMethod): Computes the confidence interval associated with the difference of means between two groups using a specified method. |
double | getDegreesOfFreedomForError(): Returns the degrees of freedom for error. |
double | getDegreesOfFreedomForModel(): Returns the degrees of freedom for model. |
double | getDunnSidak(int i, int j): Deprecated. Use ANOVA.getConfidenceInterval(double, int, int, int) instead. |
double | getErrorMeanSquare(): Returns the error mean square. |
double | getF(): Returns the F statistic. |
double[][] | getGroupInformation(): Returns information concerning the groups. |
double | getMeanOfY(): Returns the mean of the response (dependent variable). |
double | getModelErrorStdev(): Returns the estimated standard deviation of the model error. |
double | getModelMeanSquare(): Returns the model mean square. |
double | getP(): Returns the p-value. |
double | getRSquared(): Returns the R-squared (in percent). |
double | getSumOfSquaresForError(): Returns the sum of squares for error. |
double | getSumOfSquaresForModel(): Returns the sum of squares for model. |
double | getTotalDegreesOfFreedom(): Returns the total degrees of freedom. |
int | getTotalMissing(): Returns the total number of missing values. |
double | getTotalSumOfSquares(): Returns the total sum of squares. |
public static final int TUKEY
public static final int TUKEY_KRAMER
public static final int DUNN_SIDAK
public static final int BONFERRONI
public static final int SCHEFFE
public static final int ONE_AT_A_TIME
public ANOVA(double[][] y)
y - a two-dimensional double array containing the responses. The rows in y correspond to observation groups. Each row of y can contain a different number of observations.
public ANOVA(double dfr, double ssr, double dfe, double sse, double gmean)
dfr - a double scalar value representing the degrees of freedom for model.
ssr - a double scalar value representing the sum of squares for model.
dfe - a double scalar value representing the degrees of freedom for error.
sse - a double scalar value representing the sum of squares for error.
gmean - a double scalar value representing the grand mean. If the grand mean is not known, it may be set to not-a-number.
public double[] getArray()
Returns a double[15] array containing the following values:
Index | Value |
---|---|
0 | Degrees of freedom for model |
1 | Degrees of freedom for error |
2 | Total degrees of freedom |
3 | Sum of squares for model |
4 | Sum of squares for error |
5 | Total sum of squares |
6 | Model mean square |
7 | Error mean square |
8 | F statistic |
9 | p-value |
10 | R-squared (in percent) |
11 | Adjusted R-squared (in percent) |
12 | Estimated standard deviation of the model error |
13 | Mean of the response (dependent variable) |
14 | Coefficient of variation (in percent) |
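The layout above can be reproduced from the five summary quantities accepted by the second constructor. The following is a minimal sketch using the usual textbook definitions, not the library's implementation. The class name `AnovaTableSketch` and the method `build` are hypothetical, and the p-value is left as NaN because it needs an F-distribution CDF that the Java standard library does not provide. The library's own conventions (for example, how a negative adjusted R-squared is reported) may differ.

```java
/** Hypothetical helper for illustration; not part of the documented ANOVA class. */
final class AnovaTableSketch {

    /**
     * Builds a 15-element array in the index order of the table above,
     * from the same five inputs as ANOVA(dfr, ssr, dfe, sse, gmean).
     */
    static double[] build(double dfr, double ssr, double dfe, double sse, double gmean) {
        double dft = dfr + dfe;                 // total degrees of freedom
        double sst = ssr + sse;                 // total sum of squares
        double msr = ssr / dfr;                 // model mean square
        double mse = sse / dfe;                 // error mean square
        double f = msr / mse;                   // F statistic
        double p = Double.NaN;                  // assumption: p-value omitted (needs an F CDF)
        double r2 = 100.0 * ssr / sst;          // R-squared, in percent
        double adjR2 = 100.0 * (1.0 - mse / (sst / dft)); // textbook adjusted R-squared, in percent
        double stdev = Math.sqrt(mse);          // estimated standard deviation of the model error
        double cv = 100.0 * stdev / gmean;      // coefficient of variation; NaN if gmean is NaN
        return new double[] {
            dfr, dfe, dft, ssr, sse, sst, msr, mse, f, p, r2, adjR2, stdev, gmean, cv
        };
    }
}
```

For example, `AnovaTableSketch.build(2, 40, 12, 60, 10)` gives a model mean square of 20, an error mean square of 5, and an F statistic of 4.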
public double getDegreesOfFreedomForModel()
Returns a double scalar value representing the degrees of freedom for model.
public double getDegreesOfFreedomForError()
Returns a double scalar value representing the degrees of freedom for error.
public double getTotalDegreesOfFreedom()
Returns a double scalar value representing the total degrees of freedom.
public double getSumOfSquaresForModel()
Returns a double scalar value representing the sum of squares for model.
public double getSumOfSquaresForError()
Returns a double scalar value representing the sum of squares for error.
public double getTotalSumOfSquares()
Returns a double scalar value representing the total sum of squares.
public double getModelMeanSquare()
Returns a double scalar value representing the model mean square.
public double getErrorMeanSquare()
Returns a double scalar value representing the error mean square.
public double getF()
Returns a double scalar value representing the F statistic.
public double getP()
Returns a double scalar value representing the p-value.
public double getRSquared()
Returns a double scalar value representing the R-squared (in percent).
public double getAdjustedRSquared()
Returns a double scalar value representing the adjusted R-squared (in percent).
public double getModelErrorStdev()
Returns a double scalar value representing the estimated standard deviation of the model error.
public double getMeanOfY()
Returns a double scalar value representing the mean of the response (dependent variable).
public int getTotalMissing()
Returns an int scalar value representing the total number of missing values (NaN) in the input y. Elements of y containing NaN (not a number) are omitted from the computations.
public double getCoefficientOfVariation()
Returns a double scalar value representing the coefficient of variation (in percent).
public double[][] getGroupInformation()
Returns a two-dimensional double array containing information concerning the groups. Row i contains information pertaining to the i-th group. The information in the columns is as follows:
Column | Information |
---|---|
0 | Group Number |
1 | Number of nonmissing observations |
2 | Group Mean |
3 | Group Standard Deviation |
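The four columns above can be computed directly from a ragged `double[][]` of responses. The sketch below is illustrative only: the class `GroupInfoSketch` and its method `summarize` are hypothetical names, the group numbers are assumed to start at 1, and the standard deviation is assumed to be the sample standard deviation (divisor n - 1). NaN entries are skipped, in line with the treatment of missing values described for `getTotalMissing`.

```java
/** Hypothetical helper for illustration; not part of the documented ANOVA class. */
final class GroupInfoSketch {

    /** Returns one row per group: {group number, nonmissing count, mean, standard deviation}. */
    static double[][] summarize(double[][] y) {
        double[][] info = new double[y.length][4];
        for (int g = 0; g < y.length; g++) {
            int n = 0;
            double sum = 0.0;
            for (double v : y[g]) {
                if (!Double.isNaN(v)) {      // NaN marks a missing observation
                    n++;
                    sum += v;
                }
            }
            double mean = n > 0 ? sum / n : Double.NaN;
            double ss = 0.0;                 // sum of squared deviations from the group mean
            for (double v : y[g]) {
                if (!Double.isNaN(v)) {
                    ss += (v - mean) * (v - mean);
                }
            }
            info[g][0] = g + 1;              // assumption: group numbers are 1-based
            info[g][1] = n;
            info[g][2] = mean;
            info[g][3] = n > 1 ? Math.sqrt(ss / (n - 1)) : Double.NaN; // assumption: n - 1 divisor
        }
        return info;
    }
}
```

With the input `{{1, 2, 3}, {4, NaN, 6, 8}}`, the second group has three nonmissing observations, a mean of 6, and a standard deviation of 2.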
public double getDunnSidak(int i, int j)
Deprecated. Use ANOVA.getConfidenceInterval(double, int, int, int) instead.
i - an int indicating the i-th member of the pair, \(\mu_i\)
j - an int indicating the j-th member of the pair, \(\mu_j\)
Throws IllegalArgumentException when i or j is greater than or equal to the number of observations in the group represented by rows i or j of y, respectively.
public double[] getConfidenceInterval(double conLevel, int i, int j, int compMethod)
getConfidenceInterval computes the simultaneous confidence interval on the pairwise comparison of means \(\mu_i\) and \(\mu_j\) in the one-way analysis of variance model. Any of several methods can be chosen. A good review of these methods is given by Stoline (1981). The methods are also discussed in many elementary statistics texts, e.g., Kirk (1982, pages 114-127).
Let \(k\) be the number of groups, and let \(n_i\) and \(\bar{y}_i\) be the number of observations and the sample mean of group \(i\). Let \(s^2\) be the estimated variance of a single observation. Let \(\nu\) be the degrees of freedom associated with \(s^2\). Let
$$\alpha=1-\frac{\text{conLevel}}{100.0}$$
The methods are summarized as follows:
Tukey method: The Tukey method gives the narrowest simultaneous confidence intervals for the pairwise differences of means \(\mu_i-\mu_j\) in balanced \(\left(n_1=n_2=\ldots=n_k=n\right)\) one-way designs. The method is exact and uses the Studentized range distribution. The formula for the difference \(\mu_i-\mu_j\) is given by
$$\bar{y}_i-\bar{y}_j\pm q_{1-\alpha;k,\nu}\sqrt{\frac{s^2}{n}}$$
where \(q_{1-\alpha;k,\nu}\) is the \(\left(1-\alpha\right)100\) percentage point of the Studentized range distribution with parameters \(k\) and \(\nu\). If the group sizes are unequal, the Tukey-Kramer method is used instead.
Tukey-Kramer method: The Tukey-Kramer method is an approximate extension of the Tukey method for the unbalanced case. (The method simplifies to the Tukey method for the balanced case.) The method always produces confidence intervals narrower than the Dunn-Sidak and Bonferroni methods. Hayter (1984) proved that the method is conservative, i.e., the method guarantees a confidence coverage of at least \(\left(1-\alpha\right)100\%\). Hayter's proof gave further support to earlier recommendations for its use (Stoline 1981). (Methods that improve on it are restricted to special cases and offer an improvement only in severely unbalanced cases; see, e.g., Spurrier and Isham 1985.) The formula for the difference \(\mu_i-\mu_j\) is given by the following:
$$\bar{y}_i-\bar{y}_j\pm q_{1-\alpha;k,\nu}\sqrt{\frac{s^2}{2n_i}+\frac{s^2}{2n_j}}$$
Dunn-Sidak method: The Dunn-Sidak method is a conservative method. The method gives wider intervals than the Tukey-Kramer method. (For large \(\nu\) and small \(\alpha\) and \(k\), the difference is only slight.) The method is slightly better than the Bonferroni method and is based on an improved Bonferroni (multiplicative) inequality (Miller, pages 101, 254-255). The method uses the t distribution. The formula for the difference \(\mu_i-\mu_j\) is given by
$$\bar{y}_i-\bar{y}_j\pm t_{\frac{1}{2}+\frac{1}{2}\left(1-\alpha\right)^{1/k^*};\nu}\sqrt{\frac{s^2}{n_i}+\frac{s^2}{n_j}}$$
where \(t_{f;\nu}\) is the \(100f\) percentage point of the t distribution with \(\nu\) degrees of freedom. Here \(k^*\) denotes the number of pairwise comparisons, \(k(k-1)/2\) when all pairs of the \(k\) groups are compared.
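To see how close the Dunn-Sidak method is to the Bonferroni method described next, it helps to compare the probability levels at which each one reads the t distribution. The sketch below is illustrative and not taken from the library: the class `ComparisonLevelSketch` and its methods are hypothetical, and it assumes that \(k^*\) is the number of all pairwise comparisons, \(k(k-1)/2\). It only computes the levels; looking up the corresponding t percentage point would need a t-distribution quantile function, which the Java standard library does not provide.

```java
/** Hypothetical helper for illustration; not part of the documented ANOVA class. */
final class ComparisonLevelSketch {

    /** Assumption: k* counts all pairwise comparisons among k groups. */
    static int pairCount(int k) {
        return k * (k - 1) / 2;
    }

    /** alpha = 1 - conLevel / 100, with conLevel given in percent. */
    static double alpha(double conLevel) {
        return 1.0 - conLevel / 100.0;
    }

    /** Level f for the Dunn-Sidak t point: 1/2 + 1/2 * (1 - alpha)^(1/k*). */
    static double dunnSidakLevel(double conLevel, int k) {
        return 0.5 + 0.5 * Math.pow(1.0 - alpha(conLevel), 1.0 / pairCount(k));
    }

    /** Level f for the Bonferroni t point: 1 - alpha / (2 k*). */
    static double bonferroniLevel(double conLevel, int k) {
        return 1.0 - alpha(conLevel) / (2.0 * pairCount(k));
    }
}
```

For a 95 percent confidence level and four groups (six pairs), the Dunn-Sidak level is about 0.99574 and the Bonferroni level is about 0.99583. The lower Dunn-Sidak level yields a slightly smaller t point, which is why its intervals are slightly narrower.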
Bonferroni method: The Bonferroni method is a conservative method based on the Bonferroni (additive) inequality (Miller, page 8). The method uses the t distribution. The formula for the difference \(\mu_i-\mu_j\) is given by
$$\bar{y}_i-\bar{y}_j\pm t_{1-\frac{\alpha}{2k^*};\nu}\sqrt{\frac{s^2}{n_i}+\frac{s^2}{n_j}}$$
Scheffé method: The Scheffé method is an overly conservative method for simultaneous confidence intervals on pairwise differences of means. The method is applicable for simultaneous confidence intervals on all contrasts, i.e., all linear combinations
$$\sum\limits_{i=1}^k c_i\mu_i$$
where the following is true:
$$\sum\limits_{i=1}^k c_i=0$$
The method can be recommended here only if a large number of confidence intervals on contrasts in addition to the pairwise differences of means are to be constructed. The method uses the F distribution. The formula for the difference \(\mu_i-\mu_j\) is given by
$$\bar{y}_i-\bar{y}_j\pm\sqrt{\left(k-1\right)F_{1-\alpha;k-1,\nu}\left(\frac{s^2}{n_i}+\frac{s^2}{n_j}\right)}$$
where \(F_{1-\alpha;k-1,\nu}\) is the \(\left(1-\alpha\right)100\) percentage point of the F distribution with \(k-1\) and \(\nu\) degrees of freedom.
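As a small illustration of the Scheffé formula, the half-width of the interval can be evaluated once the F percentage point is known. This is a sketch under stated assumptions rather than the library's code: the class `ScheffeSketch` and the method `halfWidth` are hypothetical, and the critical value `fCritical` (standing in for \(F_{1-\alpha;k-1,\nu}\)) must be supplied by the caller, for example from a table.

```java
/** Hypothetical helper for illustration; not part of the documented ANOVA class. */
final class ScheffeSketch {

    /**
     * Half-width of the Scheffe interval for mu_i - mu_j.
     *
     * @param k         number of groups
     * @param fCritical caller-supplied F percentage point with k-1 and nu degrees of freedom
     * @param s2        estimated variance of a single observation
     * @param nI        number of observations in group i
     * @param nJ        number of observations in group j
     */
    static double halfWidth(int k, double fCritical, double s2, int nI, int nJ) {
        return Math.sqrt((k - 1) * fCritical * (s2 / nI + s2 / nJ));
    }
}
```

The factor \(k-1\) grows with the number of groups, which is one way to see why the method is wide for pairwise differences alone.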
One-at-a-time t method (Fisher's LSD): The one-at-a-time t method is appropriate for constructing a single confidence interval. The confidence percentage input is appropriate for one interval at a time. The method has been used widely in conjunction with the overall test of the null hypothesis \(\mu_1=\mu_2=\ldots=\mu_k\) by the use of the F statistic. Fisher's LSD (least significant difference) test is a two-stage test that proceeds to make pairwise comparisons of means only if the overall F test is significant. Milliken and Johnson (1984, page 31) recommend LSD comparisons after a significant F only if the number of comparisons is small and the comparisons were planned prior to the analysis. If many unplanned comparisons are made, they recommend Scheffé's method. If the F test is not significant, a few planned comparisons for differences in means can still be performed by using the Tukey, Tukey-Kramer, Dunn-Sidak, or Bonferroni method. Because the F test is not significant, Scheffé's method will not yield any significant differences. The formula for the difference \(\mu_i-\mu_j\) is given by
$$\bar{y}_i-\bar{y}_j\pm t_{1-\frac{\alpha}{2};\nu}\sqrt{\frac{s^2}{n_i}+\frac{s^2}{n_j}}$$
conLevel - a double specifying the confidence level for simultaneous interval estimation. If the Tukey method for computing the confidence intervals on the pairwise difference of means is to be used, conLevel must be in the range [90.0, 99.0]. Otherwise, conLevel must be in the range
i - an int indicating the i-th member of the pair difference, \(\mu_i-\mu_j\). i must be a valid group index.
j - an int indicating the j-th member of the pair difference, \(\mu_i-\mu_j\). j must be a valid group index.
compMethod - must be one of the following:
compMethod | Description |
---|---|
TUKEY | Uses the Tukey method. This method is valid for balanced one-way designs. |
TUKEY_KRAMER | Uses the Tukey-Kramer method. This method simplifies to the Tukey method for the balanced case. |
DUNN_SIDAK | Uses the Dunn-Sidak method. |
BONFERRONI | Uses the Bonferroni method. |
SCHEFFE | Uses the Scheffe method. |
ONE_AT_A_TIME | Uses the One-at-a-Time (Fisher's LSD) method. |
Returns a double array containing the group numbers, difference of means, and lower and upper confidence limits.
Array Element | Description |
---|---|
0 | Group number for the i-th mean. |
1 | Group number for the j-th mean. |
2 | Difference of means (i-th mean) - (j-th mean). |
3 | Lower confidence limit for the difference. |
4 | Upper confidence limit for the difference. |
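The five-element layout above can be assembled by hand for the three t-based methods (Dunn-Sidak, Bonferroni, and one-at-a-time), which share the same standard error and differ only in the t percentage point. The following sketch is illustrative and not the library's implementation: the class `PairwiseIntervalSketch` and the method `interval` are hypothetical, the critical value `tCritical` is supplied by the caller, and the group numbers are simply passed through because the numbering convention used by the library is not stated here.

```java
/** Hypothetical helper for illustration; not part of the documented ANOVA class. */
final class PairwiseIntervalSketch {

    /**
     * Returns {group i, group j, difference of means, lower limit, upper limit}.
     *
     * @param tCritical caller-supplied t percentage point for the chosen method
     * @param s2        estimated variance of a single observation
     */
    static double[] interval(int i, int j, double meanI, double meanJ,
                             int nI, int nJ, double s2, double tCritical) {
        double diff = meanI - meanJ;                               // (i-th mean) - (j-th mean)
        double halfWidth = tCritical * Math.sqrt(s2 / nI + s2 / nJ);
        return new double[] {i, j, diff, diff - halfWidth, diff + halfWidth};
    }
}
```

For example, with group means 10 and 7, five observations per group, \(s^2 = 2.5\), and an assumed critical value of 2.0, the difference is 3 with limits 1 and 5.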
Copyright © 2020 Rogue Wave Software. All rights reserved.