Package com.imsl.stat

Class NormTwoSample

java.lang.Object
com.imsl.stat.NormTwoSample
All Implemented Interfaces:
Serializable, Cloneable

public class NormTwoSample extends Object implements Serializable, Cloneable
Computes statistics for mean and variance inferences using samples from two normal populations.

Class NormTwoSample computes statistics for making inferences about the means and variances of two normal populations, using independent samples in x and y. Missing values, that is, values equal to NaN (not a number), are excluded from the computations. For inferences concerning parameters of a single normal population, see class NormOneSample.

Let \(\mu_1\) and \(\sigma _1^2\) be the mean and variance of the first population, and let \(\mu_2\) and \(\sigma _2^2\) be the corresponding quantities of the second population. The methods in this class support tests for the difference in means \(\mu_1-\mu_2\), for equality of variances, and for the common variance (assuming the variances are equal).

The sample means and variances are as follows:

$$\bar x_1 = \frac{1}{n_1}\sum_i x_{1i}, \qquad \bar x_2 = \frac{1}{n_2}\sum_i x_{2i}$$

and

$$s_1^2 = \sum {\left( {x_{1i} - \bar x_1 } \right)}^2 /\left( {n_1 - 1} \right), \,\,\,\,\,\,\,\,\,\,\,s_2^2 = \sum {\left( {x_{2i} - {\bar x}_2} \right)}^2 /\left( {n_2 - 1} \right)$$
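
The sample mean and variance above, together with the exclusion of NaN values described earlier, can be sketched in plain Java as follows. The class and method names (SampleStats, dropMissing, mean, variance) are hypothetical and are not part of com.imsl.stat; this is an illustration of the formulas, not the library's implementation.

```java
import java.util.Arrays;

final class SampleStats {
    /** Drops NaN entries, which the class description treats as missing values. */
    static double[] dropMissing(double[] data) {
        return Arrays.stream(data).filter(d -> !Double.isNaN(d)).toArray();
    }

    /** Sample mean of the non-missing values. */
    static double mean(double[] data) {
        double[] v = dropMissing(data);
        return Arrays.stream(v).sum() / v.length;
    }

    /** Sample variance with divisor n - 1, computed over the non-missing values. */
    static double variance(double[] data) {
        double[] v = dropMissing(data);
        double m = mean(v);
        double ss = 0.0;
        for (double d : v) {
            ss += (d - m) * (d - m);
        }
        return ss / (v.length - 1);
    }

    public static void main(String[] args) {
        double[] x = {2.0, 4.0, Double.NaN, 6.0};
        System.out.println(mean(x));      // prints 4.0
        System.out.println(variance(x));  // prints 4.0
    }
}
```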

Inferences about the Means

The test that the difference in means equals a specified value \(\mu_0\) depends on whether or not the variances of the two populations can be considered equal. If the variances are equal and \(\mu_0=0\), the test is the two-sample t-test, which is equivalent to an analysis-of-variance test. The pooled variance for the difference-in-means test is as follows:

$$s^2 = \frac{{\left( {n_1 - 1} \right)s_1^2 + \left( {n_2 - 1} \right)s_2^2 }} {{n_1 + n_2 - 2}}$$

The t statistic is as follows:

$$t = \frac{{\bar x_1 - \bar x_2 - \mu _0}} {s\sqrt {{\left( {1/n_1 } \right)} + \left( {1/n_2 } \right)}}$$
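
As a rough illustration, the pooled variance and the equal-variance t statistic can be computed from summary statistics as shown below. The class PooledTTest and its methods are hypothetical names introduced for this sketch and do not belong to the library.

```java
final class PooledTTest {
    /** Pooled variance: the two sample variances weighted by their degrees of freedom. */
    static double pooledVariance(int n1, double var1, int n2, double var2) {
        return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2);
    }

    /** Equal-variance t statistic for the null hypothesis mu1 - mu2 = mu0. */
    static double tStatistic(double mean1, double var1, int n1,
                             double mean2, double var2, int n2, double mu0) {
        double s = Math.sqrt(pooledVariance(n1, var1, n2, var2));
        return (mean1 - mean2 - mu0) / (s * Math.sqrt(1.0 / n1 + 1.0 / n2));
    }

    public static void main(String[] args) {
        // Summary statistics for x = {2, 4, 6} and y = {1, 2, 3, 4, 5}.
        System.out.println(pooledVariance(3, 4.0, 5, 2.5)); // prints 3.0
        System.out.println(tStatistic(4.0, 4.0, 3, 3.0, 2.5, 5, 0.0));
    }
}
```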

The confidence interval for the difference in means under equal variances is obtained by first setting the unequal variances flag to false with the setUnequalVariances method (this is the default), and then calling the getLowerCIDiff and getUpperCIDiff methods.

If the population variances are not equal, the ordinary t statistic does not have a t distribution, and several approximate tests for the equality of means have been proposed. (See, for example, Anderson and Bancroft 1952, and Kendall and Stuart 1979.) One of the earliest tests devised for this situation is the Fisher-Behrens test, based on Fisher's concept of fiducial probability. When unequal variances are specified, the getTTest, getLowerCIDiff and getUpperCIDiff methods use Satterthwaite's procedure, as suggested by H.F. Smith and modified by F.E. Satterthwaite (Anderson and Bancroft 1952, p. 83). Call setUnequalVariances(true) to obtain results assuming unequal variances.

The test statistic is

$$t' = \left( {\bar x_1 - \bar x_2 - \mu _0 } \right)/s_d$$

where

$$s_d = \sqrt {\left( {s_1^2 /n_1 } \right) + \left( {s_2^2 /n_2 } \right)}$$

Under the null hypothesis \(\mu_1- \mu_2= \mu_0\), this quantity has an approximate t distribution with degrees of freedom df, given by the following equation:

$${\rm{df}} = \frac{{s_d^4 }}{{\frac{{\left( {s_1^2 /n_1 } \right)^2 }}{{n_1 - 1}} + \frac{{\left( {s_2^2 /n_2 } \right)^2 }}{{n_2 - 1}}}}$$
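
The statistic \(t'\) and its approximate degrees of freedom can be sketched directly from the two formulas above. The class Satterthwaite and its method names are hypothetical; the code only mirrors the equations and makes no claim about how the library evaluates them.

```java
final class Satterthwaite {
    /** Standard error s_d of the difference in means without pooling the variances. */
    static double standardError(double var1, int n1, double var2, int n2) {
        return Math.sqrt(var1 / n1 + var2 / n2);
    }

    /** Test statistic t' for the null hypothesis mu1 - mu2 = mu0. */
    static double tPrime(double mean1, double var1, int n1,
                         double mean2, double var2, int n2, double mu0) {
        return (mean1 - mean2 - mu0) / standardError(var1, n1, var2, n2);
    }

    /** Approximate degrees of freedom; generally not an integer. */
    static double degreesOfFreedom(double var1, int n1, double var2, int n2) {
        double a = var1 / n1;
        double b = var2 / n2;
        double sd2 = a + b; // s_d squared, so sd2 * sd2 is s_d to the fourth power
        return (sd2 * sd2) / (a * a / (n1 - 1) + b * b / (n2 - 1));
    }

    public static void main(String[] args) {
        // Summary statistics for x = {2, 4, 6} and y = {1, 2, 3, 4, 5}.
        System.out.println(tPrime(4.0, 4.0, 3, 3.0, 2.5, 5, 0.0));
        System.out.println(degreesOfFreedom(4.0, 3, 2.5, 5));
    }
}
```

Because the result is generally fractional, it is natural that getTTestDF returns a double rather than an int.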

Inferences about Variances

The F statistic for testing the equality of variances is given by \(F = s_{\max }^2 /s_{\min }^2\), where \(s_{\max}^2\) is the larger and \(s_{\min}^2\) the smaller of \(s_1^2\) and \(s_2^2\). If the variances are equal, this quantity has an F distribution with \(n_1 - 1\) and \(n_2 - 1\) degrees of freedom.

Note: it is generally not recommended that the results of the F test be used to decide whether to use the regular t-test or the modified \(t'\) on a single set of data. The modified \(t'\) (Satterthwaite's procedure) is the more conservative approach to use if there is doubt about the equality of the variances.
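
A minimal sketch of the variance-ratio statistic described above follows. The class VarianceRatio is a hypothetical name; the code covers only the statistic itself, not the p-value, which requires the F distribution function.

```java
final class VarianceRatio {
    /** F statistic: the larger sample variance divided by the smaller one, so F >= 1. */
    static double fStatistic(double var1, double var2) {
        return Math.max(var1, var2) / Math.min(var1, var2);
    }

    public static void main(String[] args) {
        // The order of the arguments does not matter.
        System.out.println(fStatistic(4.0, 2.5));
        System.out.println(fStatistic(2.5, 4.0));
    }
}
```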

  • Constructor Summary

    Constructors
    Constructor
    Description
    NormTwoSample(double[] x, double[] y)
    Constructor.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    downdateX(double[] x)
    Removes the observations in x from the first sample.
    void
    downdateY(double[] y)
    Removes the observations in y from the second sample.
    double
    getChiSquaredTest()
    Returns the test statistic associated with the chi-squared test for the (assumed) common variance.
    int
    getChiSquaredTestDF()
    Returns the degrees of freedom associated with the chi-squared test for the common variance.
    double
    getChiSquaredTestP()
    Returns the probability of a larger value than the chi-squared statistic associated with the test for the common variance, assuming the null hypothesis is true (i.e., the p-value for the test).
    double
    getDiffMean()
    Returns the difference in sample means.
    double
    getFTest()
    Returns the F statistic value calculated in an F-test for equality of variances.
    int
    getFTestDFdenominator()
    Returns the denominator degrees of freedom of the F test for equality of variances.
    int
    getFTestDFnumerator()
    Returns the numerator degrees of freedom in the \(F\)-test for equality of variances.
    double
    getFTestP()
    Returns the probability of a larger (in absolute value) F statistic value, assuming equal variances (i.e., the p-value for the test).
    double
    getLowerCICommonVariance()
    Returns the lower confidenceVariance \(\times 100\%\) confidence limit for the common variance.
    double
    getLowerCIDiff()
    Returns the lower confidence limit for the difference, \(\mu_x - \mu_y\), for equal or unequal variances depending on the value set by setUnequalVariances.
    double
    getLowerCIRatioVariance()
    Returns the approximate lower confidence limit in an interval estimate for the ratio of variances, \(\sigma_1^2/\sigma_2^2\).
    double
    getMeanX()
    Returns the mean of the first sample.
    double
    getMeanY()
    Returns the mean of the second sample.
    double
    getPooledVariance()
    Returns the pooled variance for the two samples.
    double
    getStdDevX()
    Returns the standard deviation of the first sample.
    double
    getStdDevY()
    Returns the standard deviation of the second sample.
    double
    getTTest()
    Returns the test statistic for Satterthwaite's approximation.
    double
    getTTestDF()
    Returns the degrees of freedom for Satterthwaite's approximation.
    double
    getTTestP()
    Returns the approximate probability of observing a larger value of the t-statistic given the null hypothesis is true (i.e., the p-value for the test).
    double
    getUpperCICommonVariance()
    Returns the upper confidenceVariance \(\times 100\%\) confidence limit for the common variance.
    double
    getUpperCIDiff()
    Returns the upper confidence limit for the difference, \(\mu_x - \mu_y\), for equal or unequal variances depending on the value set by setUnequalVariances.
    double
    getUpperCIRatioVariance()
    Returns the approximate upper confidence limit in a confidence interval for the ratio of variances, \(\sigma_1^2/\sigma_2^2\).
    void
    setChiSquaredTestNull(double varianceHypothesisValue)
    Sets the null hypothesis value for the chi-squared test.
    void
    setConfidenceMean(double confidenceMean)
    Sets the confidence level (a value between 0.0 and 1.0) for a two-sided confidence interval for the difference in means, \(\mu_x - \mu_y\).
    void
    setConfidenceVariance(double confidenceVariance)
    Sets the confidence level for a two-sided interval estimate for the common variance and for the ratio of variances.
    void
    setTTestNull(double meanHypothesis)
    Sets the null hypothesis value for the t-test for the difference in means.
    void
    setUnequalVariances(boolean uneqVar)
    Specifies whether to return statistics based on equal or unequal variances.
    void
    update(double[] x, double[] y)
    Concatenates the data in x and y with the samples provided in the constructor.
    void
    updateX(double[] x)
    Concatenates the data in x with the first sample.
    void
    updateY(double[] y)
    Concatenates the data in y with the second sample.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • NormTwoSample

      public NormTwoSample(double[] x, double[] y)
      Constructor.
      Parameters:
      x - a double array containing the first sample
      y - a double array containing the second sample
  • Method Details

    • getDiffMean

      public double getDiffMean()
      Returns the difference in sample means.
      Returns:
      a double, the difference in sample means
    • getMeanX

      public double getMeanX()
      Returns the mean of the first sample.
      Returns:
      a double, the mean of the first sample
    • getMeanY

      public double getMeanY()
      Returns the mean of the second sample.
      Returns:
      a double, the mean of the second sample
    • setConfidenceMean

      public void setConfidenceMean(double confidenceMean)
      Sets the confidence level for a two-sided confidence interval for the difference in means, \(\mu_x - \mu_y\). The argument confidenceMean must be between \(0.0\) and \(1.0\); common choices are \(0.90\), \(0.95\) and \(0.99\). Note: To use getUpperCIDiff() (getLowerCIDiff()) as a \(C\)% upper (lower) one-sided confidence limit, set confidenceMean = \(1-2(1-C/100)\). Default: confidenceMean = 0.95
      Parameters:
      confidenceMean - double, the desired confidence level of the mean
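
      The one-sided use mentioned in the note rests on a standard fact: the upper (or lower) limit of a two-sided interval with tail area \(\alpha/2\) on each side is also a one-sided limit at level \(1-\alpha/2\). A small sketch of the conversion, using a hypothetical helper class OneSidedLevel that is not part of the library:

```java
final class OneSidedLevel {
    /**
     * Converts a one-sided confidence level given in percent (for example 95.0)
     * into the two-sided level, expressed as a fraction, that yields the same limit.
     */
    static double twoSidedLevel(double oneSidedPercent) {
        return 1.0 - 2.0 * (1.0 - oneSidedPercent / 100.0);
    }

    public static void main(String[] args) {
        // A 95% one-sided limit corresponds to a two-sided level of about 0.90,
        // which is the value that would be passed to setConfidenceMean.
        System.out.println(twoSidedLevel(95.0));
    }
}
```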
    • getUpperCIDiff

      public double getUpperCIDiff()
      Returns the upper confidence limit for the difference, \(\mu_x - \mu_y\), for equal or unequal variances depending on the value set by setUnequalVariances.
      Returns:
      a double containing the upper confidence limit for the difference in means of the two populations.
    • getLowerCIDiff

      public double getLowerCIDiff()
      Returns the lower confidence limit for the difference, \(\mu_x - \mu_y\), for equal or unequal variances depending on the value set by setUnequalVariances.
      Returns:
      a double containing the lower confidence limit for the difference in means of the two populations.
    • setUnequalVariances

      public void setUnequalVariances(boolean uneqVar)
      Specifies whether to return statistics based on equal or unequal variances. The default is to return statistics for equal variances. If uneqVar is true, statistics for unequal variances are returned.
      Parameters:
      uneqVar - a boolean containing a true or false value. A value of true will cause results for unequal variances to be returned. A value of false will cause results for equal variances to be returned.
    • getTTestDF

      public double getTTestDF()
      Returns the degrees of freedom for Satterthwaite's approximation. The value returned is based on the assumption of equal or unequal variances, depending on the value set by setUnequalVariances.
      Returns:
      a double containing the degrees of freedom for the t-test.
    • getTTest

      public double getTTest()
      Returns the test statistic for Satterthwaite's approximation. The value returned is based on the assumption of equal or unequal variances, according to the value set by setUnequalVariances.
      Returns:
      a double containing the test statistic for the t-test.
    • getTTestP

      public double getTTestP()
      Returns the approximate probability of observing a larger value of the t-statistic given the null hypothesis is true (i.e., the p-value for the test). The value returned is based on the assumption of equal or unequal variances, according to the value set by setUnequalVariances.
      Returns:
      a double, the p-value for the test
    • setTTestNull

      public void setTTestNull(double meanHypothesis)
      Sets the null hypothesis value for the t-test for the difference in means. The default is meanHypothesis = 0.0.
      Parameters:
      meanHypothesis - double containing the hypothesis value.
    • getPooledVariance

      public double getPooledVariance()
      Returns the pooled variance for the two samples.
      Returns:
      a double, the pooled variance for the two samples
    • setConfidenceVariance

      public void setConfidenceVariance(double confidenceVariance)
      Sets the confidence level for a two-sided interval estimate for the common variance and for the ratio of variances. Under the assumption of equal variances, the pooled variance for the two samples is used to obtain a confidenceVariance \(\times 100\%\) two-sided confidence interval for the common variance, with lower limit returned by getLowerCICommonVariance and upper limit returned by getUpperCICommonVariance. Without the assumption of equal variances (see setUnequalVariances), the ratio of the variances is of interest. A two-sided confidenceVariance \(\times 100\%\) confidence interval for the ratio of the variances \(\sigma_1^2/\sigma_2^2\) is given by getLowerCIRatioVariance and getUpperCIRatioVariance. The confidence intervals are symmetric in probability. The argument confidenceVariance must be between 0.0 and 1.0 and is often 0.90, 0.95 or 0.99. The default is 0.95.
      Parameters:
      confidenceVariance - a double containing the confidence level of the variance
    • getLowerCICommonVariance

      public double getLowerCICommonVariance()
      Returns the lower confidenceVariance \(\times 100\%\) confidence limit for the common variance.
      Returns:
      a double, the lower confidence limit for the common variance
    • getUpperCICommonVariance

      public double getUpperCICommonVariance()
      Returns the upper confidenceVariance \(\times 100\%\) confidence limit for the common variance.
      Returns:
      a double, the upper confidence limit for the common variance
    • getChiSquaredTestDF

      public int getChiSquaredTestDF()
      Returns the degrees of freedom associated with the chi-squared test for the common variance. The chi-squared test is a test of the hypothesis \(\omega^2 = \omega_0^2\), where \(\omega^2\) is the assumed common variance between the two populations and \(\omega_0^2\) is the null hypothesis value as described in setChiSquaredTestNull.
      Returns:
      an int, the degrees of freedom for the chi-squared test
    • getChiSquaredTest

      public double getChiSquaredTest()
      Returns the test statistic associated with the chi-squared test for the (assumed) common variance. The chi-squared test is a test of the hypothesis \(\omega^2 = \omega_0^2\) where \(\omega^2\) is the assumed common variance between the two populations and \(\omega_0^2\) is the null hypothesis value as described in setChiSquaredTestNull.
      Returns:
      a double, the test statistic for the chi-squared test
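
      The page does not give the formula for this statistic. A common textbook form, assumed here purely for illustration, scales the pooled variance by its degrees of freedom and divides by the null value: \(\chi^2 = (n_1 + n_2 - 2)s^2/\omega_0^2\). The class CommonVarianceChiSquared below is hypothetical, and the library may compute the statistic differently.

```java
final class CommonVarianceChiSquared {
    /**
     * Chi-squared statistic for H0: common variance = nullVariance, using the
     * assumed form (n1 + n2 - 2) * pooledVariance / nullVariance.
     */
    static double statistic(int n1, double var1, int n2, double var2, double nullVariance) {
        int df = n1 + n2 - 2;
        double pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / df;
        return df * pooled / nullVariance;
    }

    public static void main(String[] args) {
        // Summary statistics for x = {2, 4, 6} and y = {1, 2, 3, 4, 5}; null value 1.0.
        System.out.println(statistic(3, 4.0, 5, 2.5, 1.0));
    }
}
```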
    • getChiSquaredTestP

      public double getChiSquaredTestP()
      Returns the probability of a larger value than the chi-squared statistic associated with the test for the common variance, assuming the null hypothesis is true (i.e., the p-value for the test).
      Returns:
      a double, the p-value for the chi-squared test
    • setChiSquaredTestNull

      public void setChiSquaredTestNull(double varianceHypothesisValue)
      Sets the null hypothesis value for the chi-squared test. The default is 1.0.
      Parameters:
      varianceHypothesisValue - a double, the null hypothesis value for the chi-squared test
    • getStdDevX

      public double getStdDevX()
      Returns the standard deviation of the first sample.
      Returns:
      a double, the standard deviation of the first sample
    • getStdDevY

      public double getStdDevY()
      Returns the standard deviation of the second sample.
      Returns:
      a double, the standard deviation of the second sample
    • getLowerCIRatioVariance

      public double getLowerCIRatioVariance()
      Returns the approximate lower confidence limit in an interval estimate for the ratio of variances, \(\sigma_1^2/\sigma_2^2\).
      Returns:
      a double, the lower limit
    • getUpperCIRatioVariance

      public double getUpperCIRatioVariance()
      Returns the approximate upper confidence limit in a confidence interval for the ratio of variances, \(\sigma_1^2/\sigma_2^2\).
      Returns:
      a double, the upper limit
    • getFTestDFnumerator

      public int getFTestDFnumerator()
      Returns the numerator degrees of freedom in the \(F\)-test for equality of variances.
      Returns:
      an int, the numerator degrees of freedom
    • getFTestDFdenominator

      public int getFTestDFdenominator()
      Returns the denominator degrees of freedom of the F test for equality of variances.
      Returns:
      an int, the denominator degrees of freedom
    • getFTest

      public double getFTest()
      Returns the F statistic value calculated in an F-test for equality of variances.
      Returns:
      a double, the value of the test statistic
    • getFTestP

      public double getFTestP()
      Returns the probability of a larger (in absolute value) F statistic value, assuming equal variances (i.e., the p-value for the test).
      Returns:
      a double, the probability of a larger F statistic
    • updateX

      public void updateX(double[] x)
      Concatenates the data in x with the first sample.

      This method updates the test results to include a new subset of the data. This is useful when the data is too large to fit into memory or when all of the data is not available at one time or location.

      Parameters:
      x - a double array containing new data for the first sample
    • downdateX

      public void downdateX(double[] x)
      Removes the observations in x from the first sample.
      Parameters:
      x - a double array containing the values to remove from the first sample
    • updateY

      public void updateY(double[] y)
      Concatenates the data in y with the second sample.

      This method updates the test results to include a new subset of the data. This is useful when the data is too large to fit into memory or when all of the data is not available at one time or location.

      Parameters:
      y - a double array containing new data for the second sample
    • downdateY

      public void downdateY(double[] y)
      Removes the observations in y from the second sample.
      Parameters:
      y - a double array containing the values to remove from the second sample
    • update

      public void update(double[] x, double[] y)
      Concatenates the data in x and y with the samples provided in the constructor.

      This method updates the test results to include a new subset of the data. This is useful when the data is too large to fit into memory or when all of the data is not available at one time or location.

      Parameters:
      x - a double array containing updates to the first sample
      y - a double array containing updates to the second sample
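
      One way the update and downdate methods could work without keeping all data in memory is to maintain, for each sample, only the count, the running mean, and the sum of squared deviations. The sketch below uses Welford's recurrence and its inverse; the class RunningSample is hypothetical, and the source does not say which algorithm the library actually uses.

```java
final class RunningSample {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the current mean

    /** Adds observations, skipping NaN (treated as missing). */
    void update(double[] values) {
        for (double v : values) {
            if (Double.isNaN(v)) continue;
            n++;
            double delta = v - mean;
            mean += delta / n;
            m2 += delta * (v - mean);
        }
    }

    /** Removes observations that were added earlier, skipping NaN. */
    void downdate(double[] values) {
        for (double v : values) {
            if (Double.isNaN(v)) continue;
            if (n <= 1) { n = 0; mean = 0.0; m2 = 0.0; continue; }
            double oldMean = (n * mean - v) / (n - 1);
            m2 -= (v - mean) * (v - oldMean);
            mean = oldMean;
            n--;
        }
    }

    long count() { return n; }
    double mean() { return mean; }
    double variance() { return m2 / (n - 1); }

    public static void main(String[] args) {
        RunningSample x = new RunningSample();
        x.update(new double[] {2.0, 4.0});
        x.update(new double[] {6.0, 8.0}); // a second chunk arriving later
        x.downdate(new double[] {8.0});    // withdraw one observation
        System.out.println(x.mean() + " " + x.variance());
    }
}
```

      Downdating by subtraction can lose precision when the removed values dominate the sample, so a production implementation would likely need more care than this sketch shows.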