Package com.imsl.stat

Class WelchsTTest

java.lang.Object
com.imsl.stat.WelchsTTest
All Implemented Interfaces:
Serializable, Cloneable

public class WelchsTTest extends Object implements Serializable, Cloneable
Performs Welch's t-test for testing the difference in means between two normal populations with unequal variances.

Let \(\mu_x\) and \(\sigma _x^2\) be the mean and variance of the first population, and let \(\mu_y\) and \(\sigma_y^2\) be the corresponding quantities of the second population. The methods in this class support tests and confidence intervals for the difference in means \(\mu_x-\mu_y\).

For some real constant \(c\), a hypothesis test for the difference may be expressed as one of the following:

$$ H_0: \mu_x - \mu_y = c \,\,\,\,\,\mbox{vs.}\,\,\,\,\, H_1: \mu_x - \mu_y \ne c$$

$$ H_0: \mu_x - \mu_y \le c \,\,\,\,\,\mbox{vs.}\,\,\,\,\, H_1: \mu_x - \mu_y > c$$

$$ H_0: \mu_x - \mu_y \ge c \,\,\,\,\,\mbox{vs.}\,\,\,\,\, H_1: \mu_x - \mu_y < c$$

where \(H_0\) is the null hypothesis and \(H_1\) is the alternative hypothesis. The first test is a two-sided test, because the rejection region defined by \(H_1\) is two-sided, while the other two tests are one-sided. Conventionally, the null hypothesis is assumed to be true and the alternative hypothesis represents an experimental conjecture. If there is sufficient evidence in the sample data, the null hypothesis \(H_0\) is rejected in favor of \(H_1\); insufficient evidence results in a failure to reject the null hypothesis. The decision to reject or fail to reject is based on probabilities calculated from the test statistic's distribution under the null hypothesis (the null distribution).

For Welch's t-test, the two samples are assumed to come from two independent normal distributions with unequal variances (\(\sigma_x^2 \ne \sigma_y^2\)) and means that satisfy the null hypothesis. When the population variances are not equal, the ordinary t statistic does not have a t distribution, and several approximate tests for the equality of means have been proposed. (See, for example, Anderson and Bancroft 1952, and Kendall and Stuart 1979.)

Welch's test statistic is given by

$$t' = \left( {\bar x - \bar y - c } \right)/s_d$$

where \(\bar{x}\) and \(\bar{y}\) are the sample means, \(s^2_x\) and \(s^2_y\) are the (unbiased) sample variances, \(n_x\) and \(n_y\) are the sample sizes, and

$$s_d = \sqrt {\left( {s_x^2 /n_x } \right) + \left( {s_y^2 /n_y } \right)}$$

Under the null hypothesis of \(\mu_x- \mu_y= c\), this quantity has an approximate t-distribution with degrees of freedom df, given by the following equation (known as the Welch-Satterthwaite approximation):

$${\rm{df}} = \frac{{s_d^4 }}{{\frac{{\left( {s_x^2 /n_x } \right)^2 }}{{n_x - 1}} + \frac{{\left( {s_y^2 /n_y } \right)^2 }}{{n_y - 1}}}}$$

Probabilities based on this distribution form the basis of the test and the confidence intervals for the mean difference. For two-sample tests when the variances are assumed equal and for tests of the common variance or for the ratio of variances, see the class NormTwoSample.
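
The formulas for \(t'\), \(s_d\), and df given above can be illustrated with a minimal, self-contained sketch. It does not use this class or any part of the library; the class name WelchSketch and its method names are hypothetical and chosen only for illustration, and the sample data is made up.

```java
public class WelchSketch {

    /** Arithmetic mean of a sample. */
    public static double mean(double[] a) {
        double sum = 0.0;
        for (double v : a) {
            sum += v;
        }
        return sum / a.length;
    }

    /** Unbiased sample variance (divisor n - 1). */
    public static double variance(double[] a) {
        double m = mean(a);
        double ss = 0.0;
        for (double v : a) {
            ss += (v - m) * (v - m);
        }
        return ss / (a.length - 1);
    }

    /** Welch's statistic t' = (xbar - ybar - c) / s_d. */
    public static double statistic(double[] x, double[] y, double c) {
        double sd = Math.sqrt(variance(x) / x.length + variance(y) / y.length);
        return (mean(x) - mean(y) - c) / sd;
    }

    /** Welch-Satterthwaite approximate degrees of freedom. */
    public static double degreesOfFreedom(double[] x, double[] y) {
        double vx = variance(x) / x.length; // s_x^2 / n_x
        double vy = variance(y) / y.length; // s_y^2 / n_y
        double sd2 = vx + vy;               // s_d^2
        return sd2 * sd2
                / (vx * vx / (x.length - 1) + vy * vy / (y.length - 1));
    }

    public static void main(String[] args) {
        // Illustrative data only, not taken from the library documentation.
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2, 4, 6, 8};
        System.out.println("t' = " + statistic(x, y, 0.0));
        System.out.println("df = " + degreesOfFreedom(x, y));
    }
}
```

For this data, \(s_x^2/n_x = 0.5\) and \(s_y^2/n_y \approx 1.667\), so \(t' \approx -1.359\) and df \(\approx 4.75\). The degrees of freedom are generally not an integer, which is consistent with getTTestDF() returning a double.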
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static enum 
    WelchsTTest.Hypothesis
    The form of the alternative hypothesis.
  • Constructor Summary

    Constructors
    Constructor
    Description
    WelchsTTest(double[] x, double[] y)
    Constructor for the class.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    downdateX(double[] x)
    Removes the observations in x from the first sample.
    void
    downdateY(double[] y)
    Removes the observations in y from the second sample.
    double
    getDiffMean()
    Returns the difference in sample means.
    double
    getLowerCIDiff()
    Returns the (approximate) lower confidenceMean*100% confidence limit for the difference in population means, \(\mu_x - \mu_y\).
    double
    getMeanX()
    Returns the mean of the first sample.
    double
    getMeanY()
    Returns the mean of the second sample.
    double
    getStdDevX()
    Returns the standard deviation of the first sample.
    double
    getStdDevY()
    Returns the standard deviation of the second sample.
    double
    getTTest()
    Returns the calculated test statistic for Welch's t-test.
    double
    getTTestDF()
    Returns the degrees of freedom used in the test.
    double
    getTTestP()
    Returns the approximate probability of observing a more extreme value of the t-statistic given the null hypothesis is true (i.e., the approximate p-value of the test).
    double
    getUpperCIDiff()
    Returns the (approximate) upper confidenceMean*100% confidence limit for the difference in population means, \(\mu_x - \mu_y\).
    void
    setConfidenceMean(double confidenceMean)
    Sets the confidence level for a two-sided confidence interval for the difference in population means, \(\mu_x - \mu_y\).
    void
    setHypothesis(WelchsTTest.Hypothesis hypothesis)
    Sets the direction of the null/alternative test.
    void
    setTTestNull(double meanHypothesis)
    Sets the null hypothesis value.
    void
    update(double[] x, double[] y)
    Concatenates the data in x and y with the samples provided in the constructor.
    void
    updateX(double[] x)
    Concatenates the data in x with the first sample.
    void
    updateY(double[] y)
    Concatenates the data in y with the second sample.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • WelchsTTest

      public WelchsTTest(double[] x, double[] y)
      Constructor for the class.
      Parameters:
      x - a double array containing data for the first sample
      y - a double array containing data for the second sample
  • Method Details

    • getDiffMean

      public double getDiffMean()
      Returns the difference in sample means.
      Returns:
      a double, the difference in sample means
    • setHypothesis

      public void setHypothesis(WelchsTTest.Hypothesis hypothesis)
      Sets the direction of the null/alternative test.

      Default: hypothesis = Hypothesis.TWO_SIDED

      Parameters:
      hypothesis - a Hypothesis enum, specifying the type of test
    • setTTestNull

      public void setTTestNull(double meanHypothesis)
      Sets the null hypothesis value. The null hypothesis value is the value \(c\) in \(H_0:\mu_x - \mu_y = c \) and the other forms of the hypothesis.

      Default: \( c = 0.0 \)

      Parameters:
      meanHypothesis - double, the hypothesis value
    • setConfidenceMean

      public void setConfidenceMean(double confidenceMean)

      Sets the confidence level for a two-sided confidence interval for the difference in population means, \(\mu_x - \mu_y\).

      The argument confidenceMean must be between \(0.0\) and \(1.0\). Common choices are \(0.90\), \(0.95\), or \(0.99\).

      Note: To use getUpperCIDiff() (getLowerCIDiff()) as a \(C\)% upper (lower) one-sided confidence limit, set confidenceMean = \(1-2(1-C/100)\). For example, a 95% one-sided limit corresponds to confidenceMean = 0.90.

      Default: confidenceMean = 0.95

      Parameters:
      confidenceMean - double, the desired confidence level
    • getUpperCIDiff

      public double getUpperCIDiff()
      Returns the (approximate) upper confidenceMean*100% confidence limit for the difference in population means, \(\mu_x - \mu_y\).
      Returns:
      a double, the upper confidence limit for the difference in means
    • getLowerCIDiff

      public double getLowerCIDiff()
      Returns the (approximate) lower confidenceMean*100% confidence limit for the difference in population means, \(\mu_x - \mu_y\).
      Returns:
      a double, the lower confidence limit for the difference in means
    • getTTestP

      public double getTTestP()
      Returns the approximate probability of observing a more extreme value of the t-statistic given the null hypothesis is true (i.e., the approximate p-value of the test).
      Returns:
      a double, the approximate p-value for the test
    • getTTestDF

      public double getTTestDF()
      Returns the degrees of freedom used in the test. (The value obtained using the Welch-Satterthwaite approximation.)
      Returns:
      a double, the degrees of freedom used in the test
    • getTTest

      public double getTTest()
      Returns the calculated test statistic for Welch's t-test.
      Returns:
      a double, the test statistic
    • getMeanX

      public double getMeanX()
      Returns the mean of the first sample.
      Returns:
      a double, the mean of the first sample
    • getMeanY

      public double getMeanY()
      Returns the mean of the second sample.
      Returns:
      a double, the mean of the second sample
    • getStdDevX

      public double getStdDevX()
      Returns the standard deviation of the first sample.
      Returns:
      a double, the standard deviation of the first sample
    • getStdDevY

      public double getStdDevY()
      Returns the standard deviation of the second sample.
      Returns:
      a double, the standard deviation of the second sample
    • downdateY

      public void downdateY(double[] y)
      Removes the observations in y from the second sample.
      Parameters:
      y - a double array containing the values to remove from the second sample
    • downdateX

      public void downdateX(double[] x)
      Removes the observations in x from the first sample.
      Parameters:
      x - a double array containing the values to remove from the first sample
    • updateY

      public void updateY(double[] y)
      Concatenates the data in y with the second sample.

      This method updates the test results to include a new subset of the data. This is useful when the data set is too large to fit into memory or when not all of the data is available at one time or in one location.

      Parameters:
      y - a double array containing new data for the second sample
    • updateX

      public void updateX(double[] x)
      Concatenates the data in x with the first sample.

      This method updates the test results to include a new subset of the data. This is useful when the data set is too large to fit into memory or when not all of the data is available at one time or in one location.

      Parameters:
      x - a double array containing new data for the first sample
    • update

      public void update(double[] x, double[] y)
      Concatenates the data in x and y with the samples provided in the constructor.

      This method updates the test results to include a new subset of the data. This is useful when the data set is too large to fit into memory or when not all of the data is available at one time or in one location.

      Parameters:
      x - a double array containing updates to the first sample
      y - a double array containing updates to the second sample
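
The update and downdate methods above let a sample grow or shrink in batches. One way such behavior could be supported without keeping all observations in memory is to track only a running count, mean, and sum of squared deviations (Welford's method). The sketch below is an assumption about a possible approach, not a description of how this class is implemented; the class name RunningSample and its methods are hypothetical.

```java
public class RunningSample {
    private long n = 0;        // number of observations currently included
    private double mean = 0.0; // running mean
    private double m2 = 0.0;   // running sum of squared deviations from the mean

    /** Adds a batch of observations (similar in spirit to updateX/updateY). */
    public void add(double[] values) {
        for (double v : values) {
            n++;
            double delta = v - mean;
            mean += delta / n;
            m2 += delta * (v - mean);
        }
    }

    /** Removes a batch of previously added observations (similar in spirit to downdateX/downdateY). */
    public void remove(double[] values) {
        for (double v : values) {
            if (n == 0) {
                throw new IllegalStateException("no observations left to remove");
            }
            if (n == 1) {
                // Removing the last observation resets the accumulator.
                n = 0;
                mean = 0.0;
                m2 = 0.0;
                continue;
            }
            // Reverse of the Welford update step.
            double oldMean = (n * mean - v) / (n - 1);
            m2 -= (v - oldMean) * (v - mean);
            mean = oldMean;
            n--;
        }
    }

    public long count() {
        return n;
    }

    public double mean() {
        return mean;
    }

    /** Unbiased sample variance; only meaningful when at least two observations are present. */
    public double variance() {
        return m2 / (n - 1);
    }

    public static void main(String[] args) {
        RunningSample s = new RunningSample();
        s.add(new double[] {1, 2, 3});
        s.add(new double[] {4, 5});
        System.out.println(s.mean() + " " + s.variance());
        s.remove(new double[] {4, 5});
        System.out.println(s.mean() + " " + s.variance());
    }
}
```

Two such accumulators, one per sample, would provide the means, variances, and sample sizes needed by the Welch statistic and the Welch-Satterthwaite degrees of freedom. Removing values that were never added leaves the accumulator in an inconsistent state, so a caller would need to guard against that.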