JMSL™ Numerical Library 5.0.1

com.imsl.stat
Class DiscriminantAnalysis

java.lang.Object
  extended by com.imsl.stat.DiscriminantAnalysis
All Implemented Interfaces:
Serializable, Cloneable

public class DiscriminantAnalysis
extends Object
implements Serializable, Cloneable

Performs a linear or a quadratic discriminant function analysis among several known groups and evaluates the resulting classification rule by reclassification, split sample, or the leaving-out-one method.

Class DiscriminantAnalysis performs discriminant function analysis using either linear or quadratic discrimination. The output from DiscriminantAnalysis includes a measure of distance between the groups, a table summarizing the classification results, a matrix containing the posterior probabilities of group membership for each observation, and the within-sample means and covariance matrices. The linear discriminant function coefficients are also computed.

All observations are input during one call to DiscriminantAnalysis, a method of operation that has the advantage of simplicity.

The first step in the algorithm is initialization: the means, the classification table, and the covariances are set to zero, and other program parameters are set. The next step adds all observations in x to the means and to the factorizations of the covariance matrices. It then computes the following statistics of interest, if requested: the linear discriminant functions, the prior probabilities, the log of the determinant of each of the covariance matrices, a test statistic for testing that all of the within-group covariance matrices are equal, and a matrix of Mahalanobis distances between the groups. The matrix of Mahalanobis distances is computed from the pooled covariance matrix when linear discrimination is specified; when the discrimination is quadratic, the within-group covariance matrix of the row group i is used for D_{ij}^2.

Covariance matrices are defined as follows. Let N_i denote the sum of the frequencies of the observations in group i, and let M_i denote the number of observations in group i. Then, if S_i denotes the within-group i covariance matrix,

S_i = \frac{1}{N_i - 1} \sum_{j=1}^{M_i} w_j f_j (x_j - \overline{x})(x_j - \overline{x})^T

where w_j is the weight of the j-th observation in group i, f_j is its frequency, x_j is the j-th observation column vector (in group i), and \overline{x} denotes the mean vector of the observations in group i. The mean vectors are computed as

\overline{x} = \frac{1}{W_i} \sum_{j=1}^{M_i} w_j f_j x_j

where

W_i = \sum_{j=1}^{M_i} w_j f_j
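The two definitions above can be checked with a short plain-Java sketch. It is not the JMSL implementation (which, as described above, accumulates factorizations of the covariance matrices); the class GroupMoments and its methods are hypothetical, and obs is assumed to hold only the M_i rows of a single group, with w and f their weights and frequencies.

```java
// Hypothetical helper illustrating the weighted group mean and covariance; not part of JMSL.
final class GroupMoments {
    /** Mean vector: xbar = (1 / W_i) * sum_j w_j f_j x_j, where W_i = sum_j w_j f_j. */
    static double[] mean(double[][] obs, double[] w, double[] f) {
        int p = obs[0].length;
        double[] xbar = new double[p];
        double wSum = 0.0;
        for (int j = 0; j < obs.length; j++) {
            double wf = w[j] * f[j];
            wSum += wf;
            for (int a = 0; a < p; a++) xbar[a] += wf * obs[j][a];
        }
        for (int a = 0; a < p; a++) xbar[a] /= wSum;
        return xbar;
    }

    /** Covariance: S_i = 1/(N_i - 1) * sum_j w_j f_j (x_j - xbar)(x_j - xbar)^T, where N_i = sum_j f_j. */
    static double[][] covariance(double[][] obs, double[] w, double[] f) {
        int p = obs[0].length;
        double[] xbar = mean(obs, w, f);
        double n = 0.0;                       // N_i: sum of the frequencies in the group
        double[][] s = new double[p][p];
        for (int j = 0; j < obs.length; j++) {
            n += f[j];
            double wf = w[j] * f[j];
            for (int a = 0; a < p; a++)
                for (int b = 0; b < p; b++)
                    s[a][b] += wf * (obs[j][a] - xbar[a]) * (obs[j][b] - xbar[b]);
        }
        for (int a = 0; a < p; a++)
            for (int b = 0; b < p; b++) s[a][b] /= (n - 1.0);
        return s;
    }
}
```

Note that the frequencies enter both the weighting and the N_i - 1 divisor, while the weights enter only the weighting.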

Given the means and the covariance matrices, the linear discriminant function for group i is computed as:

z_i = \ln(p_i) - 0.5\,\overline{x_i}^T S_p^{-1} \overline{x_i} + x^T S_p^{-1} \overline{x_i}

where \ln(p_i) is the natural log of the prior probability for the i-th group, x is the observation to be classified, and S_p denotes the pooled covariance matrix.
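As a sketch of the formula above, the following plain Java builds one row of coefficients in the layout that getCoefficients describes (constant term first, then one coefficient per variable) and evaluates z_i. It assumes the inverse of the pooled covariance matrix is already available; the class LinearDiscriminant and its methods are hypothetical and not part of JMSL.

```java
// Hypothetical helper for the linear discriminant function; not part of JMSL.
final class LinearDiscriminant {
    /**
     * Returns [constant, b_1, ..., b_p] where b = S_p^{-1} xbar_i and
     * constant = ln(p_i) - 0.5 * xbar_i^T S_p^{-1} xbar_i.
     */
    static double[] coefficients(double[] xbar, double[][] spInv, double prior) {
        int p = xbar.length;
        double[] row = new double[p + 1];
        double quad = 0.0;                    // accumulates xbar^T S_p^{-1} xbar
        for (int a = 0; a < p; a++) {
            double b = 0.0;
            for (int c = 0; c < p; c++) b += spInv[a][c] * xbar[c];
            row[a + 1] = b;
            quad += xbar[a] * b;
        }
        row[0] = Math.log(prior) - 0.5 * quad;
        return row;
    }

    /** Evaluates z_i(x) = constant + sum_a b_a x_a, which equals x^T S_p^{-1} xbar_i plus the constant. */
    static double score(double[] row, double[] x) {
        double z = row[0];
        for (int a = 0; a < x.length; a++) z += row[a + 1] * x[a];
        return z;
    }
}
```

Because S_p^{-1} is symmetric, x^T S_p^{-1} \overline{x_i} is a dot product of x with the stored coefficient vector, which is why a single row of constants suffices per group.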

Let S denote either the pooled covariance matrix or one of the within-group covariance matrices S_i. (S will be the pooled covariance matrix in linear discrimination, and S_i otherwise.) The Mahalanobis distance between group i and group j is computed as:

D_{ij}^{2} = (\overline{x_i} - \overline{x_j})^T S^{-1} (\overline{x_i} - \overline{x_j})
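A direct transcription of D_{ij}^2, assuming S^{-1} (the pooled matrix for linear discrimination, S_i for quadratic) has already been computed. The class Mahalanobis is a hypothetical helper, not JMSL code, and an explicit inverse is used here only for brevity; an implementation would more typically solve against a factorization of S.

```java
// Hypothetical helper computing a squared Mahalanobis distance; not part of JMSL.
final class Mahalanobis {
    /** D^2 = (mi - mj)^T S^{-1} (mi - mj), given the inverse covariance sInv. */
    static double distanceSquared(double[] mi, double[] mj, double[][] sInv) {
        int p = mi.length;
        double[] d = new double[p];
        for (int a = 0; a < p; a++) d[a] = mi[a] - mj[a];
        double sum = 0.0;
        for (int a = 0; a < p; a++)
            for (int b = 0; b < p; b++) sum += d[a] * sInv[a][b] * d[b];
        return sum;
    }
}
```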

Finally, the asymptotic chi-squared test for the equality of covariance matrices is computed as follows (Morrison 1976, page 252):

\gamma = C^{-1} \sum_{i=1}^{k} n_i \left\{ \ln\left|S_p\right| - \ln\left|S_i\right| \right\}

where n_i is the number of degrees of freedom in the i-th sample covariance matrix, k is the number of groups, and

C^{-1} = 1 - \frac{2p^2 + 3p - 1}{6(p + 1)(k - 1)} \left( \sum_{i=1}^{k} \frac{1}{n_i} - \frac{1}{\sum_{j} n_j} \right)

where p is the number of variables.
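The statistic can be evaluated from the degrees of freedom n_i and the log determinants alone. This sketch assumes those values are already known (getStatistics reports log determinants, as described under that method); the class CovarianceHomogeneityTest and its methods are hypothetical, and the degrees of freedom of the reference chi-squared distribution are not computed here.

```java
// Hypothetical helper for the chi-squared test of equal covariance matrices; not part of JMSL.
final class CovarianceHomogeneityTest {
    /** C^{-1} = 1 - (2p^2 + 3p - 1) / (6(p + 1)(k - 1)) * (sum_i 1/n_i - 1 / sum_j n_j). */
    static double cInverse(int p, double[] n) {
        int k = n.length;
        double sumInv = 0.0, sumN = 0.0;
        for (double ni : n) {
            sumInv += 1.0 / ni;
            sumN += ni;
        }
        double factor = (2.0 * p * p + 3.0 * p - 1.0) / (6.0 * (p + 1) * (k - 1));
        return 1.0 - factor * (sumInv - 1.0 / sumN);
    }

    /** gamma = C^{-1} * sum_i n_i (ln|S_p| - ln|S_i|). */
    static double gamma(int p, double[] n, double logDetPooled, double[] logDet) {
        double sum = 0.0;
        for (int i = 0; i < n.length; i++) sum += n[i] * (logDetPooled - logDet[i]);
        return cInverse(p, n) * sum;
    }
}
```

When every ln|S_i| equals ln|S_p|, the statistic is zero, which is the expected value under perfectly homogeneous sample covariances.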

The posterior probability of each observation x belonging to group i is estimated from the prior probabilities, the sample mean vectors, and the estimated covariance matrices under a multivariate normal assumption. Under quadratic discrimination, the within-group covariance matrices are used to compute the estimated posterior probabilities. The estimated posterior probability of an observation x belonging to group i is

\hat{q}_i(x) = \frac{e^{-\frac{1}{2} D_i^2(x)}}{\sum_{j=1}^{k} e^{-\frac{1}{2} D_j^2(x)}}

where

D_i^2(x) = \begin{cases}
  (x - \overline{x_i})^T S_i^{-1} (x - \overline{x_i}) + \ln\left|S_i\right| - 2\ln(p_i) & \text{LINEAR or QUADRATIC} \\
  (x - \overline{x_i})^T S_p^{-1} (x - \overline{x_i}) - 2\ln(p_i) & \text{LINEAR, POOLED}
\end{cases}
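The following sketch computes D_i^2(x) in the within-group form and converts a vector of D_i^2 values into posterior probabilities. Subtracting the smallest D_i^2 before exponentiating leaves \hat{q}_i unchanged (the common factor cancels in the ratio) and avoids underflow when all distances are large. Passing S_p^{-1} and a log determinant of 0 reproduces the pooled form exactly. The class Posterior and its methods are hypothetical; this is not the JMSL source.

```java
// Hypothetical helper for posterior probabilities under a normal model; not part of JMSL.
final class Posterior {
    /** D_i^2(x) = (x - m)^T S^{-1} (x - m) + logDet - 2 ln(prior). */
    static double dSquared(double[] x, double[] mean, double[][] sInv, double logDet, double prior) {
        int p = x.length;
        double q = 0.0;
        for (int a = 0; a < p; a++)
            for (int b = 0; b < p; b++)
                q += (x[a] - mean[a]) * sInv[a][b] * (x[b] - mean[b]);
        return q + logDet - 2.0 * Math.log(prior);
    }

    /** q_i = exp(-D_i^2 / 2) / sum_j exp(-D_j^2 / 2), computed after shifting by the smallest D^2. */
    static double[] probabilities(double[] d2) {
        double min = Double.POSITIVE_INFINITY;
        for (double v : d2) min = Math.min(min, v);
        double[] q = new double[d2.length];
        double total = 0.0;
        for (int i = 0; i < d2.length; i++) {
            q[i] = Math.exp(-0.5 * (d2[i] - min));
            total += q[i];
        }
        for (int i = 0; i < q.length; i++) q[i] /= total;
        return q;
    }
}
```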

For the leaving-out-one method of classification, the sample mean vector and sample covariance matrices in the formula for D_{i}^{2}(x) are adjusted so as to remove the observation x from their computation. For linear discrimination, the linear discriminant function coefficients are actually used to compute the same posterior probabilities.

Using the posterior probabilities, each observation in x is classified into a group; the result is tabulated in the matrix returned by getClassTable and saved in the vector returned by getClassMembership. The classification table is not altered at this stage if x[i][groupIndex] contains a group number that is out of range. If the reclassification method is specified, then all observations with no missing values in the nVariables classification variables are classified. When the leaving-out-one method is used, observations with invalid group numbers, weights, frequencies, or classification variables are not classified. Regardless of the frequency, a 1 is added to (or subtracted from) the classification table for each row of x that is classified and contains a valid group number. When the leaving-out-one method is used, the posterior probabilities are adjusted to remove the effect of the observation from the classification rule. In this adjustment, each observation is presumed to have a weight of weights[i] and a frequency of 1.0. See Lachenbruch (1975, page 36) for the required adjustment.
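A sketch of the tabulation rule described above. It assumes each observation is assigned to the group with the largest posterior probability (the usual rule, though not stated explicitly here), adds 1 per row regardless of frequency, and skips rows whose group number is not one of 1, 2, ..., nGroups. The leaving-out-one adjustment is omitted, and the class ClassTable is hypothetical.

```java
// Hypothetical helper that builds a classification table; not part of JMSL.
final class ClassTable {
    /** Returns the 1-based group with the largest posterior probability (assumed rule). */
    static int classify(double[] posterior) {
        int best = 0;
        for (int g = 1; g < posterior.length; g++)
            if (posterior[g] > posterior[best]) best = g;
        return best + 1;
    }

    /** Rows = known group, columns = assigned group; each valid row adds exactly 1. */
    static double[][] tally(double[][] posterior, double[] knownGroup, int nGroups) {
        double[][] table = new double[nGroups][nGroups];
        for (int i = 0; i < posterior.length; i++) {
            double g = knownGroup[i];
            // Leave the table unchanged for group numbers outside 1, 2, ..., nGroups.
            if (Double.isNaN(g) || g != Math.rint(g) || g < 1 || g > nGroups) continue;
            int assigned = classify(posterior[i]);
            table[(int) g - 1][assigned - 1] += 1.0;
        }
        return table;
    }
}
```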

Finally, the covariance matrices are computed from their LU factorizations.

See Also:
Discriminant Analysis Example, Serialized Form

Nested Class Summary
static class DiscriminantAnalysis.CovarianceSingularException
          The variance-covariance matrix is singular.
static class DiscriminantAnalysis.EmptyGroupException
          There are no observations in a group.
static class DiscriminantAnalysis.SumOfWeightsNegException
          The sum of the weights has become negative.
 
Field Summary
static int LEAVE_OUT_ONE
          Indicates leave-out-one as the classification method.
static int LINEAR
          Indicates a linear discrimination method.
static int POOLED
          Indicates that only the pooled covariance matrix is computed.
static int POOLED_GROUP
          Indicates that both the pooled and the within-group covariance matrices are computed.
static int PRIOR_EQUAL
          Indicates that equal prior probabilities are used.
static int PRIOR_PROPORTIONAL
          Indicates that prior probabilities proportional to the group sample sizes are used.
static int QUADRATIC
          Indicates a quadratic discrimination method.
static int RECLASSIFICATION
          Indicates reclassification as the classification method.
 
Constructor Summary
DiscriminantAnalysis(int nVariables, int nGroups)
          Constructor for DiscriminantAnalysis.
 
Method Summary
 int[] getClassMembership()
          Returns the group number to which each observation was classified.
 double[][] getClassTable()
          Returns the classification table.
 double[][] getCoefficients()
          Returns the linear discriminant function coefficients.
 double[][][] getCovariance()
          Returns the array of covariances.
 int[] getGroupCounts()
          Returns the group counts.
 double[][] getMahalanobis()
          Returns the Mahalanobis distances between the group means.
 double[][] getMeans()
          Returns the variable means.
 int getNRowsMissing()
          Returns the number of rows of data encountered containing missing values (NaN).
 double[] getPrior()
          Returns the prior probabilities.
 double[][] getProbability()
          Returns the posterior probabilities for each observation.
 double[] getStatistics()
          Returns statistics.
 void setClassificationMethod(int method)
          Sets the classification method.
 void setCovarianceComputation(int type)
          Sets the type of covariance matrices to be computed.
 void setDiscriminationMethod(int method)
          Sets the discrimination method.
 void setPrior(double[] prior)
          Sets the prior probabilities.
 void setPrior(int type)
          Sets the type of prior probabilities to be computed.
 void update(double[][] x)
          Processes a set of observations and performs a linear or quadratic discriminant function analysis among the several known groups.
 void update(double[][] x, double[] frequencies, double[] weights)
          Processes a set of observations and associated frequencies and weights then performs a linear or quadratic discriminant function analysis among the several known groups.
 void update(double[][] x, int groupIndex)
          Processes a set of observations and performs a linear or quadratic discriminant function analysis among the several known groups.
 void update(double[][] x, int[] varIndex)
          Processes a set of observations and performs a linear or quadratic discriminant function analysis among the several known groups.
 void update(double[][] x, int[] varIndex, double[] frequencies, double[] weights)
          Processes a set of observations and associated frequencies and weights then performs a linear or quadratic discriminant function analysis among the several known groups.
 void update(double[][] x, int groupIndex, double[] frequencies, double[] weights)
          Processes a set of observations and associated frequencies and weights then performs a linear or quadratic discriminant function analysis among the several known groups.
 void update(double[][] x, int groupIndex, int[] varIndex)
          Processes a set of observations and performs a linear or quadratic discriminant function analysis among the several known groups.
 void update(double[][] x, int groupIndex, int[] varIndex, double[] frequencies, double[] weights)
          Processes a set of observations and associated frequencies and weights then performs a linear or quadratic discriminant function analysis among the several known groups.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LEAVE_OUT_ONE

public static final int LEAVE_OUT_ONE
Indicates leave-out-one as the classification method.

See Also:
Constant Field Values

LINEAR

public static final int LINEAR
Indicates a linear discrimination method.

See Also:
Constant Field Values

POOLED

public static final int POOLED
Indicates that only the pooled covariance matrix is computed.

See Also:
Constant Field Values

POOLED_GROUP

public static final int POOLED_GROUP
Indicates that both the pooled and the within-group covariance matrices are computed.

See Also:
Constant Field Values

PRIOR_EQUAL

public static final int PRIOR_EQUAL
Indicates that equal prior probabilities are used.

See Also:
Constant Field Values

PRIOR_PROPORTIONAL

public static final int PRIOR_PROPORTIONAL
Indicates that prior probabilities proportional to the group sample sizes are used.

See Also:
Constant Field Values

QUADRATIC

public static final int QUADRATIC
Indicates a quadratic discrimination method.

See Also:
Constant Field Values

RECLASSIFICATION

public static final int RECLASSIFICATION
Indicates reclassification as the classification method.

See Also:
Constant Field Values
Constructor Detail

DiscriminantAnalysis

public DiscriminantAnalysis(int nVariables,
                            int nGroups)
Constructor for DiscriminantAnalysis.

Parameters:
nVariables - An int representing the number of variables to be used in the discrimination.
nGroups - An int representing the number of groups in the data.
Method Detail

getClassMembership

public int[] getClassMembership()
Returns the group number to which each observation was classified.

Returns:
An int array containing the group to which each observation was classified. If an observation has an invalid group number, frequency, or weight when the leaving-out-one method has been specified, then the observation is not classified and the corresponding element of the array is set to zero.

getClassTable

public double[][] getClassTable()
Returns the classification table.

Returns:
An nGroups × nGroups double array containing the classification table. Each observation that is classified and has a group number equal to 1.0, 2.0, ..., nGroups is entered into the table. The rows of the table correspond to the known group membership, and the columns to the group to which the observation was classified.

getCoefficients

public double[][] getCoefficients()
Returns the linear discriminant function coefficients.

Returns:
A double array containing the linear discriminant function coefficients. The first column of the array contains the constant term, and the remaining columns contain the variable coefficients. The i-th row of the returned array corresponds to group i. The coefficients are always computed as linear discriminant function coefficients even when quadratic discrimination is specified.

getCovariance

public double[][][] getCovariance()
Returns the array of covariances.

Returns:
An nVariables × nVariables × g double array containing the covariances. Here, g = nGroups + 1 unless only the pooled covariance matrix is computed, in which case g = 1. When only the pooled covariance matrix is computed, the within-group covariance matrices are not computed. The pooled covariance matrix is always computed and is returned as the g-th covariance matrix.

getGroupCounts

public int[] getGroupCounts()
Returns the group counts.

Returns:
An int array of length nGroups containing the number of observations in each group.

getMahalanobis

public double[][] getMahalanobis()
Returns the Mahalanobis distances between the group means.

Returns:
An nGroups × nGroups double array containing the Mahalanobis distances between the group means. For quadratic discrimination, the Mahalanobis distance D_{ij}^2 between group means i and j is computed using the within-group covariance matrix for group i in place of the pooled covariance matrix.

getMeans

public double[][] getMeans()
Returns the variable means.

Returns:
A double array containing the variable means. The i-th row of the returned array contains the group i variable means.

getNRowsMissing

public int getNRowsMissing()
Returns the number of rows of data encountered containing missing values (NaN).

Returns:
An int representing the number of rows of data encountered containing missing values (NaN) for the classification, group, weight, and/or frequency variables. If a row of data contains a missing value (NaN) for any of these variables, that row is excluded from the computations.
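The exclusion rule above can be sketched as follows. The class MissingRows and its parameters are hypothetical, and the sketch assumes one frequency and one weight per row of x; it only illustrates the stated rule and is not the JMSL code.

```java
// Hypothetical helper counting rows excluded because of missing values; not part of JMSL.
final class MissingRows {
    /** Counts rows with NaN in any used variable column, the group column, or the row's frequency or weight. */
    static int count(double[][] x, int[] varIndex, int groupIndex, double[] frequencies, double[] weights) {
        int missing = 0;
        for (int i = 0; i < x.length; i++) {
            boolean nan = Double.isNaN(x[i][groupIndex])
                    || Double.isNaN(frequencies[i])
                    || Double.isNaN(weights[i]);
            // Only the columns actually used as classification variables are checked.
            for (int c : varIndex) nan |= Double.isNaN(x[i][c]);
            if (nan) missing++;
        }
        return missing;
    }
}
```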

getPrior

public double[] getPrior()
Returns the prior probabilities.

Returns:
A double vector of length nGroups containing the prior probabilities for each group.

getProbability

public double[][] getProbability()
Returns the posterior probabilities for each observation.

Returns:
An x.length × nGroups double array containing the posterior probabilities for each observation.

getStatistics

public double[] getStatistics()
Returns statistics.

Returns:
A double array (stat) containing output statistics.
I                          stat[I]
0                          Sum of the degrees of freedom for the within-covariance matrices.
1                          Chi-squared statistic for the test of the homogeneity of the within-covariance matrices.
2                          Degrees of freedom in the chi-squared statistic.
3                          Probability of a greater chi-squared statistic.
4 through 4+nGroups        Log of the determinant of each group's covariance matrix (not computed when only the pooled covariance matrix is computed) and of the pooled covariance matrix.
Last nGroups+1 elements    Sum of the weights within each group.
Last element               Sum of the weights in all groups.
Elements 1 through 3 are not computed when only the pooled covariance matrix is computed.

setClassificationMethod

public void setClassificationMethod(int method)
Sets the classification method.

Parameters:
method - An int scalar indicating the method of classification. Use class member RECLASSIFICATION or LEAVE_OUT_ONE. If setClassificationMethod is not called, RECLASSIFICATION is used.

setCovarianceComputation

public void setCovarianceComputation(int type)
Sets the type of covariance matrices to be computed.

Parameters:
type - An int scalar indicating the type of covariance matrices to be computed. Use class member POOLED or POOLED_GROUP. If setCovarianceComputation is not called, POOLED_GROUP is used.

setDiscriminationMethod

public void setDiscriminationMethod(int method)
Sets the discrimination method.

Parameters:
method - An int scalar indicating the method of discrimination. Use class member LINEAR or QUADRATIC. If setDiscriminationMethod is not called, LINEAR is used.

setPrior

public void setPrior(double[] prior)
Sets the prior probabilities.

Parameters:
prior - A double vector of length nGroups containing the prior probabilities for each group. The elements of prior should sum to 1.0. If setPrior(double[]) is not called, the prior probabilities are determined by the type set with setPrior(int): equal if PRIOR_EQUAL is set, or proportional to the sample size in each group if PRIOR_PROPORTIONAL is set.

setPrior

public void setPrior(int type)
Sets the type of prior probabilities to be computed.

Parameters:
type - An int scalar indicating the type of prior probabilities to be computed. Use class member PRIOR_EQUAL or PRIOR_PROPORTIONAL. If setPrior is not called, PRIOR_EQUAL is used.
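A sketch of the two prior types. It assumes that "sample size" means per-group counts such as those returned by getGroupCounts; whether JMSL instead uses frequency-weighted sizes is not stated here. The class Priors is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper for equal and proportional priors; not part of JMSL.
final class Priors {
    /** PRIOR_EQUAL: every group receives 1 / nGroups. */
    static double[] equal(int nGroups) {
        double[] prior = new double[nGroups];
        Arrays.fill(prior, 1.0 / nGroups);
        return prior;
    }

    /** PRIOR_PROPORTIONAL (assumed form): group i receives n_i / sum_j n_j. */
    static double[] proportional(int[] groupCounts) {
        double total = 0.0;
        for (int c : groupCounts) total += c;
        double[] prior = new double[groupCounts.length];
        for (int i = 0; i < groupCounts.length; i++) prior[i] = groupCounts[i] / total;
        return prior;
    }
}
```

Both variants sum to 1.0, matching the requirement stated for setPrior(double[]).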

update

public void update(double[][] x)
            throws DiscriminantAnalysis.SumOfWeightsNegException,
                   DiscriminantAnalysis.EmptyGroupException,
                   DiscriminantAnalysis.CovarianceSingularException
Processes a set of observations and performs a linear or quadratic discriminant function analysis among the several known groups.

Parameters:
x - A double matrix containing the observations. The first nVariables columns correspond to the variables, and the last column (column nVariables) contains the group numbers. The groups must be numbered 1, 2, ..., nGroups.
Throws:
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.

update

public void update(double[][] x,
                   double[] frequencies,
                   double[] weights)
            throws DiscriminantAnalysis.SumOfWeightsNegException,
                   DiscriminantAnalysis.EmptyGroupException,
                   DiscriminantAnalysis.CovarianceSingularException
Processes a set of observations and associated frequencies and weights then performs a linear or quadratic discriminant function analysis among the several known groups.

Parameters:
x - A double matrix containing the observations. The first nVariables columns correspond to the variables, and the last column (column nVariables) contains the group numbers. The groups must be numbered 1,2, ..., nGroups.
frequencies - A double array containing the associated frequencies.
weights - A double array containing the associated weights.
Throws:
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.

update

public void update(double[][] x,
                   int groupIndex)
            throws DiscriminantAnalysis.SumOfWeightsNegException,
                   DiscriminantAnalysis.EmptyGroupException,
                   DiscriminantAnalysis.CovarianceSingularException
Processes a set of observations and performs a linear or quadratic discriminant function analysis among the several known groups.

Parameters:
x - A double matrix containing the observations. The first nVariables columns correspond to the variables, excluding the groupIndex column.
groupIndex - An int containing the column index of x in which the group numbers are stored. The groups must be numbered 1,2, ..., nGroups.
Throws:
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.

update

public void update(double[][] x,
                   int[] varIndex)
            throws DiscriminantAnalysis.SumOfWeightsNegException,
                   DiscriminantAnalysis.EmptyGroupException,
                   DiscriminantAnalysis.CovarianceSingularException
Processes a set of observations and performs a linear or quadratic discriminant function analysis among the several known groups.

Parameters:
x - A double matrix containing the observations. The columns indicated in varIndex correspond to the variables, and the last column (column nVariables) contains the group numbers. The groups must be numbered 1,2, ..., nGroups.
varIndex - An int array containing the column indices in x that correspond to the variables to be used in the analysis.
Throws:
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.

update

public void update(double[][] x,
                   int[] varIndex,
                   double[] frequencies,
                   double[] weights)
            throws DiscriminantAnalysis.SumOfWeightsNegException,
                   DiscriminantAnalysis.EmptyGroupException,
                   DiscriminantAnalysis.CovarianceSingularException
Processes a set of observations and associated frequencies and weights then performs a linear or quadratic discriminant function analysis among the several known groups.

Parameters:
x - A double matrix containing the observations. The columns indicated in varIndex correspond to the variables, and the last column (column nVariables) contains the group numbers. The groups must be numbered 1,2, ..., nGroups.
varIndex - An int array containing the column indices in x that correspond to the variables to be used in the analysis.
frequencies - A double array containing the associated frequencies.
weights - A double array containing the associated weights.
Throws:
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.

update

public void update(double[][] x,
                   int groupIndex,
                   double[] frequencies,
                   double[] weights)
            throws DiscriminantAnalysis.SumOfWeightsNegException,
                   DiscriminantAnalysis.EmptyGroupException,
                   DiscriminantAnalysis.CovarianceSingularException
Processes a set of observations and associated frequencies and weights then performs a linear or quadratic discriminant function analysis among the several known groups.

Parameters:
x - A double matrix containing the observations. The first nVariables columns correspond to the variables, excluding the groupIndex column.
groupIndex - An int containing the column index of x in which the group numbers are stored. The groups must be numbered 1,2, ..., nGroups.
frequencies - A double array containing the associated frequencies.
weights - A double array containing the associated weights.
Throws:
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.

update

public void update(double[][] x,
                   int groupIndex,
                   int[] varIndex)
            throws DiscriminantAnalysis.SumOfWeightsNegException,
                   DiscriminantAnalysis.EmptyGroupException,
                   DiscriminantAnalysis.CovarianceSingularException
Processes a set of observations and performs a linear or quadratic discriminant function analysis among the several known groups.

Parameters:
x - A double matrix containing the observations. The columns indicated in varIndex correspond to the variables, and column groupIndex contains the group numbers.
groupIndex - An int containing the column index of x in which the group numbers are stored. The groups must be numbered 1,2, ..., nGroups.
varIndex - An int array containing the column indices in x that correspond to the variables to be used in the analysis.
Throws:
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.

update

public void update(double[][] x,
                   int groupIndex,
                   int[] varIndex,
                   double[] frequencies,
                   double[] weights)
            throws DiscriminantAnalysis.SumOfWeightsNegException,
                   DiscriminantAnalysis.EmptyGroupException,
                   DiscriminantAnalysis.CovarianceSingularException
Processes a set of observations and associated frequencies and weights then performs a linear or quadratic discriminant function analysis among the several known groups.

Parameters:
x - A double matrix containing the observations. The columns indicated in varIndex correspond to the variables, and column groupIndex contains the group numbers.
groupIndex - An int containing the column index of x in which the group numbers are stored. The groups must be numbered 1,2, ..., nGroups.
varIndex - An int array containing the column indices in x that correspond to the variables to be used in the analysis.
frequencies - A double array containing the associated frequencies.
weights - A double array containing the associated weights.
Throws:
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.


Copyright © 1970-2008 Visual Numerics, Inc.
Built July 8 2008.