DiscriminantAnalysis Class
Performs a linear or a quadratic discriminant function analysis among several known groups.
Inheritance Hierarchy
System.Object
  Imsl.Stat.DiscriminantAnalysis

Namespace: Imsl.Stat
Assembly: ImslCS (in ImslCS.dll) Version: 6.5.2.0
Syntax
[SerializableAttribute]
public class DiscriminantAnalysis

The DiscriminantAnalysis type exposes the following members.

Constructors
  NameDescription
Public methodDiscriminantAnalysis
Constructs a DiscriminantAnalysis.
Methods
  NameDescription
Public methodClassify(Double)
Classify a set of observations using the linear or quadratic discriminant functions generated during the training process.
Public methodClassify(Double, Int32)
Classify a set of observations using the linear or quadratic discriminant functions generated during the training process.
Public methodClassify(Double, Int32, Double)
Classify a set of observations and associated frequencies and weights using the linear or quadratic discriminant functions generated during the training process.
Public methodClassify(Double, Int32, Int32)
Classify a set of observations and compare against known groups using the linear or quadratic discriminant functions generated during the training process.
Public methodClassify(Double, Int32, Int32, Double)
Classify a set of observations and associated frequencies and weights using the linear or quadratic discriminant functions generated during the training process.
Public methodClassify(Double, Int32, Int32, Int32, Double)
Classify a set of observations, associated frequencies and weights, and compare against known groups using the linear or quadratic discriminant functions generated during the training process.
Public methodDowndate(Double, Int32)
Removes a set of observations from the discriminant functions.
Public methodDowndate(Double, Int32, Int32)
Removes a set of observations from the discriminant functions.
Public methodDowndate(Double, Int32, Int32, Double)
Removes a set of observations and associated frequencies and weights from the discriminant functions.
Public methodDowndate(Double, Int32, Int32, Int32, Double)
Removes a set of observations and associated frequencies and weights from the discriminant functions.
Public methodEquals
Determines whether the specified object is equal to the current object.
(Inherited from Object.)
Protected methodFinalize
Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection.
(Inherited from Object.)
Public methodGetClassMembership
Returns the group number to which the observation was classified.
Public methodGetClassTable
Returns the classification table.
Public methodGetCoefficients
Returns the linear discriminant function coefficients.
Public methodGetCovariance
Returns the array of covariances.
Public methodGetGroupCounts
Returns the group counts.
Public methodGetHashCode
Serves as a hash function for a particular type.
(Inherited from Object.)
Public methodGetMahalanobis
Returns the Mahalanobis distances between the group means.
Public methodGetMeans
Returns the variable means.
Public methodGetPrior
Returns the prior probabilities.
Public methodGetProbability
Returns the posterior probabilities for each observation.
Public methodGetStatistics
Returns statistics.
Public methodGetType
Gets the Type of the current instance.
(Inherited from Object.)
Protected methodMemberwiseClone
Creates a shallow copy of the current Object.
(Inherited from Object.)
Public methodSetPrior
Specifies user supplied prior probabilities.
Public methodToString
Returns a string that represents the current object.
(Inherited from Object.)
Properties
  NameDescription
Public propertyClassificationMethod
The classification method.
Public propertyCovarianceComputation
The type of covariance matrices to be computed.
Public propertyDiscriminationMethod
The discrimination method.
Public propertyNumberOfRowsMissing
The number of rows of data encountered containing missing values (Double.NaN).
Public propertyPriorType
The type of prior probabilities to be calculated.
Remarks

DiscriminantAnalysis supports linear or quadratic discrimination and evaluates the rule using the reclassification, split-sample, or leaving-out-one method. One or more observations can be added to the rule during each invocation of the Update method.

DiscriminantAnalysis produces a measure of distance between the groups (see GetMahalanobis), a table summarizing the classification results (see GetClassTable), a matrix containing the posterior probabilities of group membership for each classified observation (see GetProbability), the within-sample means (see GetMeans), and the covariance matrices computed from their LU factorizations (see GetCovariance). The linear discriminant function coefficients are also computed (see GetCoefficients).

All observations can be input during one call to the Update method, which has the advantage of simplicity. Alternatively, one or more rows of observations can be input during separate calls to Update. This does not require all observations to be memory resident, a significant advantage with large data sets. Note, however, that classifying the same data set requires a second pass over the data, this time through the Classify method: the first pass, through Update, computes the discriminant functions, and the second pass, through Classify, classifies the observations. When known groups are available, GetClassTable is useful for assessing how well the algorithm classifies. Multiple calls to the Classify method are also allowed. The class table returned by GetClassTable accumulates all observations classified, whereas the class membership and probabilities returned by GetClassMembership and GetProbability contain the results for each observation from the most recent invocation of Classify only.

Pooled-only and pooled-with-group covariance computations cannot be mixed. By default, both pooled and group covariance matrices are computed. An InvalidOperationException is thrown if an attempt is made to change the covariance computation after the first call to the Update method. See the CovarianceComputation property for more details on specifying the covariance computation.

The within-group means are updated for all valid observations in x. Observations with invalid group numbers are ignored, as are observations with missing values (Double.NaN). The LU factorizations of the covariance matrices are updated by adding (or deleting) observations via Givens rotations. See the Downdate method to delete observations.
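The bookkeeping for adding and deleting observations can be illustrated with running sums. The following is only a sketch under stated assumptions: the class name RunningGroupMean and its members are hypothetical, it tracks the group mean alone, and it does not reproduce the Givens-rotation update of the covariance factorization that the library performs.

```python
import math


class RunningGroupMean:
    """Running weighted mean for one group (hypothetical helper, not library code)."""

    def __init__(self, n_vars):
        self.weight_sum = 0.0               # running sum of w_j * f_j
        self.weighted_sum = [0.0] * n_vars  # running sum of w_j * f_j * x_j
        self.rows_missing = 0               # rows skipped because they contain NaN

    def _accumulate(self, row, weight, freq, sign):
        if any(math.isnan(value) for value in row):
            self.rows_missing += 1          # a row with a missing value is ignored
            return
        wf = sign * weight * freq
        self.weight_sum += wf
        for k, value in enumerate(row):
            self.weighted_sum[k] += wf * value

    def update(self, row, weight=1.0, freq=1.0):
        """Add one observation to the running sums."""
        self._accumulate(row, weight, freq, +1.0)

    def downdate(self, row, weight=1.0, freq=1.0):
        """Remove a previously added observation from the running sums."""
        self._accumulate(row, weight, freq, -1.0)

    def mean(self):
        return [s / self.weight_sum for s in self.weighted_sum]


group = RunningGroupMean(n_vars=2)
for row in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [float("nan"), 1.0]):
    group.update(row)
group.downdate([5.0, 6.0])  # remove the third observation again
```

Because only sums are stored, observations can arrive in any number of batches, which mirrors the reason separate calls to Update do not require all data to be memory resident.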

During the algorithm's training process, that is, during each invocation of the Update method, each observation in x is added to the means and to the factorizations of the covariance matrices. The following statistics of interest are then computed: the linear discriminant functions, the prior probabilities, the log of the determinant of each of the covariance matrices, and a test statistic for testing that all of the within-group covariance matrices are equal. The matrix of Mahalanobis distances, which holds the distances between the groups, is computed from the pooled covariance matrix when linear discrimination is specified, and from the covariance matrix of the row group when the discrimination is quadratic. Covariance matrices are defined as follows. Let N_i denote the sum of the frequencies of the observations in group i, and let M_i denote the number of observations in group i. Then, if S_i denotes the within-group i covariance matrix,

S_i = \frac{1}{N_i - 1} \sum_{j=1}^{M_i} w_j f_j (x_j - \overline{x})(x_j - \overline{x})^T
where w_j is the weight of the j-th observation in group i, f_j is its frequency, x_j is the j-th observation column vector (in group i), and \overline{x} denotes the mean vector of the observations in group i. The mean vectors are computed as
\overline{x} = \frac{1}{W_i} \sum_{j=1}^{M_i} w_j f_j x_j
where
W_i = \sum_{j=1}^{M_i} w_j f_j
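As a numerical check of the three formulas above, here is a minimal sketch that evaluates them directly for one group. The function name weighted_mean_and_covariance and the sample data are illustrative assumptions; the library itself works from matrix factorizations rather than from these sums.

```python
def weighted_mean_and_covariance(rows, weights, freqs):
    """Mean vector and covariance matrix S_i of one group, straight from the formulas."""
    n_vars = len(rows[0])
    w_total = sum(w * f for w, f in zip(weights, freqs))  # W_i
    n_total = sum(freqs)                                   # N_i
    mean = [
        sum(w * f * row[k] for row, w, f in zip(rows, weights, freqs)) / w_total
        for k in range(n_vars)
    ]
    cov = [[0.0] * n_vars for _ in range(n_vars)]
    for row, w, f in zip(rows, weights, freqs):
        diff = [row[k] - mean[k] for k in range(n_vars)]
        for a in range(n_vars):
            for b in range(n_vars):
                cov[a][b] += w * f * diff[a] * diff[b]
    cov = [[entry / (n_total - 1) for entry in cov_row] for cov_row in cov]
    return mean, cov


# Three observations of two variables, unit weights and frequencies.
mean, cov = weighted_mean_and_covariance(
    [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]
)
```

With unit weights, giving a row a frequency of 2 produces the same result as listing that row twice, which is the sense in which N_i counts observations.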
Given the means and the covariance matrices, the linear discriminant function for group i is computed as:
z_i = \ln(p_i)-0.5\overline{x_i}^T S_{p}^{-1} \overline{x_i} + x^T S_{p}^{-1} \overline{x_i}
where \ln(p_i) is the natural log of the prior probability for the i-th group, x is the observation to be classified, and S_p denotes the pooled covariance matrix.
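The discriminant function is linear in x, so it can be stored as a constant plus a coefficient vector. The sketch below evaluates it that way; the helper names solve, linear_discriminant, and score are hypothetical, the data are made up, and nothing here is meant to describe the layout of the array returned by GetCoefficients.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[r]) + [rhs[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (aug[r][n] - tail) / aug[r][r]
    return y


def linear_discriminant(mean, pooled_cov, prior):
    """Return (constant, coefficients) such that z_i(x) = constant + coefficients . x."""
    coefficients = solve(pooled_cov, mean)  # S_p^{-1} * group mean
    constant = math.log(prior) - 0.5 * sum(m * c for m, c in zip(mean, coefficients))
    return constant, coefficients


def score(x, constant, coefficients):
    return constant + sum(v * c for v, c in zip(x, coefficients))


pooled = [[4.0, 2.0], [2.0, 4.0]]
rules = [linear_discriminant(m, pooled, 0.5) for m in ([3.0, 4.0], [7.0, 8.0])]


def classify(x):
    """Index of the group with the largest discriminant score."""
    return max(range(len(rules)), key=lambda i: score(x, *rules[i]))
```

With equal priors, a point halfway between the two group means receives the same score from both functions, which is a quick sanity check on the signs in the formula.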

Let S denote either the pooled covariance matrix or one of the within-group covariance matrices S_i. (S will be the pooled covariance matrix in linear discrimination, and S_i otherwise.) The Mahalanobis distance between group i and group j is computed as:

D_{ij}^{2} = (\overline{x_i} - \overline{x_j})^T S^{-1} (\overline{x_i} - \overline{x_j})
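A small sketch of this distance, assuming two variables and made-up group means. The helper names solve and mahalanobis_squared are hypothetical; a linear solve is used instead of forming S^{-1} explicitly.

```python
def solve(matrix, rhs):
    """Solve matrix @ y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[r]) + [rhs[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (aug[r][n] - tail) / aug[r][r]
    return y


def mahalanobis_squared(mean_i, mean_j, cov):
    """D_ij^2 = (mean_i - mean_j)^T cov^{-1} (mean_i - mean_j)."""
    diff = [a - b for a, b in zip(mean_i, mean_j)]
    return sum(d * s for d, s in zip(diff, solve(cov, diff)))


# Pooled covariance for linear discrimination; pass S_i for the quadratic case.
d2 = mahalanobis_squared([3.0, 4.0], [7.0, 8.0], [[4.0, 2.0], [2.0, 4.0]])
```

For these values the mean difference is (-4, -4) and the result is 16/3. In the quadratic case the distance is generally not symmetric in i and j, because a different S_i is used for each row.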

Finally, the asymptotic chi-squared test for the equality of covariance matrices is computed as follows (Morrison 1976, page 252):

\gamma = C^{-1} \sum_{i=1}^{k} n_i \left\{ \ln \left| S_p \right| - \ln \left| S_i \right| \right\}
where n_i is the number of degrees of freedom in the i-th sample covariance matrix, k is the number of groups, and
C^{-1} = 1 - \frac{2p^2 + 3p - 1}{6(p + 1)(k - 1)} \left(\sum_{i=1}^{k} \frac{1}{n_i} - \frac{1}{\sum_{j}n_j} \right)
where p is the number of variables.
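The statistic can be sketched from the log-determinants alone. The code below assumes the usual form of the scale factor for this test, C^{-1} = 1 - (2p^2 + 3p - 1) / (6(p + 1)(k - 1)) times the bracketed sum of reciprocals. The function name and the input values are illustrative only.

```python
import math


def covariance_equality_statistic(dof, log_det_pooled, log_dets, n_vars):
    """gamma = C^{-1} * sum_i n_i * (ln|S_p| - ln|S_i|)."""
    k = len(dof)  # number of groups
    p = n_vars    # number of variables
    m = sum(n_i * (log_det_pooled - ld) for n_i, ld in zip(dof, log_dets))
    correction = (2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (k - 1))
    c_inv = 1.0 - correction * (sum(1.0 / n_i for n_i in dof) - 1.0 / sum(dof))
    return c_inv * m


# Two groups, two variables, 10 degrees of freedom each (made-up determinants).
gamma = covariance_equality_statistic(
    [10, 10], math.log(4.0), [math.log(2.0), math.log(6.0)], n_vars=2
)
```

When every within-group determinant equals the pooled determinant, the sum is zero and so is the statistic, which corresponds to no evidence against equal covariance matrices.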

The estimated posterior probability of each observation x belonging to group i is computed using the prior probabilities and the sample mean vectors and estimated covariance matrices under a multivariate normal assumption. Under quadratic discrimination, the within-group covariance matrices are used to compute the estimated posterior probabilities. The estimated posterior probability of an observation x belonging to group i is

\hat{q_i}(x) = \frac{e^{-\frac{1}{2}D_{i}^{2}(x)}}{\sum_{j=1}^{k} e^{-\frac{1}{2}D_{j}^{2}(x)}}
where
D_{i}^{2}(x) = \left\{ \begin{array}{ll}
(x - \overline{x_i})^T S_{i}^{-1} (x - \overline{x_i}) + \ln \left|S_i \right| - 2 \ln(p_i) & \mbox{linear or quadratic, pooled, group} \\
(x - \overline{x_i})^T S_{p}^{-1} (x - \overline{x_i}) - 2 \ln(p_i) & \mbox{linear, pooled}
\end{array} \right.
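The posterior formula can be sketched as follows, using the first case of D_i^2(x). The function names and data are hypothetical, and the log-determinants are passed in rather than computed. For the pooled case the same matrix can be supplied for every group, since a common log-determinant cancels in the ratio.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[r]) + [rhs[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (aug[r][n] - tail) / aug[r][r]
    return y


def posterior_probabilities(x, means, covs, log_dets, priors):
    """Estimated posterior probability of x belonging to each group."""
    d2 = []
    for mean, cov, log_det, prior in zip(means, covs, log_dets, priors):
        diff = [a - b for a, b in zip(x, mean)]
        quad = sum(d * s for d, s in zip(diff, solve(cov, diff)))
        d2.append(quad + log_det - 2.0 * math.log(prior))
    shift = min(d2)  # a common shift cancels in the ratio and avoids underflow
    terms = [math.exp(-0.5 * (d - shift)) for d in d2]
    total = sum(terms)
    return [t / total for t in terms]


cov = [[4.0, 2.0], [2.0, 4.0]]
q = posterior_probabilities(
    [3.0, 4.0], [[3.0, 4.0], [7.0, 8.0]], [cov, cov], [math.log(12.0)] * 2, [0.5, 0.5]
)
```

Subtracting the smallest D_i^2 before exponentiating is a numerical convenience chosen for this sketch; it does not change the probabilities.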

For the leaving-out-one method of classification, the sample mean vector and sample covariance matrices in the formula for D_{i}^{2}(x) are adjusted so as to remove the observation x from their computation. For linear discrimination, the linear discriminant function coefficients are actually used to compute the same posterior probabilities.

Using the posterior probabilities, each observation in x is classified into a group; the result is tabulated in the matrix returned by GetClassTable and saved in the vector returned by GetClassMembership. If a group variable is provided and the group number is out of range, the classification table is not altered at this stage. If the reclassification method is specified, then all observations with no missing values are classified. When the leaving-out-one method is used, observations with invalid group numbers, weights, frequencies, or classification variables are not classified. Regardless of the frequency, a 1 is added to (or subtracted from) the classification table for each row of x that is classified and contains a valid group number. When the leaving-out-one method is used, an adjustment is made to the posterior probabilities to remove the effect of the observation on the classification rule. In this adjustment, each observation is presumed to have a weight of w_j and a frequency of 1.0. See Lachenbruch (1975, page 36) for the required adjustment.
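The tabulation rule can be sketched as a simple tally. Two assumptions are made here that the text does not state: group numbers run from 1 to the number of groups, and rows of the table index the known group while columns index the predicted group. The function name tally is hypothetical.

```python
def tally(class_table, known_groups, predicted_groups):
    """Add 1 to the table for each classified row that has a valid known group.

    Assumes 1-based group numbers, rows = known group, columns = predicted group.
    Returns the number of rows that were counted.
    """
    n_groups = len(class_table)
    counted = 0
    for known, predicted in zip(known_groups, predicted_groups):
        if 1 <= known <= n_groups:  # out-of-range group: table is left unchanged
            class_table[known - 1][predicted - 1] += 1
            counted += 1
    return counted


table = [[0, 0], [0, 0]]
first = tally(table, [1, 2, 2, 3], [1, 1, 2, 2])  # the row with group 3 is skipped
second = tally(table, [1], [2])                   # later calls accumulate
```

Each row contributes exactly 1 whatever its frequency, and the table keeps growing across calls, in contrast to the membership vector, which reflects only the most recent classification.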
