DiscriminantAnalysis Class

Performs linear or quadratic discriminant function analysis among several known groups and evaluates the classification rule using the reclassification, split-sample, or leaving-out-one method.

For a list of all members of this type, see DiscriminantAnalysis Members.

System.Object
Imsl.Stat.DiscriminantAnalysis

public class DiscriminantAnalysis

Thread Safety

Public static (Shared in Visual Basic) members of this type are safe for multithreaded operations. Instance members are not guaranteed to be thread-safe.

Remarks

Class DiscriminantAnalysis performs discriminant function analysis using either linear or quadratic discrimination. The output from DiscriminantAnalysis includes a measure of distance between the groups, a table summarizing the classification results, a matrix containing the posterior probabilities of group membership for each observation, and the within-sample means and covariance matrices. The linear discriminant function coefficients are also computed.

All observations are input during one call to DiscriminantAnalysis, a method of operation that has the advantage of simplicity.

All observations in x are used to compute the means. The covariance matrices are factored. Requested statistics of interest are computed: the linear discriminant functions, the prior probabilities, the log of the determinant of each of the covariance matrices, a test statistic for testing that all of the within-group covariance matrices are equal, and a matrix of Mahalanobis distances between the groups. The matrix of Mahalanobis distances is computed via the pooled covariance matrix when linear discrimination is specified; the row covariance matrix is used when the discrimination is quadratic. Covariance matrices are defined as follows. Let $N_i$ denote the sum of the frequencies of the observations in group i, and let $M_i$ denote the number of observations in group i. Then, if $S_i$ denotes the within-group i covariance matrix,

$S_i = \frac{1}{N_i - 1} \sum_{j=1}^{M_i} w_j f_j (x_j - \overline{x})(x_j - \overline{x})^T$

where $w_j$ is the weight of the j-th observation in group i, $f_j$ is its frequency, $x_j$ is the j-th observation column vector (in group i), and $\overline{x}$ denotes the mean vector of the observations in group i. The mean vectors are computed as

$\overline{x} = \frac{1}{W_i} \sum_{j=1}^{M_i} w_j f_j x_j$

where

$W_i = \sum_{j=1}^{M_i} w_j f_j$
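The two formulas above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the library's implementation; the function names `group_mean` and `group_cov` are hypothetical and are not part of Imsl.Stat.

```python
def group_mean(xs, w, f):
    """Weighted mean of one group: (1/W_i) * sum_j w_j f_j x_j, W_i = sum_j w_j f_j."""
    W = sum(wj * fj for wj, fj in zip(w, f))
    p = len(xs[0])
    return [sum(wj * fj * x[a] for x, wj, fj in zip(xs, w, f)) / W for a in range(p)]


def group_cov(xs, w, f):
    """Within-group covariance S_i with divisor N_i - 1, where N_i = sum_j f_j."""
    mean = group_mean(xs, w, f)
    N = sum(f)
    p = len(mean)
    S = [[0.0] * p for _ in range(p)]
    for x, wj, fj in zip(xs, w, f):
        d = [x[a] - mean[a] for a in range(p)]  # deviation from the group mean
        for a in range(p):
            for b in range(p):
                S[a][b] += wj * fj * d[a] * d[b]
    return [[S[a][b] / (N - 1) for b in range(p)] for a in range(p)]


# Three observations of two variables, unit weights and frequencies (made-up data).
xs = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
ones = [1, 1, 1]
print(group_mean(xs, ones, ones))  # [3.0, 5.0]
print(group_cov(xs, ones, ones))   # [[4.0, 7.0], [7.0, 13.0]]
```

With unit weights, a frequency of 3 on one row gives the same result as repeating that row three times.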

Given the means and the covariance matrices, the linear discriminant function for group i is computed as:

$z_i = ln(p_i)-0.5\overline{x_i}^T S_{p}^{-1} \overline{x_i} + x^T S_{p}^{-1} \overline{x_i}$

where $ln(p_i)$ is the natural log of the prior probability for the i-th group, x is the observation to be classified, and $S_p$ denotes the pooled covariance matrix.
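As a rough illustration, the linear discriminant function can be split into a constant term and a coefficient vector, and an observation is then assigned to the group with the largest score. The sketch below assumes a pooled covariance matrix and group means are already available; `solve`, `linear_discriminant`, and `score` are hypothetical helper names, and the numbers are made up.

```python
import math


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][k] * y[k] for k in range(r + 1, n))) / M[r][r]
    return y


def linear_discriminant(mean, Sp, prior):
    """Return (constant, coefficients) such that z_i(x) = constant + coefficients . x."""
    coef = solve(Sp, mean)  # S_p^{-1} * mean_i
    const = math.log(prior) - 0.5 * sum(m * c for m, c in zip(mean, coef))
    return const, coef


def score(x, const, coef):
    return const + sum(xa * ca for xa, ca in zip(x, coef))


Sp = [[2.0, 0.0], [0.0, 1.0]]      # pooled covariance (illustrative values)
means = [[2.0, 0.0], [0.0, 1.0]]   # one mean vector per group
priors = [0.5, 0.5]
x = [1.0, 0.0]                     # observation to classify

scores = [score(x, *linear_discriminant(m, Sp, p)) for m, p in zip(means, priors)]
best = max(range(len(scores)), key=scores.__getitem__)  # 0-based index of the largest z_i
```

Here the first group scores ln(0.5) and the second ln(0.5) - 0.5, so `best` is 0.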

Let S denote either the pooled covariance matrix or one of the within-group covariance matrices $S_i$. (S will be the pooled covariance matrix in linear discrimination, and $S_i$ otherwise.) The Mahalanobis distance between group i and group j is computed as:

$D_{ij}^{2} = (\overline{x_i} - \overline{x_j})^T S^{-1} (\overline{x_i} - \overline{x_j})$
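A minimal sketch of this distance, assuming the two mean vectors and the chosen matrix S are given. The names `solve` and `mahalanobis_sq` are hypothetical; the linear system is solved directly rather than forming the inverse of S.

```python
def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][k] * y[k] for k in range(r + 1, n))) / M[r][r]
    return y


def mahalanobis_sq(mean_i, mean_j, S):
    """Squared distance (mean_i - mean_j)^T S^{-1} (mean_i - mean_j)."""
    d = [a - b for a, b in zip(mean_i, mean_j)]
    return sum(da * ya for da, ya in zip(d, solve(S, d)))


# Illustrative values: the mean difference is (2, -1) and S is diagonal.
d2 = mahalanobis_sq([2.0, 0.0], [0.0, 1.0], [[2.0, 0.0], [0.0, 1.0]])
```

For these inputs the result is 2*2/2 + (-1)*(-1)/1 = 3.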

Finally, the asymptotic chi-squared test for the equality of covariance matrices is computed as follows (Morrison 1976, page 252):

$\gamma = C^{-1} \sum_{i=1}^{k} n_i \{ ln( \left| S_p \right| ) - ln( \left| S_i \right| ) \}$

where $n_i$ is the number of degrees of freedom in the i-th sample covariance matrix, $k$ is the number of groups, and

$C^{-1} = 1-\frac{2p^2 + 3p - 1}{6(p + 1)(k - 1)} \left(\sum_{i=1}^{k} \frac{1}{n_i} - \frac{1}{\Sigma_{j}n_j} \right)$

where $p$ is the number of variables.
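The statistic can be evaluated directly from the log determinants of the covariance matrices. The following is a sketch of that arithmetic only; the function name `covariance_equality_statistic` and its argument layout are assumptions made for illustration, not the library's interface.

```python
import math


def covariance_equality_statistic(n, logdet_groups, logdet_pooled, p):
    """gamma = C^{-1} * sum_i n_i * (ln|S_p| - ln|S_i|).

    n             -- degrees of freedom n_i of each group covariance matrix
    logdet_groups -- ln|S_i| for each group
    logdet_pooled -- ln|S_p|
    p             -- number of variables
    """
    k = len(n)  # number of groups
    total = sum(ni * (logdet_pooled - ld) for ni, ld in zip(n, logdet_groups))
    c_inv = 1.0 - (2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (k - 1)) * (
        sum(1.0 / ni for ni in n) - 1.0 / sum(n)
    )
    return c_inv * total


# Illustrative case: two groups, two variables, S_1 = I, S_2 = 2I, S_p = 1.5 I,
# so |S_1| = 1, |S_2| = 4 and |S_p| = 2.25.
gamma = covariance_equality_statistic(
    [10, 10], [math.log(1.0), math.log(4.0)], math.log(2.25), 2
)
```

When every within-group log determinant equals the pooled one, the sum is zero and so is the statistic.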

The estimated posterior probability of each observation x belonging to group i is computed using the prior probabilities and the sample mean vectors and estimated covariance matrices under a multivariate normal assumption. Under quadratic discrimination, the within-group covariance matrices are used to compute the estimated posterior probabilities. The estimated posterior probability of an observation x belonging to group i is

$\hat{q_i}(x) = \frac{e^{-\frac{1}{2}D_{i}^{2}(x)}}{\sum_{j=1}^{k} e^{-\frac{1}{2}D_{j}^{2}(x)}}$

where

$D_{i}^{2}(x) = \left\{ \begin{array}{ll} (x - \overline{x_i})^T S_{i}^{-1} (x - \overline{x_i}) + ln \left|S_i \right| - 2 ln(p_i) & LINEAR \; or \; QUADRATIC \\ (x - \overline{x_i})^T S_{p}^{-1} (x - \overline{x_i}) - 2 ln(p_i) & LINEAR \; POOLED \end{array} \right.$
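To make the posterior formula concrete, the sketch below evaluates the pooled-covariance case of $D_{i}^{2}(x)$ and then normalizes the exponentials. The helper names `solve`, `d2_pooled`, and `posteriors` are hypothetical. Subtracting the smallest distance before exponentiating is a numerical-stability choice made here; it cancels in the ratio and is not something the text above prescribes.

```python
import math


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][k] * y[k] for k in range(r + 1, n))) / M[r][r]
    return y


def d2_pooled(x, mean, Sp, prior):
    """(x - mean)^T S_p^{-1} (x - mean) - 2 ln(prior)."""
    d = [xa - ma for xa, ma in zip(x, mean)]
    return sum(da * ya for da, ya in zip(d, solve(Sp, d))) - 2.0 * math.log(prior)


def posteriors(d2):
    """q_i = exp(-D_i^2 / 2) / sum_j exp(-D_j^2 / 2)."""
    lo = min(d2)  # shift by the smallest distance; the factor cancels in the ratio
    e = [math.exp(-0.5 * (v - lo)) for v in d2]
    total = sum(e)
    return [v / total for v in e]


Sp = [[1.0, 0.0], [0.0, 1.0]]     # pooled covariance (illustrative values)
means = [[0.0, 0.0], [2.0, 0.0]]
priors = [0.5, 0.5]
x = [0.0, 0.0]

q = posteriors([d2_pooled(x, m, Sp, p) for m, p in zip(means, priors)])
```

With equal priors the prior terms cancel, so here the first posterior is 1 / (1 + exp(-2)), roughly 0.88.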

For the leaving-out-one method of classification, the sample mean vector and sample covariance matrices in the formula for $D_{i}^{2}(x)$ are adjusted so as to remove the observation x from their computation. For linear discrimination, the linear discriminant function coefficients are actually used to compute the same posterior probabilities.

Using the posterior probabilities, each observation in X is classified into a group; the result is tabulated in the matrix CLASS and saved in the vector ICLASS. CLASS is not altered at this stage if X(i, IGRP) contains a group number that is out of range. If the reclassification method is specified, then all observations with no missing values in the nVariables classification variables are classified. When the leaving-out-one method is used, observations with invalid group numbers, weights, frequencies, or classification variables are not classified. Regardless of the frequency, a 1 is added to (or subtracted from) CLASS for each row of X that is classified and contains a valid group number. When the leaving-out-one method is used, an adjustment is made to the posterior probabilities to remove the effect of the observation in the classification rule. In this adjustment, each observation is presumed to have a weight of X(i, IWT), if IWT > 0, and a frequency of 1.0. See Lachenbruch (1975, page 36) for the required adjustment.
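The tabulation step for the reclassification case might look like the sketch below: each row is assigned to the group with the largest posterior probability, and the table is updated only when the row's own group number is in range. The function name `classification_table` is hypothetical, group numbers are taken to be 1-based, and the leaving-out-one adjustment is not attempted here.

```python
def classification_table(true_groups, posteriors, k):
    """Return (table, assigned).

    table[g - 1][c - 1] counts rows whose known group is g and assigned group is c.
    assigned[i] is the 1-based group with the largest posterior for row i.
    """
    table = [[0] * k for _ in range(k)]
    assigned = []
    for g, q in zip(true_groups, posteriors):
        pred = max(range(k), key=q.__getitem__) + 1  # ties go to the lowest group number
        assigned.append(pred)
        if 1 <= g <= k:  # an out-of-range group number leaves the table unchanged
            table[g - 1][pred - 1] += 1  # one count per row, regardless of frequency
    return table, assigned


# Five rows, two groups; the last row carries an invalid group number (made-up data).
true_groups = [1, 1, 2, 2, 5]
post = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7], [0.6, 0.4]]
table, assigned = classification_table(true_groups, post, 2)
print(table)     # [[1, 1], [0, 2]]
print(assigned)  # [1, 2, 2, 2, 1]
```

The last row is still classified, but it contributes nothing to the table because its group number is out of range.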

Finally, upon completion, the covariance matrices are computed from their LU factorizations.

Requirements

Namespace: Imsl.Stat

Assembly: ImslCS (in ImslCS.dll)
