Namespace:
Imsl.Stat
Assembly:
ImslCS (in ImslCS.dll) Version: 6.5.0.0
Syntax
C# |
---|
[SerializableAttribute] public class DiscriminantAnalysis |
Visual Basic (Declaration) |
---|
<SerializableAttribute> _ Public Class DiscriminantAnalysis |
Visual C++ |
---|
[SerializableAttribute] public ref class DiscriminantAnalysis |
Remarks
DiscriminantAnalysis performs linear or quadratic discrimination and evaluates the rule using the reclassification, split-sample, or leaving-out-one method. One or more observations can be added to the rule during each invocation of the Update method.
DiscriminantAnalysis produces a measure of distance between the groups (see GetMahalanobis), a table summarizing the classification results (see GetClassTable), a matrix containing the posterior probabilities of group membership for each classified observation (see GetProbability), the within-sample means (see GetMeans), and the covariance matrices computed from their LU factorizations (see GetCovariance). The linear discriminant function coefficients are also computed (see GetCoefficients).
All observations can be input during one call to the Update method; this has the advantage of simplicity. Alternatively, one or more rows of observations can be input during separate calls to Update. This does not require all observations to be memory resident, a significant advantage with large data sets. Note, however, that classifying the same data set requires a second pass, in which the data are passed to the Classify method: the first pass, through Update, computes the discriminant functions, and the second pass, through Classify, classifies the observations. When known groups are available, GetClassTable is useful for assessing how well the algorithm classifies. Multiple calls to the Classify method are also allowed. The class table returned by GetClassTable accumulates all observations classified, whereas the class membership and probabilities returned by GetClassMembership and GetProbability cover only the observations from the most recent invocation of the Classify method.
Pooled only and pooled with group covariance computation cannot be mixed. By default, both pooled and group covariance matrices will be computed. An InvalidOperationException will be thrown if an attempt is made to change the covariance computation after the first call to the Update method. See the CovarianceComputation method for more details on specifying the covariance computation.
The within-group means are updated for all valid observations in x. Observations with invalid group numbers are ignored, as are observations with missing values (Double.NaN). The LU factorizations of the covariance matrices are updated by adding (or deleting) observations via Givens rotations. See the Downdate method to delete observations.
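The idea of folding observations into the rule one chunk at a time can be illustrated with a minimal Python sketch that does not use the library. It keeps a running mean vector and scatter matrix per group with a Welford-style update, which is a swapped-in technique and not the Givens-rotation update of the LU factorizations described above. The names `GroupAccumulator` and `update` are hypothetical, group numbers are assumed to run from 1 to the number of groups, and every row is assumed to have weight and frequency 1.

```python
import math


class GroupAccumulator:
    """Running mean vector and scatter matrix for one group (hypothetical helper)."""

    def __init__(self, n_vars):
        self.n = 0
        self.mean = [0.0] * n_vars
        # scatter[a][b] accumulates the sum of (x_a - mean_a) * (x_b - mean_b)
        self.scatter = [[0.0] * n_vars for _ in range(n_vars)]

    def add(self, row):
        """Fold a single observation into the mean and scatter matrix."""
        self.n += 1
        delta_old = [v - m for v, m in zip(row, self.mean)]
        self.mean = [m + d / self.n for m, d in zip(self.mean, delta_old)]
        delta_new = [v - m for v, m in zip(row, self.mean)]
        for a, da in enumerate(delta_old):
            for b, db in enumerate(delta_new):
                self.scatter[a][b] += da * db

    def covariance(self):
        """Sample covariance matrix (divisor n - 1); needs at least 2 rows."""
        return [[s / (self.n - 1) for s in r] for r in self.scatter]


def update(accumulators, rows, groups):
    """Fold one chunk of observations into the per-group accumulators."""
    for row, g in zip(rows, groups):
        # Assumption: valid group numbers are 1..len(accumulators).
        if not 1 <= g <= len(accumulators):
            continue  # invalid group number: ignore the observation
        if any(math.isnan(v) for v in row):
            continue  # missing value: ignore the observation
        accumulators[g - 1].add(row)
```

Calling `update` once with all rows or several times with smaller chunks gives the same means and covariance matrices, up to rounding, which is why the full data set never has to be held in memory.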
During training, that is, on each invocation of the
Update method, each observation in x is added to
the means and the factorizations of the covariance matrices. Statistics of
interest are computed: the linear discriminant functions, the prior
probabilities, the log of the determinant of each of the covariance matrices,
and a test statistic for testing that all of the within-group covariance
matrices are equal. The matrix of Mahalanobis distances, which consists of the
distances between the groups, is computed via the pooled covariance matrix
when linear discrimination is specified. The row covariance matrix is used
when the discrimination is quadratic.
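As an illustration of the distance computation, the following Python sketch evaluates the standard squared Mahalanobis distance between two group mean vectors, (mean_i - mean_j)^T S^{-1} (mean_i - mean_j). It is independent of the library: `solve`, `mahalanobis_sq`, and `distance_matrix` are hypothetical names, and the reading that "row covariance matrix" means the covariance matrix of the group indexing the row is an assumption. Whether GetMahalanobis returns squared distances is not stated here, so treat the scaling as an assumption as well.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def mahalanobis_sq(mean_i, mean_j, cov):
    """Squared Mahalanobis distance between two mean vectors under cov."""
    diff = [a - b for a, b in zip(mean_i, mean_j)]
    z = solve(cov, diff)  # z = cov^{-1} diff, without forming the inverse
    return sum(d * v for d, v in zip(diff, z))


def distance_matrix(means, pooled=None, group_covs=None):
    """Squared distances between all pairs of group means.

    Linear discrimination: pass `pooled`.
    Quadratic discrimination: pass `group_covs`; row i then uses the
    covariance matrix of group i (an assumed reading of "row covariance").
    """
    k = len(means)
    out = []
    for i in range(k):
        cov = pooled if pooled is not None else group_covs[i]
        out.append([mahalanobis_sq(means[i], means[j], cov) for j in range(k)])
    return out
```

With a pooled matrix the result is symmetric; with per-group matrices it generally is not, because each row is scaled by a different covariance matrix.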
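Covariance matrices are defined as follows. Let $N_i$ denote the sum of the frequencies of the observations in group i, and let $M_i$ denote the number of observations in group i. Then, if $S_i$ denotes the within-group i covariance matrix,

$$S_i = \frac{1}{N_i - 1} \sum_{j=1}^{M_i} w_j f_j (x_j - \bar{x}_i)(x_j - \bar{x}_i)^T$$

where $w_j$ is the weight of the j-th observation in group i, $f_j$ is its frequency, $x_j$ is the j-th observation column vector in group i, and $\bar{x}_i$ denotes the mean vector of the observations in group i. The mean vectors are computed as

$$\bar{x}_i = \frac{1}{W_i} \sum_{j=1}^{M_i} w_j f_j x_j, \qquad W_i = \sum_{j=1}^{M_i} w_j f_j$$

Given the means and the covariance matrices, the linear discriminant function for group i is computed as

$$z_i = \ln(p_i) - \tfrac{1}{2} \bar{x}_i^T S_p^{-1} \bar{x}_i + x^T S_p^{-1} \bar{x}_i$$

where $\ln(p_i)$ is the natural log of the prior probability for group i, x is the observation to be classified, and $S_p$ denotes the pooled covariance matrix.
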
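Let S denote either the pooled covariance matrix $S_p$ or one of the within-group covariance matrices $S_i$. (S will be the pooled covariance matrix in linear discrimination, and $S_i$ otherwise.) The Mahalanobis distance between group i and group j is computed as:

$$D_{ij}^2 = (\bar{x}_i - \bar{x}_j)^T S^{-1} (\bar{x}_i - \bar{x}_j)$$

where $\bar{x}_i$ and $\bar{x}_j$ are the mean vectors of groups i and j.
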
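Finally, the asymptotic chi-squared test for the equality of covariance matrices is computed as follows (Morrison 1976, page 252):

$$\gamma = C^{-1} \sum_{i=1}^{k} n_i \left\{ \ln(|S_p|) - \ln(|S_i|) \right\}$$

where $n_i$ is the number of degrees of freedom in the i-th sample covariance matrix $S_i$, $S_p$ is the pooled covariance matrix, k is the number of groups, and

$$C^{-1} = 1 - \frac{2p^2 + 3p - 1}{6(p+1)(k-1)} \left( \sum_{i=1}^{k} \frac{1}{n_i} - \frac{1}{\sum_{j=1}^{k} n_j} \right)$$

where p is the number of variables.
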
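The estimated posterior probability of each observation x belonging to group i is computed using the prior probabilities and the sample mean vectors and estimated covariance matrices under a multivariate normal assumption. Under quadratic discrimination, the within-group covariance matrices are used to compute the estimated posterior probabilities. The estimated posterior probability of an observation x belonging to group i is

$$\hat{q}_i(x) = \frac{e^{-\frac{1}{2} D_i^2(x)}}{\sum_{j=1}^{k} e^{-\frac{1}{2} D_j^2(x)}}$$

where k is the number of groups and

$$D_i^2(x) = \begin{cases} (x - \bar{x}_i)^T S_i^{-1} (x - \bar{x}_i) + \ln|S_i| - 2\ln(p_i) & \text{quadratic discrimination} \\ (x - \bar{x}_i)^T S_p^{-1} (x - \bar{x}_i) - 2\ln(p_i) & \text{linear discrimination} \end{cases}$$

Here $\bar{x}_i$ is the mean vector of group i, $S_i$ is the within-group i covariance matrix, $S_p$ is the pooled covariance matrix, and $p_i$ is the prior probability of group i.

For the leaving-out-one method of classification, the sample mean vector and sample covariance matrices in the formula for $D_i^2(x)$ are adjusted so as to remove the observation x from their computation.
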
Using the posterior probabilities, each observation in x is
classified into a group; the result is tabulated in the matrix returned by
GetClassTable and saved in the vector returned by
GetClassMembership. If a group variable is provided and the
group number is out of range, the classification table is not altered at
this stage. If the reclassification method is specified, then all
observations with no missing values are classified. When the leaving-out-one
method is used, observations with invalid group numbers, weights, frequencies
or classification variables are not classified. Regardless of the frequency,
1 is added to (or subtracted from) the classification table for each row of
x that is classified and contains a valid group number.
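The classification step can be sketched in Python, independently of the library. The sketch assumes the usual normal-theory form of the posterior probability, exp(-D_i^2 / 2) normalized over the groups, where D_i^2 is the generalized squared distance of the observation from group i with the prior and determinant terms already included. The names `posteriors` and `classify` are hypothetical, group numbers are assumed to run from 1 to the number of groups, and the table layout (rows for the known group, columns for the assigned group) is an assumption about the matrix returned by GetClassTable.

```python
import math


def posteriors(d2):
    """Posterior probabilities from generalized squared distances d2[i]."""
    shift = min(d2)  # subtracting the minimum avoids underflow in exp()
    weights = [math.exp(-0.5 * (d - shift)) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]


def classify(d2_rows, known_groups, n_groups):
    """Assign each row to its most probable group and tabulate the result.

    Returns (table, membership). table[g - 1][c - 1] counts rows with known
    group g that were assigned to group c (assumed layout).
    """
    table = [[0] * n_groups for _ in range(n_groups)]
    membership = []
    for d2, g in zip(d2_rows, known_groups):
        p = posteriors(d2)
        assigned = max(range(n_groups), key=lambda i: p[i]) + 1
        membership.append(assigned)
        # An out-of-range group number leaves the table unchanged.
        if 1 <= g <= n_groups:
            table[g - 1][assigned - 1] += 1  # one count per row, whatever its frequency
    return table, membership
```

Each classified row contributes exactly one count, which mirrors the statement above that the frequency does not affect the table.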
When the leaving-out-one method is used, adjustment is made to the posterior
probabilities to remove the effect of the observation in the classification
rule. In this adjustment, each observation is presumed to have a weight of
$w_j$ (the weight of that observation) and a frequency of 1.0. See Lachenbruch (1975, page 36)
for the required adjustment.