discriminantAnalysis

Performs a linear or a quadratic discriminant function analysis among several known groups.

Synopsis

discriminantAnalysis (nRows, nVariables, x, nGroups)

Required Arguments

int nRows (Input)
Number of rows of x to be processed.
int nVariables (Input)
Number of variables to be used in the discrimination.
float x[[]] (Input)
Array of size nRows by nVariables + 1 containing the data. The first nVariables columns correspond to the variables, and the last column (column nVariables) contains the group numbers. The groups must be numbered 1, 2, …, nGroups.
int nGroups (Input)
Number of groups in the data.

Optional Arguments

xIndices, int igrp, int ind[], int ifrq, int iwt (Input)

Each of the four arguments contains indices indicating column numbers of x in which particular types of data are stored.

Parameter igrp contains the index for the column of x in which the group numbers are stored.

Parameter ind contains the indices of the variables to be used in the analysis.

Parameters ifrq and iwt contain the column numbers of x in which the frequencies and weights, respectively, are stored. Set ifrq = −1 if there will be no column for frequencies. Set iwt = −1 if there will be no column for weights. Weights are rounded to the nearest integer. Negative weights are not allowed.

Defaults: igrp = nVariables, ind[] = 0, 1, …, nVariables − 1, ifrq = −1, and iwt = −1

method, int (Input)

Method of discrimination. The method chosen determines whether linear or quadratic discrimination is used, whether the group covariance matrices are computed (the pooled covariance matrix is always computed), and whether the leaving-out-one or the reclassification method is used to classify each observation.

method discrimination method covariances computed classification method
1 linear pooled, group reclassification
2 quadratic pooled, group reclassification
3 linear pooled reclassification
4 linear pooled, group leaving-out-one
5 quadratic pooled, group leaving-out-one
6 linear pooled leaving-out-one

In the leaving-out-one method of classification, the posterior probabilities are adjusted so as to eliminate the effect of the observation from the sample statistics prior to its classification. In the classification method, the effect of the observation is not eliminated from the classification function.

When optional argument ido is specified, the following rules for mixing methods apply; Methods 1, 2, 4, and 5 can be intermixed, as can methods 3 and 6. Methods 1, 2, 4, and 5 cannot be intermixed with methods 3 and 6.

Default: method = 1

ido, int (Input)

Processing option. See Comments 3 and 4 for more information.

ido Action
0 This is the only invocation; all the data are input at once. (Default)
1 This is the first invocation with this data; additional calls will be made. Initialization and updating for the nRows observations of x will be performed.
2 This is an intermediate invocation; updating for the nRows observations of x will be performed.
3 All statistics are updated for the nRows observations. The discriminant functions and other statistics are computed.
4 The discriminant functions are used to classify each of the nRows observations of x.
5 The covariance matrices are computed, and workspace is released. No further call to discriminantAnalysis with ido greater than 1 should be made without first calling discriminantAnalysis with ido = 1.
6 Workspace is released. No further calls to discriminantAnalysis with ido greater than 1 should be made without first calling discriminantAnalysis with ido = 1. Invo­cation with this option is not required if a call has already been made with ido = 5.

Default: ido = 0

rowsAdd (Input)

or

rowsDelete (Input)

By default (or if rowsAdd is specified), then the observations in x are added to the discriminant statistics. If rowsDelete is specified, then the observations are deleted.

If ido = 0, these optional arguments are ignored (data is always added if there is only one invocation).

priorEqual (Input)

or

priorProportional (Input)

or

priorInput, float[] (Input)

By default, (or if priorEqual is specified), equal prior probabilities are calculated as 1.0/nGroups.

If priorProportional is specified, prior probabilities are calculated to be proportional to the sample size in each group.

If priorInput is specified, then array priorInput is an array of length nGroups containing the prior probabilities for each group, such that the sum of all prior probabilities is equal to 1.0. Prior probabilities are not used if ido is equal to 1, 2, 5, or 6.

priorOutput (Output)
An array of length nGroups containing the most recently calculated or input prior probabilities. If priorProportional is specified, every element of priorOutput is equal to −1 until a call is made with ido equal to 0 or 3, at which point the priors are calculated. Note that subsequent calls to discriminantAnalysis with priorProportional specified, and ido not equal to 0 or 3 will result in the elements of priorOutput being reset to −1.
groupCounts (Output)
An integer array of length nGroups containing the number of observations in each group. Array groupCounts is updated when ido is equal to 0, 1, or 2.
means (Output)

An array of size nGroups by nVariables. The i-th row of means contains the group i variable means. Array means is updated when ido is equal to 0, 1, 2, or 5. The means are unscaled until a call is made with ido = 5, where the unscaled means are calculated as \(\Sigma w_i f_i x_i\) and the scaled means as

\[\frac{\Sigma w_i f_i x_i}{\Sigma w_i f_i}\]

where \(x_i\) is the value of the i-th observation, \(w_i\) is the weight of the i-th observation, and \(f_i\) is the frequency of the i-th observation.

cov (Output)
An array of size g by nVariables by nVariables containing the within-group covariance matrices (methods 1, 2, 4, and 5 only) as the first g-1 matrices, and the pooled covariance matrix as the g-th matrix (that is, the first nVariables × nVariables elements comprise the group 1 covariance matrix, the next nVariables × nVariables elements comprise the group 2 covariance, …, and the last nVariables × nVariables elements comprise the pooled covariance matrix). If method is 3 or 6 then g is equal to 1. Otherwise, g is equal to nGroups + 1. Argument cov is updated when ido is equal to 0, 1, 2, 3, or 5.
coef (Output)

An array of size nGroups by (nVariables + 1) containing the linear discriminant coefficients. The first column of coefficients contains the constant term, and the remaining columns contain the variable coefficients. Row i − 1 of coefficients corresponds to group i, for i = 1, 2, … nGroups. Array coefficients is always computed as the linear discriminant function coefficients even when quadratic discrimination is specified. Specifically, given the linear discriminant function

\[z_i = \ln\left(p_i\right) - 0.5 \overline{x}_i^T S_p^{-1} \overline{x}_i + x^T S_p^{-1} \overline{x}_i\]

the intercept \(\ln\left(p_i\right)-0.5\overline{x}_i^TS_p^{-1}\overline{x}_i\) is assigned to coefficients[i×(nVariables+1)] and the j‑th element of \(S_p^{-1}\overline{x}_i\) is assigned to coefficients[i×(nVariables+1)+j], where i = 0, …, nGroups-1 and j=1, …, nVariables+1. Array coefficients is updated when ido is equal to 0 or 3.

classMembership (Output)

An integer array of length nRows containing the group to which the observation was classified. Array classMembership is updated when ido is equal to 0 or 4.

If an observation has an invalid group number, frequency, or weight when the leaving-out-one method has been specified, then the observation is not classified and the corresponding elements of classMembership (and prob, see prob) are set to zero.

classTable (Output)
An array of size nGroups by nGroups containing the classification table. Array classTable is updated when ido is equal to 0, 1, or 4. Each observation that is classified and has a group number 1.0, 2.0, …, nGroups is entered into the table. The rows of the table correspond to the known group membership. The columns refer to the group to which the observation was classified. Classification results accumulate with each call to discriminantAnalysis with ido equal to 4. For example, if two calls with ido equal to 4 are made, the elements in classTable sum to the total number of valid observations in the two calls.
prob (Output)
An array of size nRows by nGroups containing the posterior probabilities for each observation. Argument prob is updated when ido is equal to 0 or 4.
mahalanobis (Output)

An array of size nGroups by nGroups containing the Mahalanobis distances

\[D_{ij}^2\]

between the group means. Argument mahalanobis is updated when ido is equal to 0 or 3.

For linear discrimination, the Mahalanobis distance is computed using the pooled covariance matrix. Otherwise, the Mahalanobis distance

\[D_{ij}^2\]

between group means i and j is computed using the within covariance matrix for group i in place of the pooled covariance matrix.

stats (Output)
An array of length 4 + 2 × (nGroups + 1) containing various statistics of interest. Array stats is updated when ido is equal to 0, 2, 3, or 5. The first element of stats is the sum of the degrees of freedom for the within-covariance matrices. The second, third, and fourth elements of stats correspond to the chi-squared statistic, its degrees of freedom, and the probability of a greater chi-squared, respectively, of a test of the homogeneity of the within-covariance matrices (not computed if method is equal to 3 or 6). The fifth through 5 + nGroups elements of stats contain the log of the determinants of each group’s covariance matrix (not computed if method is equal to 3 or 6) and of the pooled covariance matrix (element 4 + nGroups). Finally, the last nGroups + 1 elements of stats contain the sum of the weights within each group, and in the last position, the sum of the weights in all groups.
nRowsMissing (Output)

Number of rows of data encountered in calls to discriminantAnalysis containing missing values (NaN) for the classification, group, weight, and/or frequency variables. If a row of data contains a missing value (NaN) for any of these variables, that row is excluded from the computations.

Array nRowsMissing is updated when ido is equal to 0, 1, 2, or 3.

Comments

  1. Common choices for the Bayesian prior probabilities are given by:

    priorInput[i] = 1.0∕nGroups (equal priors)

    priorInput[i] = groupCounts[i]nRows (proportional priors)

    priorInput[i] = Past history or subjective judgment.

    In all cases, the priors should sum to 1.0.

  2. Two passes of the data are made. In the first pass, the statistics required to compute the discriminant functions are obtained (ido equal to 1, 2, and 3). In the second pass, the discriminant functions are used to classify the observations. When ido is equal to 0, all of the data are memory resident, and both passes are made in one call to discriminantAnalysis. When ido > 0 (optional argument ido is specified), a third call to discriminantAnalysis involving no data is required with ido equal to 5 or 6.

  3. Here are a few rules and guidelines for the correct value of ido in a series of calls:

    1. Calls with ido = 0 or ido = 1 may be made at any time, subject to rule 2. These calls indicate that a new analysis is to begin, and therefore allocate memory and destroy all statistics from previous calls.

    2. Each series of calls to discriminantAnalysis which begins with ido = 1 must end with ido equal to 5 or 6 to ensure the proper release of workspace, subject to rule 3.

    3. ido may not be 4 or 5 before a call with ido = 3 has been made.

    4. ido may not be 2, 3, 4, 5, or 6

      1. Immediately after a call with ido = 0.
      2. Before a call with ido = 1 has been made.

      iii) Immediately after a call with ido equal to 5 or 6 has been made.

      The following is a valid sequence of ido’s:

ido Explanation
0 Data Set A: Perform a complete analysis. All data to be used in the analysis must be present in x. Since cleanup of workspace is automatic for ido = 0, no further calls are necessary.
1 Data Set B: Begin analysis. The nRows observations in x are used for initialization.
2 Data Set B: Continue analysis. New observations placed in x are added to (or deleted from, see rowsDelete) the analysis.
2 Data Set B: Continue analysis. nRows new observations placed in x are added to (or deleted from, see rowsDelete) the analysis.
3 Data Set B: Continue analysis. nRows new observations are added (or deleted) and discriminant functions and other statistics are computed.
4 Data Set B: Classification of each of the nRows observations in the current x matrix.
5 Data Set B: End analysis. Covariance matrices are computed and workspace is released. This analysis could also have been ended by choosing ido = 6
1 Data Set C: Begin analysis. Note that for this call to be valid the previous call must have been made with ido equal to 5 or 6.
3 Data Set C: Continue analysis.
4 Data Set C: Continue analysis.
3 Data Set C: Continue analysis.
6 Data Set C: End analysis.
  1. Because of the internal workspace allocation and saved variables, function discriminantAnalysis must complete the analysis of a data set before beginning processing of the next data set.

Description

Function discriminantAnalysis performs discriminant function analysis using either linear or quadratic discrimination. The output includes a measure of distance between the groups, a table summarizing the classification results, a matrix containing the posterior probabilities of group membership for each observation, and the within-sample means and covariance matrices. The linear discriminant function coefficients are also computed.

By default (or if optional argument ido is specified with ido = 0) all observations are input during one call, a method of operation that has the advantage of simplicity. Alternatively, one or more rows of observations can be input during separate calls. This method does not require that all observations be memory resident, a significant advantage with large data sets. Note, however, that the algorithm requires two passes of the data. During the first pass the discriminant functions are computed while in the second pass, the observations are classified. Thus, with the second method of operation, the data will usually need to be input twice.

Because both methods result in the same operations being performed, the algorithm is discussed as if only a few observations are input during each call. The operations performed during each call depend upon the ido parameter.

The ido = 1 step is the initialization step. “Private” internally allocated saved variables corresponding to means, classTable, and covariances are initialized to zero, and other program parameters are set (copies of these private variables are written to the corresponding output variables upon return from the function call, assuming ido values such that the results are to be returned). Parameters nRows, x, and method can be changed from one call to the next within the two sets {1, 2, 4, 5} and {3, 6} but not between these sets when ido > 1. That is, do not specify method = 1 in one call and method = 3 in another call without first making a call with ido = 1.

After initialization has been performed in the ido = 1 step, the within-group means are updated for all valid observations in x. Observations with invalid group numbers are ignored, as are observation with missing values. The LU factorization of the covariance matrices are updated by adding (or deleting) observations via Givens rotations.

The ido = 2 step is used solely for adding or deleting observations from the model as in the above paragraph.

The ido = 3 step begins by adding all observations in x to the means and the factorizations of the covariance matrices. It continues by computing some statistics of interest: the linear discriminant functions, the prior probabilities (by default, or if proportionalPriors is specified), the log of the determinant of each of the covariance matrices, a test statistic for testing that all of the within-group covariance matrices are equal, and a matrix of Mahalanobis distances between the groups. The matrix of Mahalanobis distances is computed via the pooled covariance matrix when linear discrimination is specified; the row covariance matrix is used when the discrimination is quadratic.

Covariance matrices are defined as follows: Let \(N_i\) denote the sum of the frequencies of the observations in group i and \(M_i\) denote the number of observations in group i. Then, if \(S_i\) denotes the within-group i covariance matrix,

\[S_i = \frac{1}{N_i-1} \sum_{j=1}^{M_i} w_j f_j \left(x_j - \overline{x}\right) \left(x_j - \overline{x}\right)^T\]

Where \(w_j\) is the weight of the j-th observation in group i, \(f_j\) is the frequency, \(x_j\) is the j-th observation column vector (in group i), and \(\bar{x}\) denotes the mean vector of the observations in group i. The mean vectors are computed as

\[\overline{x} = \left(\frac{1}{W_i}\right) \sum_{j=1}^{M_i} w_j f_j x_j \phantom{...}\text{where } W_i = \sum_{j=1}^{M_i} w_j f_j\]

Given the means and the covariance matrices, the linear discriminant function for group i is computed as:

\[z_i = \ln \left( p_i \right) - 0.5 \overline{x}_i^T S_p^{-1} \overline{x}_i + x^T S_p^{-1} \overline{x}_i\]

where \(\ln(p_i)\) is the natural log of the prior probability for the i-th group, x is the observation to be classified, and \(S_p\) denoted the pooled covariance matrix.

Let S denote either the pooled covariance matrix of one of the within-group covariance matrices \(S_i\). (S will be the pooled covariance matrix in linear discrimination, and \(S_i\) otherwise.) The Mahalanobis distance between group i and group j is computed as:

\[D_{ij}^2 = \left(\overline{x}_i - \overline{x}_j\right)^T S^{-1}\left(\overline{x}_i - \overline{x}_j\right)\]

Finally, the asymptotic chi-squared test for the equality of covariance matrices is computed as follows (Morrison 1976, p. 252):

\[\gamma = C^{-1} \sum_{i=1}^{k} n_i \left\{ \ln\left(\left|S_p\right|\right) - \ln \left(\left|S_i\right|\right)\right\}\]

where \(n_i\) is the number of degrees of freedom in the i-th sample covariance matrix, k is the number of groups, and

\[C^{-1} = 1 - \frac{2p^2 + 3p - 1}{6(p+1)(k-1)} \left(\sum_{i=1}^{k} \frac{1}{n_i} - \frac{1}{\Sigma_j n_j}\right)\]

where p is the number of variables.

When ido = 4, the estimated posterior probability of each observation x belonging to group i is computed using the prior probabilities and the sample mean vectors and estimated covariance matrices under a multivariate normal assumption. Under quadratic discrimination, the within-group covariance matrices are used to compute the estimated posterior probabilities. The estimated posterior probability of an observation x belonging to group i is

\[\hat{q}_i(x) = \frac{\exp \left(-0.5 D_i^2(x)\right)} {\sum\limits_{j=1}^{k} \exp \left(-0.5 D_j^2(x)\right)}\]

where

\[\begin{split}D_i^2(x) = \begin{cases} \left(x - \overline{x}_i\right)^T S_i^{-1} \left(x - \overline{x}_i\right) + \ln |S_i| - 2\ln\left(p_i\right) & \texttt{method} = 1 \text{ or } 2 \\ \left(x - \overline{x}_i\right)^T S_p^-1 \left(x - \overline{x}_i\right) - 2\ln\left(p_i\right) & \texttt{method} = 3 \end{cases}\end{split}\]

For the leaving-out-one method of classification (method equal to 4, 5 or 6), the sample mean vector and sample covariance matrices in the formula for

\[D_i^2\]

are adjusted so as to remove the observation x from their computation. For linear discrimination (method equal to 1, 3, 4, or 6), the linear discriminant function coefficients are actually used to compute the same posterior probabilities.

Using the posterior probabilities, each observation in x is classified into a group; the result is tabulated in the matrix classTable and saved in the vector classMembership. Matrix classTable is not altered at this stage if x[i][igrp] (see optional argument xIndices) contains a group number that is out of range. If the reclassification method is specified, then all observations with no missing values in the nVariables classification variables are classified. When the leaving-out-one method is used, observations with invalid group numbers, weights, frequencies, or classification variables are not classified. Regardless of the frequency, a 1 is added (or subtracted) from classTable for each row of x that is classified and contains a valid group number.

When method > 3, adjustment is made to the posterior probabilities to remove the effect of the observation in the classification rule. In this adjustment, each observation is presumed to have a weight of x[i][iwt] if iwt > −1 (and a weight of 1.0 if iwt = −1), and a frequency of 1.0. See Lachenbruch (1975, p. 36) for the required adjustment.

Finally, when ido = 5, the covariance matrices are computed from their LU factorizations. Internally allocated and saved variables are cleaned up at this step (ido equal to 5 or 6).

Examples

Example 1

The following example uses liner discrimination with equal prior probabilities on Fisher’s (1936) Iris data. This example illustrates the execution of discriminantAnalysis when one call is made (i.e. using the default of ido = 0).

from __future__ import print_function
from numpy import *
from pyimsl.stat.discriminantAnalysis import discriminantAnalysis
from pyimsl.stat.writeMatrix import writeMatrix

n_groups = 3
nvar = 4
prior_out = []
means = []
cov = []
coef = []
nrmiss = []
table = []
d2 = []
stats = []
prob = []
counts = []
cm = []

x = array([
    [5.1, 3.5, 1.4, 0.2, 1],
    [4.9, 3., 1.4, 0.2, 1],
    [4.7, 3.2, 1.3, 0.2, 1],
    [4.6, 3.1, 1.5, 0.2, 1],
    [5., 3.6, 1.4, 0.2, 1],
    [5.4, 3.9, 1.7, 0.4, 1],
    [4.6, 3.4, 1.4, 0.3, 1],
    [5., 3.4, 1.5, 0.2, 1],
    [4.4, 2.9, 1.4, 0.2, 1],
    [4.9, 3.1, 1.5, 0.1, 1],
    [5.4, 3.7, 1.5, 0.2, 1],
    [4.8, 3.4, 1.6, 0.2, 1],
    [4.8, 3., 1.4, 0.1, 1],
    [4.3, 3., 1.1, 0.1, 1],
    [5.8, 4., 1.2, 0.2, 1],
    [5.7, 4.4, 1.5, 0.4, 1],
    [5.4, 3.9, 1.3, 0.4, 1],
    [5.1, 3.5, 1.4, 0.3, 1],
    [5.7, 3.8, 1.7, 0.3, 1],
    [5.1, 3.8, 1.5, 0.3, 1],
    [5.4, 3.4, 1.7, 0.2, 1],
    [5.1, 3.7, 1.5, 0.4, 1],
    [4.6, 3.6, 1., 0.2, 1],
    [5.1, 3.3, 1.7, 0.5, 1],
    [4.8, 3.4, 1.9, 0.2, 1],
    [5., 3., 1.6, 0.2, 1],
    [5., 3.4, 1.6, 0.4, 1],
    [5.2, 3.5, 1.5, 0.2, 1],
    [5.2, 3.4, 1.4, 0.2, 1],
    [4.7, 3.2, 1.6, 0.2, 1],
    [4.8, 3.1, 1.6, 0.2, 1],
    [5.4, 3.4, 1.5, 0.4, 1],
    [5.2, 4.1, 1.5, 0.1, 1],
    [5.5, 4.2, 1.4, 0.2, 1],
    [4.9, 3.1, 1.5, 0.2, 1],
    [5., 3.2, 1.2, 0.2, 1],
    [5.5, 3.5, 1.3, 0.2, 1],
    [4.9, 3.6, 1.4, 0.1, 1],
    [4.4, 3., 1.3, 0.2, 1],
    [5.1, 3.4, 1.5, 0.2, 1],
    [5., 3.5, 1.3, 0.3, 1],
    [4.5, 2.3, 1.3, 0.3, 1],
    [4.4, 3.2, 1.3, 0.2, 1],
    [5., 3.5, 1.6, 0.6, 1],
    [5.1, 3.8, 1.9, 0.4, 1],
    [4.8, 3., 1.4, 0.3, 1],
    [5.1, 3.8, 1.6, 0.2, 1],
    [4.6, 3.2, 1.4, 0.2, 1],
    [5.3, 3.7, 1.5, 0.2, 1],
    [5., 3.3, 1.4, 0.2, 1],
    [7., 3.2, 4.7, 1.4, 2],
    [6.4, 3.2, 4.5, 1.5, 2],
    [6.9, 3.1, 4.9, 1.5, 2],
    [5.5, 2.3, 4., 1.3, 2],
    [6.5, 2.8, 4.6, 1.5, 2],
    [5.7, 2.8, 4.5, 1.3, 2],
    [6.3, 3.3, 4.7, 1.6, 2],
    [4.9, 2.4, 3.3, 1., 2],
    [6.6, 2.9, 4.6, 1.3, 2],
    [5.2, 2.7, 3.9, 1.4, 2],
    [5., 2., 3.5, 1., 2],
    [5.9, 3., 4.2, 1.5, 2],
    [6., 2.2, 4., 1., 2],
    [6.1, 2.9, 4.7, 1.4, 2],
    [5.6, 2.9, 3.6, 1.3, 2],
    [6.7, 3.1, 4.4, 1.4, 2],
    [5.6, 3., 4.5, 1.5, 2],
    [5.8, 2.7, 4.1, 1., 2],
    [6.2, 2.2, 4.5, 1.5, 2],
    [5.6, 2.5, 3.9, 1.1, 2],
    [5.9, 3.2, 4.8, 1.8, 2],
    [6.1, 2.8, 4., 1.3, 2],
    [6.3, 2.5, 4.9, 1.5, 2],
    [6.1, 2.8, 4.7, 1.2, 2],
    [6.4, 2.9, 4.3, 1.3, 2],
    [6.6, 3., 4.4, 1.4, 2],
    [6.8, 2.8, 4.8, 1.4, 2],
    [6.7, 3., 5., 1.7, 2],
    [6., 2.9, 4.5, 1.5, 2],
    [5.7, 2.6, 3.5, 1., 2],
    [5.5, 2.4, 3.8, 1.1, 2],
    [5.5, 2.4, 3.7, 1., 2],
    [5.8, 2.7, 3.9, 1.2, 2],
    [6., 2.7, 5.1, 1.6, 2],
    [5.4, 3., 4.5, 1.5, 2],
    [6., 3.4, 4.5, 1.6, 2],
    [6.7, 3.1, 4.7, 1.5, 2],
    [6.3, 2.3, 4.4, 1.3, 2],
    [5.6, 3., 4.1, 1.3, 2],
    [5.5, 2.5, 4., 1.3, 2],
    [5.5, 2.6, 4.4, 1.2, 2],
    [6.1, 3., 4.6, 1.4, 2],
    [5.8, 2.6, 4., 1.2, 2],
    [5., 2.3, 3.3, 1., 2],
    [5.6, 2.7, 4.2, 1.3, 2],
    [5.7, 3., 4.2, 1.2, 2],
    [5.7, 2.9, 4.2, 1.3, 2],
    [6.2, 2.9, 4.3, 1.3, 2],
    [5.1, 2.5, 3., 1.1, 2],
    [5.7, 2.8, 4.1, 1.3, 2],
    [6.3, 3.3, 6., 2.5, 3],
    [5.8, 2.7, 5.1, 1.9, 3],
    [7.1, 3., 5.9, 2.1, 3],
    [6.3, 2.9, 5.6, 1.8, 3],
    [6.5, 3., 5.8, 2.2, 3],
    [7.6, 3., 6.6, 2.1, 3],
    [4.9, 2.5, 4.5, 1.7, 3],
    [7.3, 2.9, 6.3, 1.8, 3],
    [6.7, 2.5, 5.8, 1.8, 3],
    [7.2, 3.6, 6.1, 2.5, 3],
    [6.5, 3.2, 5.1, 2., 3],
    [6.4, 2.7, 5.3, 1.9, 3],
    [6.8, 3., 5.5, 2.1, 3],
    [5.7, 2.5, 5., 2., 3],
    [5.8, 2.8, 5.1, 2.4, 3],
    [6.4, 3.2, 5.3, 2.3, 3],
    [6.5, 3., 5.5, 1.8, 3],
    [7.7, 3.8, 6.7, 2.2, 3],
    [7.7, 2.6, 6.9, 2.3, 3],
    [6., 2.2, 5., 1.5, 3],
    [6.9, 3.2, 5.7, 2.3, 3],
    [5.6, 2.8, 4.9, 2., 3],
    [7.7, 2.8, 6.7, 2., 3],
    [6.3, 2.7, 4.9, 1.8, 3],
    [6.7, 3.3, 5.7, 2.1, 3],
    [7.2, 3.2, 6., 1.8, 3],
    [6.2, 2.8, 4.8, 1.8, 3],
    [6.1, 3., 4.9, 1.8, 3],
    [6.4, 2.8, 5.6, 2.1, 3],
    [7.2, 3., 5.8, 1.6, 3],
    [7.4, 2.8, 6.1, 1.9, 3],
    [7.9, 3.8, 6.4, 2., 3],
    [6.4, 2.8, 5.6, 2.2, 3],
    [6.3, 2.8, 5.1, 1.5, 3],
    [6.1, 2.6, 5.6, 1.4, 3],
    [7.7, 3., 6.1, 2.3, 3],
    [6.3, 3.4, 5.6, 2.4, 3],
    [6.4, 3.1, 5.5, 1.8, 3],
    [6., 3., 4.8, 1.8, 3],
    [6.9, 3.1, 5.4, 2.1, 3],
    [6.7, 3.1, 5.6, 2.4, 3],
    [6.9, 3.1, 5.1, 2.3, 3],
    [5.8, 2.7, 5.1, 1.9, 3],
    [6.8, 3.2, 5.9, 2.3, 3],
    [6.7, 3.3, 5.7, 2.5, 3],
    [6.7, 3., 5.2, 2.3, 3],
    [6.3, 2.5, 5., 1.9, 3],
    [6.5, 3., 5.2, 2., 3],
    [6.2, 3.4, 5.4, 2.3, 3],
    [5.9, 3., 5.1, 1.8, 3]])
nrow = 150

discriminantAnalysis(nrow, nvar, x, n_groups,
                     method=3,
                     groupCounts=counts,
                     coef=coef,
                     means=means,
                     stats=stats,
                     classMembership=cm,
                     classTable=table,
                     prob=prob,
                     mahalanobis=d2,
                     cov=cov,
                     priorOutput=prior_out,
                     nRowsMissing=nrmiss,
                     priorEqual=True)

writeMatrix('counts', counts)
writeMatrix('coef', coef)
writeMatrix('Means', means)
writeMatrix('Stats', stats, column=True)
writeMatrix('Membership', cm, writeFormat="%2i")
writeMatrix('Table', table)
writeMatrix('Prob', prob)
writeMatrix('D2', d2)
writeMatrix('Covariance', cov)
writeMatrix('Prior OUT', prior_out)
print("\nnrmiss = %3i" % nrmiss[0])

Output

 
               counts
          1            2            3
         50           50           50
 
                               coef
             1            2            3            4            5
1        -86.3         23.5         23.6        -16.4        -17.4
2        -72.9         15.7          7.1          5.2          6.4
3       -104.4         12.4          3.7         12.8         21.1
 
                        Means
             1            2            3            4
1        5.006        3.428        1.462        0.246
2        5.936        2.770        4.260        1.326
3        6.588        2.974        5.552        2.026
 
     Stats
 1          147
 2  ...........
 3  ...........
 4  ...........
 5  ...........
 6  ...........
 7  ...........
 8          -10
 9           50
10           50
11           50
12          150
 
                                  Membership
 1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
 1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
 
21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
 1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
 
41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
 1   1   1   1   1   1   1   1   1   1   2   2   2   2   2   2   2   2   2   2
 
61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
 2   2   2   2   2   2   2   2   2   2   3   2   2   2   2   2   2   2   2   2
 
81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99
 2   2   2   3   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
 
100  101  102  103  104  105  106  107  108  109  110  111  112  113  114  115
  2    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3
 
116  117  118  119  120  121  122  123  124  125  126  127  128  129  130  131
  3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3
 
132  133  134  135  136  137  138  139  140  141  142  143  144  145  146  147
  3    3    2    3    3    3    3    3    3    3    3    3    3    3    3    3
 
148  149  150
  3    3    3
 
                  Table
             1            2            3
1           50            0            0
2            0           48            2
3            0            1           49
 
                   Prob
               1            2            3
  1        1.000        0.000        0.000
  2        1.000        0.000        0.000
  3        1.000        0.000        0.000
  4        1.000        0.000        0.000
  5        1.000        0.000        0.000
  6        1.000        0.000        0.000
  7        1.000        0.000        0.000
  8        1.000        0.000        0.000
  9        1.000        0.000        0.000
 10        1.000        0.000        0.000
 11        1.000        0.000        0.000
 12        1.000        0.000        0.000
 13        1.000        0.000        0.000
 14        1.000        0.000        0.000
 15        1.000        0.000        0.000
 16        1.000        0.000        0.000
 17        1.000        0.000        0.000
 18        1.000        0.000        0.000
 19        1.000        0.000        0.000
 20        1.000        0.000        0.000
 21        1.000        0.000        0.000
 22        1.000        0.000        0.000
 23        1.000        0.000        0.000
 24        1.000        0.000        0.000
 25        1.000        0.000        0.000
 26        1.000        0.000        0.000
 27        1.000        0.000        0.000
 28        1.000        0.000        0.000
 29        1.000        0.000        0.000
 30        1.000        0.000        0.000
 31        1.000        0.000        0.000
 32        1.000        0.000        0.000
 33        1.000        0.000        0.000
 34        1.000        0.000        0.000
 35        1.000        0.000        0.000
 36        1.000        0.000        0.000
 37        1.000        0.000        0.000
 38        1.000        0.000        0.000
 39        1.000        0.000        0.000
 40        1.000        0.000        0.000
 41        1.000        0.000        0.000
 42        1.000        0.000        0.000
 43        1.000        0.000        0.000
 44        1.000        0.000        0.000
 45        1.000        0.000        0.000
 46        1.000        0.000        0.000
 47        1.000        0.000        0.000
 48        1.000        0.000        0.000
 49        1.000        0.000        0.000
 50        1.000        0.000        0.000
 51        0.000        1.000        0.000
 52        0.000        0.999        0.001
 53        0.000        0.996        0.004
 54        0.000        1.000        0.000
 55        0.000        0.996        0.004
 56        0.000        0.999        0.001
 57        0.000        0.986        0.014
 58        0.000        1.000        0.000
 59        0.000        1.000        0.000
 60        0.000        1.000        0.000
 61        0.000        1.000        0.000
 62        0.000        0.999        0.001
 63        0.000        1.000        0.000
 64        0.000        0.994        0.006
 65        0.000        1.000        0.000
 66        0.000        1.000        0.000
 67        0.000        0.981        0.019
 68        0.000        1.000        0.000
 69        0.000        0.960        0.040
 70        0.000        1.000        0.000
 71        0.000        0.253        0.747
 72        0.000        1.000        0.000
 73        0.000        0.816        0.184
 74        0.000        1.000        0.000
 75        0.000        1.000        0.000
 76        0.000        1.000        0.000
 77        0.000        0.998        0.002
 78        0.000        0.689        0.311
 79        0.000        0.993        0.007
 80        0.000        1.000        0.000
 81        0.000        1.000        0.000
 82        0.000        1.000        0.000
 83        0.000        1.000        0.000
 84        0.000        0.143        0.857
 85        0.000        0.964        0.036
 86        0.000        0.994        0.006
 87        0.000        0.998        0.002
 88        0.000        0.999        0.001
 89        0.000        1.000        0.000
 90        0.000        1.000        0.000
 91        0.000        0.999        0.001
 92        0.000        0.998        0.002
 93        0.000        1.000        0.000
 94        0.000        1.000        0.000
 95        0.000        1.000        0.000
 96        0.000        1.000        0.000
 97        0.000        1.000        0.000
 98        0.000        1.000        0.000
 99        0.000        1.000        0.000
100        0.000        1.000        0.000
101        0.000        0.000        1.000
102        0.000        0.001        0.999
103        0.000        0.000        1.000
104        0.000        0.001        0.999
105        0.000        0.000        1.000
106        0.000        0.000        1.000
107        0.000        0.049        0.951
108        0.000        0.000        1.000
109        0.000        0.000        1.000
110        0.000        0.000        1.000
111        0.000        0.013        0.987
112        0.000        0.002        0.998
113        0.000        0.000        1.000
114        0.000        0.000        1.000
115        0.000        0.000        1.000
116        0.000        0.000        1.000
117        0.000        0.006        0.994
118        0.000        0.000        1.000
119        0.000        0.000        1.000
120        0.000        0.221        0.779
121        0.000        0.000        1.000
122        0.000        0.001        0.999
123        0.000        0.000        1.000
124        0.000        0.097        0.903
125        0.000        0.000        1.000
126        0.000        0.003        0.997
127        0.000        0.188        0.812
128        0.000        0.134        0.866
129        0.000        0.000        1.000
130        0.000        0.104        0.896
131        0.000        0.000        1.000
132        0.000        0.001        0.999
133        0.000        0.000        1.000
134     
nrmiss =   0
   0.000        0.729        0.271
135        0.000        0.066        0.934
136        0.000        0.000        1.000
137        0.000        0.000        1.000
138        0.000        0.006        0.994
139        0.000        0.193        0.807
140        0.000        0.001        0.999
141        0.000        0.000        1.000
142        0.000        0.000        1.000
143        0.000        0.001        0.999
144        0.000        0.000        1.000
145        0.000        0.000        1.000
146        0.000        0.000        1.000
147        0.000        0.006        0.994
148        0.000        0.003        0.997
149        0.000        0.000        1.000
150        0.000        0.018        0.982
 
                   D2
             1            2            3
1          0.0         89.9        179.4
2         89.9          0.0         17.2
3        179.4         17.2          0.0
 
                     Covariance
             1            2            3            4
1       0.2650       0.0927       0.1675       0.0384
2       0.0927       0.1154       0.0552       0.0327
3       0.1675       0.0552       0.1852       0.0427
4       0.0384       0.0327       0.0427       0.0419
 
              Prior OUT
          1            2            3
     0.3333       0.3333       0.3333

Example 2

Continuing with Fisher’s Iris data, the example below computes the quadratic discriminant functions using values of ido greater than 0. In the first loop, all observations are added to the functions, one at a time. In the second loop, each of the observations is classified, one by one, using the leaving-out-one method.

from __future__ import print_function
from numpy import *
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.discriminantAnalysis import discriminantAnalysis
from pyimsl.stat.permuteMatrix import permuteMatrix, PERMUTE_COLUMNS
from pyimsl.stat.writeMatrix import writeMatrix

perm = [1, 2, 3, 4, 0]
nrow = []
ncol = []
n_groups = 3

# Retrieve the Fisher Iris Data Set
xtemp = dataSets(3, nObservations=nrow, nVariables=ncol)
nrow = nrow[0]
ncol = ncol[0]
nvar = ncol - 1

# Move the group column to end of the the matrix
x = permuteMatrix(xtemp, perm, PERMUTE_COLUMNS)

# Initialize Analysis
discriminantAnalysis(0, nvar, x, n_groups, ido=1, method=2)

# Add In Each Observation
for i in range(0, nrow):
    discriminantAnalysis(1, nvar, x[i:i + 1, :], n_groups, ido=2)

# Remove observation 0 from the analysis
discriminantAnalysis(1, nvar, x, n_groups, rowsDelete=True, ido=2)

# Add observation 0 back into the analysis
discriminantAnalysis(1, nvar, x, n_groups, ido=2)

# Compute statistics
prior_out = []
discriminantAnalysis(0, nvar, x, n_groups, priorProportional=True,
                     priorOutput=prior_out, ido=3)
writeMatrix("Prior OUT", prior_out)

# Classify One observation at a time, using proportional priors
cm = []
prob = []
for i in range(0, nrow):
    cm_temp = []
    prob_temp = []
    discriminantAnalysis(1, nvar, x[i:i + 1, :], n_groups, ido=4,
                         classMembership=cm_temp, prob=prob_temp)
    prob.append(prob_temp[0])
    cm.append(cm_temp[0])

# Compute covariance matrices and release internal workspace
cov = []
group_counts = []
coef = []
means = []
stats = []
table = []
d2 = []
nrmiss = []
discriminantAnalysis(0, nvar, x, n_groups, ido=5, cov=cov,
                     groupCounts=group_counts, coef=coef,
                     means=means, stats=stats, classTable=table,
                     mahalanobis=d2, nRowsMissing=nrmiss)

writeMatrix("Counts", group_counts)
writeMatrix("Coef", coef)
writeMatrix("Means", means)
writeMatrix("Stats", stats, column=True)
writeMatrix("Membership", cm, writeFormat="%2i")
writeMatrix("Table", table)
writeMatrix("Prob", prob)
writeMatrix("D2", d2)
writeMatrix("Covariance", cov[0])
print("\nnrmiss = %3d\n" % nrmiss[0])

Output

 
              Prior OUT
          1            2            3
     0.3333       0.3333       0.3333
 
               Counts
          1            2            3
         50           50           50
 
                               Coef
             1            2            3            4            5
1        -86.3         23.5         23.6        -16.4        -17.4
2        -72.9         15.7          7.1          5.2          6.4
3       -104.4         12.4          3.7         12.8         21.1
 
                        Means
             1            2            3            4
1        5.006        3.428        1.462        0.246
2        5.936        2.770        4.260        1.326
3        6.588        2.974        5.552        2.026
 
     Stats
 1        147.0
 2        143.8
 3         20.0
 4          0.0
 5        -13.1
 6        -10.9
 7         -8.9
 8        -10.0
 9         50.0
10         50.0
11         50.0
12        150.0
 
                                  Membership
 1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
 1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
 
21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
 1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
 
41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
 1   1   1   1   1   1   1   1   1   1   2   2   2   2   2   2   2   2   2   2
 
61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
 2   2   2   2   2   2   2   2   2   2   3   2   2   2   2   2   2   2   2   2
 
81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99
 2   2   2   3   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
 
100  101  102  103  104  105  106  107  108  109  110  111  112  113  114  115
  2    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3
 
116  117  118  119  120  121  122  123  124  125  126  127  128  129  130  131
  3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3
 
132  133  134  135  136  137  138  139  140  141  142  143  144  145  146  147
  3    3    2    3    3    3    3    3    3    3    3    3    3    3    3    3
 
148  149  150
  3    3    3
 
                  Table
             1            2            3
1           50            0            0
2            0           48            2
3            0            1           49
 
                   Prob
               1            2            3
  1        1.000        0.000        0.000
  2        1.000        0.000        0.000
  3        1.000        0.000        0.000
  4        1.000        0.000        0.000
  5        1.000        0.000        0.000
  6        1.000        0.000        0.000
  7        1.000        0.000        0.000
  8        1.000        0.000        0.000
  9        1.000        0.000        0.000
 10        1.000        0.000        0.000
 11        1.000        0.000        0.000
 12        1.000        0.000        0.000
 13        1.000        0.000        0.000
 14        1.000        0.000        0.000
 15        1.000        0.000        0.000
 16        1.000        0.000        0.000
 17        1.000        0.000        0.000
 18        1.000        0.000        0.000
 19        1.000        0.000        0.000
 20        1.000        0.000        0.000
 21        1.000        0.000        0.000
 22        1.000        0.000        0.000
 23        1.000        0.000        0.000
 24        1.000        0.000        0.000
 25        1.000        0.000        0.000
 26        1.000        0.000        0.000
 27        1.000        0.000        0.000
 28        1.000        0.000        0.000
 29        1.000        0.000        0.000
 30        1.000        0.000        0.000
 31        1.000        0.000        0.000
 32        1.000        0.000        0.000
 33        1.000        0.000        0.000
 34        1.000        0.000        0.000
 35        1.000        0.000        0.000
 36        1.000        0.000        0.000
 37        1.000        0.000        0.000
 38        1.000        0.000        0.000
 39        1.000        0.000        0.000
 40        1.000        0.000        0.000
 41        1.000        0.000        0.000
 42        1.000        0.000        0.000
 43        1.000        0.000        0.000
 44        1.000        0.000        0.000
 45        1.000        0.000        0.000
 46        1.000        0.000        0.000
 47        1.000        0.000        0.000
 48        1.000        0.000        0.000
 49        1.000        0.000        0.000
 50        1.000        0.000        0.000
 51        0.000        1.000        0.000
 52        0.000        1.000        0.000
 53        0.000        0.998        0.002
 54        0.000        0.997        0.003
 55        0.000        0.997        0.003
 56        0.000        0.989        0.011
 57        0.000        0.995        0.005
 58        0.000        1.000        0.000
 59        0.000        1.000        0.000
 60        0.000        0.994        0.006
 61        0.000        1.000        0.000
 62        0.000        0.999        0.001
 63        0.000        1.000        0.000
 64        0.000        0.988        0.012
 65        0.000        1.000        0.000
 66        0.000        1.000        0.000
 67        0.000        0.973        0.027
 68        0.000        1.000        0.000
 69        0.000        0.813        0.187
 70        0.000        1.000        0.000
 71        0.000        0.336        0.664
 72        0.000        1.000        0.000
 73        0.000        0.699        0.301
 74        0.000        0.972        0.028
 75        0.000        1.000        0.000
 76        0.000        1.000        0.000
 77        0.000        0.998        0.002
 78        0.000        0.861        0.139
 79        0.000        0.992        0.008
 80        0.000        1.000        0.000
 81        0.000        1.000        0.000
 82        0.000        1.000        0.000
 83        0.000        1.000        0.000
 84        0.000        0.154        0.846
 85        0.000        0.943        0.057
 86        0.000        0.996        0.004
 87        0.000        0.999        0.001
 88        0.000        0.999        0.001
 89        0.000        1.000        0.000
 90        0.000        0.999        0.001
 91        0.000        0.981        0.019
 92        0.000        0.997        0.003
 93        0.000        1.000        0.000
 94        0.000        1.000        0.000
 95        0.000        0.999        0.001
 96        0.000        1.000        0.000
 97        0.000        1.000        0.000
 98        0.000        1.000        0.000
 99        0.000        1.000        0.000
100        0.000        1.000        0.000
101        0.000        0.000        1.000
102        0.000        0.000        1.000
103        0.000        0.000        1.000
104        0.000        0.006        0.994
105        0.000        0.000        1.000
106        0.000        0.000        1.000
107        0.000        0.004        0.996
108        0.000        0.000        1.000
109        0.000        0.000        1.000
110        0.000        0.000        1.000
111        0.000        0.006        0.994
112        0.000        0.001        0.999
113        0.000        0.000        1.000
114        0.000        0.000        1.000
115        0.000        0.000        1.000
116        0.000        0.000        1.000
117        0.000        0.033        0.967
118        0.000        0.000        1.000
119        0.000        0.000        1.000
120        0.000        0.041        0.959
121        0.000        0.000        1.000
122        0.000        0.000        1.000
123        0.000        0.000        1.000
124        0.000        0.028        0.972
125        0.000        0.001        0.999
126        0.000        0.007        0.993
127        0.000        0.057        0.943
128        0.000        0.151        0.849
129        0.000        0.000        1.000
130        0.000        0.020        0.980
131        0.000        0.000      
nrmiss =   0

  1.000
132        0.000        0.009        0.991
133        0.000        0.000        1.000
134        0.000        0.605        0.395
135        0.000        0.000        1.000
136        0.000        0.000        1.000
137        0.000        0.000        1.000
138        0.000        0.050        0.950
139        0.000        0.141        0.859
140        0.000        0.000        1.000
141        0.000        0.000        1.000
142        0.000        0.000        1.000
143        0.000        0.000        1.000
144        0.000        0.000        1.000
145        0.000        0.000        1.000
146        0.000        0.000        1.000
147        0.000        0.000        1.000
148        0.000        0.001        0.999
149        0.000        0.000        1.000
150        0.000        0.061        0.939
 
                   D2
             1            2            3
1          0.0        323.1        706.1
2        103.2          0.0         17.9
3        168.8         13.8          0.0
 
                     Covariance
             1            2            3            4
1       0.1242       0.0992       0.0164       0.0103
2       0.0992       0.1437       0.0117       0.0093
3       0.0164       0.0117       0.0302       0.0061
4       0.0103       0.0093       0.0061       0.0111

Warning Errors

IMSLS_BAD_OBS_1 In call #, row # of the data matrix, “x”, has group number = #. The group number must be an integer between 1.0 and “nGroups” = #, inclusively. This observation will be ignored.
IMSLS_BAD_OBS_2 The leaving out one method is specified but this observation does not have a valid group number (Its group number is #.). This observation (row #) is ignored.
IMSLS_BAD_OBS_3 The leaving out one method is specified but this observation does not have a valid weight or it does not have a valid frequency. This observation (row #) is ignored.
IMSLS_COV_SINGULAR_3 The group # covariance matrix is singular. “stats[1]” cannot be computed. “stats[1]” and “stats[3]” are set to the missing value code (NaN).

Fatal Errors

IMSLS_BAD_IDO_1 ido” = #. Initial allocations must be performed by making a call to discriminantAnalysis with “ido” = 1.
IMSLS_BAD_IDO_2 ido” = #. A new analysis may not begin until the previous analysis is terminated with “ido” equal to 5 or 6.
IMSLS_COV_SINGULAR_1 The variance-covariance matrix for population number # is singular. The computations cannot continue.
IMSLS_COV_SINGULAR_2 The pooled variance-covariance matrix is singular. The computations cannot continue.
IMSLS_COV_SINGULAR_4 A variance-covariance matrix is singular. The index of the first zero element is equal to #.