discriminantAnalysis¶
Performs a linear or a quadratic discriminant function analysis among several known groups.
Synopsis¶
discriminantAnalysis (nRows, nVariables, x, nGroups)
Required Arguments¶
- int
nRows
(Input) - Number of rows of
x
to be processed. - int
nVariables
(Input) - Number of variables to be used in the discrimination.
- float
x[[]]
(Input) - Array of size
nRows
bynVariables
+ 1 containing the data. The firstnVariables
columns correspond to the variables, and the last column (columnnVariables
) contains the group numbers. The groups must be numbered 1, 2, …,nGroups
. - int
nGroups
(Input) - Number of groups in the data.
Optional Arguments¶
xIndices
, intigrp
, intind[]
, intifrq
, intiwt
(Input)Each of the four arguments contains indices indicating column numbers of
x
in which particular types of data are stored.Parameter
igrp
contains the index for the column ofx
in which the group numbers are stored.Parameter
ind
contains the indices of the variables to be used in the analysis.Parameters
ifrq
andiwt
contain the column numbers ofx
in which the frequencies and weights, respectively, are stored. Setifrq
= −1 if there will be no column for frequencies. Setiwt
= −1 if there will be no column for weights. Weights are rounded to the nearest integer. Negative weights are not allowed.Defaults:
igrp
=nVariables
,ind[]
= 0, 1, …,nVariables
− 1,ifrq
= −1, andiwt
= −1method
, int (Input)Method of discrimination. The method chosen determines whether linear or quadratic discrimination is used, whether the group covariance matrices are computed (the pooled covariance matrix is always computed), and whether the leaving-out-one or the reclassification method is used to classify each observation.
method
discrimination method covariances computed classification method 1 linear pooled, group reclassification 2 quadratic pooled, group reclassification 3 linear pooled reclassification 4 linear pooled, group leaving-out-one 5 quadratic pooled, group leaving-out-one 6 linear pooled leaving-out-one In the leaving-out-one method of classification, the posterior probabilities are adjusted so as to eliminate the effect of the observation from the sample statistics prior to its classification. In the classification method, the effect of the observation is not eliminated from the classification function.
When optional argument
ido
is specified, the following rules for mixing methods apply; Methods 1, 2, 4, and 5 can be intermixed, as can methods 3 and 6. Methods 1, 2, 4, and 5 cannot be intermixed with methods 3 and 6.Default:
method
= 1ido
, int (Input)Processing option. See Comments 3 and 4 for more information.
ido
Action 0 This is the only invocation; all the data are input at once. (Default) 1 This is the first invocation with this data; additional calls will be made. Initialization and updating for the nRows
observations ofx
will be performed.2 This is an intermediate invocation; updating for the nRows
observations ofx
will be performed.3 All statistics are updated for the nRows
observations. The discriminant functions and other statistics are computed.4 The discriminant functions are used to classify each of the nRows
observations ofx
.5 The covariance matrices are computed, and workspace is released. No further call to discriminantAnalysis
withido
greater than 1 should be made without first callingdiscriminantAnalysis
withido
= 1.6 Workspace is released. No further calls to discriminantAnalysis
withido
greater than 1 should be made without first callingdiscriminantAnalysis
withido
= 1. Invocation with this option is not required if a call has already been made withido
= 5.Default:
ido
= 0
rowsAdd
(Input)
or
rowsDelete
(Input)By default (or if
rowsAdd
is specified), then the observations inx
are added to the discriminant statistics. IfrowsDelete
is specified, then the observations are deleted.If
ido
= 0, these optional arguments are ignored (data is always added if there is only one invocation).
priorEqual
(Input)
or
priorProportional
(Input)
or
priorInput
, float[]
(Input)By default, (or if
priorEqual
is specified), equal prior probabilities are calculated as 1.0/nGroups
.If
priorProportional
is specified, prior probabilities are calculated to be proportional to the sample size in each group.If
priorInput
is specified, then arraypriorInput
is an array of lengthnGroups
containing the prior probabilities for each group, such that the sum of all prior probabilities is equal to 1.0. Prior probabilities are not used ifido
is equal to 1, 2, 5, or 6.priorOutput
(Output)- An array of length
nGroups
containing the most recently calculated or input prior probabilities. IfpriorProportional
is specified, every element ofpriorOutput
is equal to −1 until a call is made withido
equal to 0 or 3, at which point the priors are calculated. Note that subsequent calls todiscriminantAnalysis
withpriorProportional
specified, andido
not equal to 0 or 3 will result in the elements ofpriorOutput
being reset to −1. groupCounts
(Output)- An integer array of length
nGroups
containing the number of observations in each group. ArraygroupCounts
is updated whenido
is equal to 0, 1, or 2. means
(Output)An array of size
nGroups
bynVariables
. The i-th row of means contains the group i variable means. Array means is updated whenido
is equal to 0, 1, 2, or 5. The means are unscaled until a call is made withido
= 5, where the unscaled means are calculated as \(\Sigma w_i f_i x_i\) and the scaled means as\[\frac{\Sigma w_i f_i x_i}{\Sigma w_i f_i}\]where \(x_i\) is the value of the i-th observation, \(w_i\) is the weight of the i-th observation, and \(f_i\) is the frequency of the i-th observation.
cov
(Output)- An array of size g by
nVariables
bynVariables
containing the within-group covariance matrices (methods
1, 2, 4, and 5 only) as the first g-1 matrices, and the pooled covariance matrix as the g-th matrix (that is, the firstnVariables
×nVariables
elements comprise the group 1 covariance matrix, the nextnVariables
×nVariables
elements comprise the group 2 covariance, …, and the lastnVariables
×nVariables
elements comprise the pooled covariance matrix). Ifmethod
is 3 or 6 then g is equal to 1. Otherwise, g is equal tonGroups
+ 1. Argumentcov
is updated whenido
is equal to 0, 1, 2, 3, or 5. coef
(Output)An array of size
nGroups
by (nVariables
+ 1) containing the linear discriminant coefficients. The first column ofcoefficients
contains the constant term, and the remaining columns contain the variable coefficients. Row i − 1 ofcoefficients
corresponds to group i, for i = 1, 2, …nGroups
. Arraycoefficients
is always computed as the linear discriminant function coefficients even when quadratic discrimination is specified. Specifically, given the linear discriminant function\[z_i = \ln\left(p_i\right) - 0.5 \overline{x}_i^T S_p^{-1} \overline{x}_i + x^T S_p^{-1} \overline{x}_i\]the intercept \(\ln\left(p_i\right)-0.5\overline{x}_i^TS_p^{-1}\overline{x}_i\) is assigned to
coefficients[i
×(nVariables+1)]
and thej
‑th element of \(S_p^{-1}\overline{x}_i\) is assigned tocoefficients[i
×(nVariables+1)+j]
, where i = 0, …,nGroups
-1
and j=1, …,nVariables+1
. Arraycoefficients
is updated whenido
is equal to 0 or 3.classMembership
(Output)An integer array of length
nRows
containing the group to which the observation was classified. ArrayclassMembership
is updated whenido
is equal to 0 or 4.If an observation has an invalid group number, frequency, or weight when the leaving-out-one method has been specified, then the observation is not classified and the corresponding elements of
classMembership
(andprob
, seeprob
) are set to zero.classTable
(Output)- An array of size
nGroups
bynGroups
containing the classification table. ArrayclassTable
is updated whenido
is equal to 0, 1, or 4. Each observation that is classified and has a group number 1.0, 2.0, …,nGroups
is entered into the table. The rows of the table correspond to the known group membership. The columns refer to the group to which the observation was classified. Classification results accumulate with each call todiscriminantAnalysis
withido
equal to 4. For example, if two calls withido
equal to 4 are made, the elements inclassTable
sum to the total number of valid observations in the two calls. prob
(Output)- An array of size
nRows
bynGroups
containing the posterior probabilities for each observation. Argumentprob
is updated whenido
is equal to 0 or 4. mahalanobis
(Output)An array of size
nGroups
bynGroups
containing the Mahalanobis distances\[D_{ij}^2\]between the group means. Argument
mahalanobis
is updated whenido
is equal to 0 or 3.For linear discrimination, the Mahalanobis distance is computed using the pooled covariance matrix. Otherwise, the Mahalanobis distance
\[D_{ij}^2\]between group means i and j is computed using the within covariance matrix for group i in place of the pooled covariance matrix.
stats
(Output)- An array of length 4 + 2 × (
nGroups
+ 1) containing various statistics of interest. Arraystats
is updated whenido
is equal to 0, 2, 3, or 5. The first element ofstats
is the sum of the degrees of freedom for the within-covariance matrices. The second, third, and fourth elements ofstats
correspond to the chi-squared statistic, its degrees of freedom, and the probability of a greater chi-squared, respectively, of a test of the homogeneity of the within-covariance matrices (not computed ifmethod
is equal to 3 or 6). The fifth through 5 +nGroups
elements ofstats
contain the log of the determinants of each group’s covariance matrix (not computed ifmethod
is equal to 3 or 6) and of the pooled covariance matrix (element 4 +nGroups
). Finally, the lastnGroups
+ 1 elements ofstats
contain the sum of the weights within each group, and in the last position, the sum of the weights in all groups. nRowsMissing
(Output)Number of rows of data encountered in calls to
discriminantAnalysis
containing missing values (NaN) for the classification, group, weight, and/or frequency variables. If a row of data contains a missing value (NaN) for any of these variables, that row is excluded from the computations.Array
nRowsMissing
is updated whenido
is equal to 0, 1, 2, or 3.
Comments¶
Common choices for the Bayesian prior probabilities are given by:
priorInput
[i
] = 1.0∕nGroups
(equal priors)priorInput
[i
] =groupCounts[i]
∕nRows
(proportional priors)priorInput
[i
] = Past history or subjective judgment.In all cases, the priors should sum to 1.0.
Two passes of the data are made. In the first pass, the statistics required to compute the discriminant functions are obtained (
ido
equal to 1, 2, and 3). In the second pass, the discriminant functions are used to classify the observations. Whenido
is equal to 0, all of the data are memory resident, and both passes are made in one call todiscriminantAnalysis
. Whenido
> 0 (optional argumentido
is specified), a third call todiscriminantAnalysis
involving no data is required withido
equal to 5 or 6.Here are a few rules and guidelines for the correct value of
ido
in a series of calls:Calls with
ido
= 0 orido
= 1 may be made at any time, subject to rule 2. These calls indicate that a new analysis is to begin, and therefore allocate memory and destroy all statistics from previous calls.Each series of calls to
discriminantAnalysis
which begins withido
= 1 must end withido
equal to 5 or 6 to ensure the proper release of workspace, subject to rule 3.ido
may not be 4 or 5 before a call withido
= 3 has been made.ido
may not be 2, 3, 4, 5, or 6- Immediately after a call with
ido
= 0. - Before a call with
ido
= 1 has been made.
iii) Immediately after a call with
ido
equal to 5 or 6 has been made.The following is a valid sequence of
ido
’s:- Immediately after a call with
ido |
Explanation |
---|---|
0 | Data Set A: Perform a complete analysis. All data to be
used in the analysis must be present in x . Since
cleanup of workspace is automatic for ido = 0, no
further calls are necessary. |
1 | Data Set B: Begin analysis. The nRows observations in
x are used for initialization. |
2 | Data Set B: Continue analysis. New observations placed in
x are added to (or deleted from, see rowsDelete )
the analysis. |
2 | Data Set B: Continue analysis. nRows new observations
placed in x are added to (or deleted from, see
rowsDelete ) the analysis. |
3 | Data Set B: Continue analysis. nRows new observations
are added (or deleted) and discriminant functions and other
statistics are computed. |
4 | Data Set B: Classification of each of the nRows
observations in the current x matrix. |
5 | Data Set B: End analysis. Covariance matrices are computed
and workspace is released. This analysis could also have
been ended by choosing ido = 6 |
1 | Data Set C: Begin analysis. Note that for this call to be
valid the previous call must have been made with ido
equal to 5 or 6. |
3 | Data Set C: Continue analysis. |
4 | Data Set C: Continue analysis. |
3 | Data Set C: Continue analysis. |
6 | Data Set C: End analysis. |
- Because of the internal workspace allocation and saved variables,
function
discriminantAnalysis
must complete the analysis of a data set before beginning processing of the next data set.
Description¶
Function discriminantAnalysis
performs discriminant function analysis
using either linear or quadratic discrimination. The output includes a
measure of distance between the groups, a table summarizing the
classification results, a matrix containing the posterior probabilities of
group membership for each observation, and the within-sample means and
covariance matrices. The linear discriminant function coefficients are also
computed.
By default (or if optional argument ido
is specified with ido
= 0)
all observations are input during one call, a method of operation that has
the advantage of simplicity. Alternatively, one or more rows of observations
can be input during separate calls. This method does not require that all
observations be memory resident, a significant advantage with large data
sets. Note, however, that the algorithm requires two passes of the data.
During the first pass the discriminant functions are computed while in the
second pass, the observations are classified. Thus, with the second method
of operation, the data will usually need to be input twice.
Because both methods result in the same operations being performed, the
algorithm is discussed as if only a few observations are input during each
call. The operations performed during each call depend upon the ido
parameter.
The ido
= 1 step is the initialization step. “Private” internally
allocated saved variables corresponding to means
, classTable
, and
covariances
are initialized to zero, and other program parameters are
set (copies of these private variables are written to the corresponding
output variables upon return from the function call, assuming ido
values
such that the results are to be returned). Parameters nRows
, x
, and
method
can be changed from one call to the next within the two sets
{1, 2, 4, 5} and {3, 6} but not between these sets when ido
> 1. That
is, do not specify method
= 1 in one call and method
= 3 in another
call without first making a call with ido
= 1.
After initialization has been performed in the ido
= 1 step, the
within-group means are updated for all valid observations in x
.
Observations with invalid group numbers are ignored, as are observation with
missing values. The LU factorization of the covariance matrices are
updated by adding (or deleting) observations via Givens rotations.
The ido
= 2 step is used solely for adding or deleting observations from
the model as in the above paragraph.
The ido
= 3 step begins by adding all observations in x
to the means
and the factorizations of the covariance matrices. It continues by computing
some statistics of interest: the linear discriminant functions, the prior
probabilities (by default, or if proportionalPriors
is specified), the
log of the determinant of each of the covariance matrices, a test statistic
for testing that all of the within-group covariance matrices are equal, and
a matrix of Mahalanobis distances between the groups. The matrix of
Mahalanobis distances is computed via the pooled covariance matrix when
linear discrimination is specified; the row covariance matrix is used when
the discrimination is quadratic.
Covariance matrices are defined as follows: Let \(N_i\) denote the sum of the frequencies of the observations in group i and \(M_i\) denote the number of observations in group i. Then, if \(S_i\) denotes the within-group i covariance matrix,
Where \(w_j\) is the weight of the j-th observation in group i, \(f_j\) is the frequency, \(x_j\) is the j-th observation column vector (in group i), and \(\bar{x}\) denotes the mean vector of the observations in group i. The mean vectors are computed as
Given the means and the covariance matrices, the linear discriminant function for group i is computed as:
where \(\ln(p_i)\) is the natural log of the prior probability for the i-th group, x is the observation to be classified, and \(S_p\) denoted the pooled covariance matrix.
Let S denote either the pooled covariance matrix of one of the within-group covariance matrices \(S_i\). (S will be the pooled covariance matrix in linear discrimination, and \(S_i\) otherwise.) The Mahalanobis distance between group i and group j is computed as:
Finally, the asymptotic chi-squared test for the equality of covariance matrices is computed as follows (Morrison 1976, p. 252):
where \(n_i\) is the number of degrees of freedom in the i-th sample covariance matrix, k is the number of groups, and
where p is the number of variables.
When ido
= 4, the estimated posterior probability of each observation
x belonging to group i is computed using the prior probabilities and the
sample mean vectors and estimated covariance matrices under a multivariate
normal assumption. Under quadratic discrimination, the within-group
covariance matrices are used to compute the estimated posterior
probabilities. The estimated posterior probability of an observation x
belonging to group i is
where
For the leaving-out-one method of classification (method
equal to 4, 5
or 6), the sample mean vector and sample covariance matrices in the formula
for
are adjusted so as to remove the observation x from their computation. For
linear discrimination (method
equal to 1, 3, 4, or 6), the linear
discriminant function coefficients are actually used to compute the same
posterior probabilities.
Using the posterior probabilities, each observation in x is classified
into a group; the result is tabulated in the matrix classTable
and saved
in the vector classMembership
. Matrix classTable
is not altered at
this stage if x
[i
][igrp
] (see optional argument xIndices
)
contains a group number that is out of range. If the reclassification method
is specified, then all observations with no missing values in the
nVariables
classification variables are classified. When the
leaving-out-one method is used, observations with invalid group numbers,
weights, frequencies, or classification variables are not classified.
Regardless of the frequency, a 1 is added (or subtracted) from
classTable
for each row of x
that is classified and contains a valid
group number.
When method
> 3, adjustment is made to the posterior probabilities to
remove the effect of the observation in the classification rule. In this
adjustment, each observation is presumed to have a weight of
x
[i
][iwt
] if iwt
> −1 (and a weight of 1.0 if iwt
=
−1), and a frequency of 1.0. See Lachenbruch (1975, p. 36) for the required
adjustment.
Finally, when ido
= 5, the covariance matrices are computed from their
LU factorizations. Internally allocated and saved variables are cleaned up
at this step (ido
equal to 5 or 6).
Examples¶
Example 1¶
The following example uses liner discrimination with equal prior
probabilities on Fisher’s (1936) Iris data. This example illustrates the
execution of discriminantAnalysis
when one call is made (i.e. using the
default of ido
= 0).
from __future__ import print_function
from numpy import *
from pyimsl.stat.discriminantAnalysis import discriminantAnalysis
from pyimsl.stat.writeMatrix import writeMatrix
n_groups = 3
nvar = 4
prior_out = []
means = []
cov = []
coef = []
nrmiss = []
table = []
d2 = []
stats = []
prob = []
counts = []
cm = []
x = array([
[5.1, 3.5, 1.4, 0.2, 1],
[4.9, 3., 1.4, 0.2, 1],
[4.7, 3.2, 1.3, 0.2, 1],
[4.6, 3.1, 1.5, 0.2, 1],
[5., 3.6, 1.4, 0.2, 1],
[5.4, 3.9, 1.7, 0.4, 1],
[4.6, 3.4, 1.4, 0.3, 1],
[5., 3.4, 1.5, 0.2, 1],
[4.4, 2.9, 1.4, 0.2, 1],
[4.9, 3.1, 1.5, 0.1, 1],
[5.4, 3.7, 1.5, 0.2, 1],
[4.8, 3.4, 1.6, 0.2, 1],
[4.8, 3., 1.4, 0.1, 1],
[4.3, 3., 1.1, 0.1, 1],
[5.8, 4., 1.2, 0.2, 1],
[5.7, 4.4, 1.5, 0.4, 1],
[5.4, 3.9, 1.3, 0.4, 1],
[5.1, 3.5, 1.4, 0.3, 1],
[5.7, 3.8, 1.7, 0.3, 1],
[5.1, 3.8, 1.5, 0.3, 1],
[5.4, 3.4, 1.7, 0.2, 1],
[5.1, 3.7, 1.5, 0.4, 1],
[4.6, 3.6, 1., 0.2, 1],
[5.1, 3.3, 1.7, 0.5, 1],
[4.8, 3.4, 1.9, 0.2, 1],
[5., 3., 1.6, 0.2, 1],
[5., 3.4, 1.6, 0.4, 1],
[5.2, 3.5, 1.5, 0.2, 1],
[5.2, 3.4, 1.4, 0.2, 1],
[4.7, 3.2, 1.6, 0.2, 1],
[4.8, 3.1, 1.6, 0.2, 1],
[5.4, 3.4, 1.5, 0.4, 1],
[5.2, 4.1, 1.5, 0.1, 1],
[5.5, 4.2, 1.4, 0.2, 1],
[4.9, 3.1, 1.5, 0.2, 1],
[5., 3.2, 1.2, 0.2, 1],
[5.5, 3.5, 1.3, 0.2, 1],
[4.9, 3.6, 1.4, 0.1, 1],
[4.4, 3., 1.3, 0.2, 1],
[5.1, 3.4, 1.5, 0.2, 1],
[5., 3.5, 1.3, 0.3, 1],
[4.5, 2.3, 1.3, 0.3, 1],
[4.4, 3.2, 1.3, 0.2, 1],
[5., 3.5, 1.6, 0.6, 1],
[5.1, 3.8, 1.9, 0.4, 1],
[4.8, 3., 1.4, 0.3, 1],
[5.1, 3.8, 1.6, 0.2, 1],
[4.6, 3.2, 1.4, 0.2, 1],
[5.3, 3.7, 1.5, 0.2, 1],
[5., 3.3, 1.4, 0.2, 1],
[7., 3.2, 4.7, 1.4, 2],
[6.4, 3.2, 4.5, 1.5, 2],
[6.9, 3.1, 4.9, 1.5, 2],
[5.5, 2.3, 4., 1.3, 2],
[6.5, 2.8, 4.6, 1.5, 2],
[5.7, 2.8, 4.5, 1.3, 2],
[6.3, 3.3, 4.7, 1.6, 2],
[4.9, 2.4, 3.3, 1., 2],
[6.6, 2.9, 4.6, 1.3, 2],
[5.2, 2.7, 3.9, 1.4, 2],
[5., 2., 3.5, 1., 2],
[5.9, 3., 4.2, 1.5, 2],
[6., 2.2, 4., 1., 2],
[6.1, 2.9, 4.7, 1.4, 2],
[5.6, 2.9, 3.6, 1.3, 2],
[6.7, 3.1, 4.4, 1.4, 2],
[5.6, 3., 4.5, 1.5, 2],
[5.8, 2.7, 4.1, 1., 2],
[6.2, 2.2, 4.5, 1.5, 2],
[5.6, 2.5, 3.9, 1.1, 2],
[5.9, 3.2, 4.8, 1.8, 2],
[6.1, 2.8, 4., 1.3, 2],
[6.3, 2.5, 4.9, 1.5, 2],
[6.1, 2.8, 4.7, 1.2, 2],
[6.4, 2.9, 4.3, 1.3, 2],
[6.6, 3., 4.4, 1.4, 2],
[6.8, 2.8, 4.8, 1.4, 2],
[6.7, 3., 5., 1.7, 2],
[6., 2.9, 4.5, 1.5, 2],
[5.7, 2.6, 3.5, 1., 2],
[5.5, 2.4, 3.8, 1.1, 2],
[5.5, 2.4, 3.7, 1., 2],
[5.8, 2.7, 3.9, 1.2, 2],
[6., 2.7, 5.1, 1.6, 2],
[5.4, 3., 4.5, 1.5, 2],
[6., 3.4, 4.5, 1.6, 2],
[6.7, 3.1, 4.7, 1.5, 2],
[6.3, 2.3, 4.4, 1.3, 2],
[5.6, 3., 4.1, 1.3, 2],
[5.5, 2.5, 4., 1.3, 2],
[5.5, 2.6, 4.4, 1.2, 2],
[6.1, 3., 4.6, 1.4, 2],
[5.8, 2.6, 4., 1.2, 2],
[5., 2.3, 3.3, 1., 2],
[5.6, 2.7, 4.2, 1.3, 2],
[5.7, 3., 4.2, 1.2, 2],
[5.7, 2.9, 4.2, 1.3, 2],
[6.2, 2.9, 4.3, 1.3, 2],
[5.1, 2.5, 3., 1.1, 2],
[5.7, 2.8, 4.1, 1.3, 2],
[6.3, 3.3, 6., 2.5, 3],
[5.8, 2.7, 5.1, 1.9, 3],
[7.1, 3., 5.9, 2.1, 3],
[6.3, 2.9, 5.6, 1.8, 3],
[6.5, 3., 5.8, 2.2, 3],
[7.6, 3., 6.6, 2.1, 3],
[4.9, 2.5, 4.5, 1.7, 3],
[7.3, 2.9, 6.3, 1.8, 3],
[6.7, 2.5, 5.8, 1.8, 3],
[7.2, 3.6, 6.1, 2.5, 3],
[6.5, 3.2, 5.1, 2., 3],
[6.4, 2.7, 5.3, 1.9, 3],
[6.8, 3., 5.5, 2.1, 3],
[5.7, 2.5, 5., 2., 3],
[5.8, 2.8, 5.1, 2.4, 3],
[6.4, 3.2, 5.3, 2.3, 3],
[6.5, 3., 5.5, 1.8, 3],
[7.7, 3.8, 6.7, 2.2, 3],
[7.7, 2.6, 6.9, 2.3, 3],
[6., 2.2, 5., 1.5, 3],
[6.9, 3.2, 5.7, 2.3, 3],
[5.6, 2.8, 4.9, 2., 3],
[7.7, 2.8, 6.7, 2., 3],
[6.3, 2.7, 4.9, 1.8, 3],
[6.7, 3.3, 5.7, 2.1, 3],
[7.2, 3.2, 6., 1.8, 3],
[6.2, 2.8, 4.8, 1.8, 3],
[6.1, 3., 4.9, 1.8, 3],
[6.4, 2.8, 5.6, 2.1, 3],
[7.2, 3., 5.8, 1.6, 3],
[7.4, 2.8, 6.1, 1.9, 3],
[7.9, 3.8, 6.4, 2., 3],
[6.4, 2.8, 5.6, 2.2, 3],
[6.3, 2.8, 5.1, 1.5, 3],
[6.1, 2.6, 5.6, 1.4, 3],
[7.7, 3., 6.1, 2.3, 3],
[6.3, 3.4, 5.6, 2.4, 3],
[6.4, 3.1, 5.5, 1.8, 3],
[6., 3., 4.8, 1.8, 3],
[6.9, 3.1, 5.4, 2.1, 3],
[6.7, 3.1, 5.6, 2.4, 3],
[6.9, 3.1, 5.1, 2.3, 3],
[5.8, 2.7, 5.1, 1.9, 3],
[6.8, 3.2, 5.9, 2.3, 3],
[6.7, 3.3, 5.7, 2.5, 3],
[6.7, 3., 5.2, 2.3, 3],
[6.3, 2.5, 5., 1.9, 3],
[6.5, 3., 5.2, 2., 3],
[6.2, 3.4, 5.4, 2.3, 3],
[5.9, 3., 5.1, 1.8, 3]])
nrow = 150
discriminantAnalysis(nrow, nvar, x, n_groups,
method=3,
groupCounts=counts,
coef=coef,
means=means,
stats=stats,
classMembership=cm,
classTable=table,
prob=prob,
mahalanobis=d2,
cov=cov,
priorOutput=prior_out,
nRowsMissing=nrmiss,
priorEqual=True)
writeMatrix('counts', counts)
writeMatrix('coef', coef)
writeMatrix('Means', means)
writeMatrix('Stats', stats, column=True)
writeMatrix('Membership', cm, writeFormat="%2i")
writeMatrix('Table', table)
writeMatrix('Prob', prob)
writeMatrix('D2', d2)
writeMatrix('Covariance', cov)
writeMatrix('Prior OUT', prior_out)
print("\nnrmiss = %3i" % nrmiss[0])
Output¶
counts
1 2 3
50 50 50
coef
1 2 3 4 5
1 -86.3 23.5 23.6 -16.4 -17.4
2 -72.9 15.7 7.1 5.2 6.4
3 -104.4 12.4 3.7 12.8 21.1
Means
1 2 3 4
1 5.006 3.428 1.462 0.246
2 5.936 2.770 4.260 1.326
3 6.588 2.974 5.552 2.026
Stats
1 147
2 ...........
3 ...........
4 ...........
5 ...........
6 ...........
7 ...........
8 -10
9 50
10 50
11 50
12 150
Membership
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115
2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147
3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3
148 149 150
3 3 3
Table
1 2 3
1 50 0 0
2 0 48 2
3 0 1 49
Prob
1 2 3
1 1.000 0.000 0.000
2 1.000 0.000 0.000
3 1.000 0.000 0.000
4 1.000 0.000 0.000
5 1.000 0.000 0.000
6 1.000 0.000 0.000
7 1.000 0.000 0.000
8 1.000 0.000 0.000
9 1.000 0.000 0.000
10 1.000 0.000 0.000
11 1.000 0.000 0.000
12 1.000 0.000 0.000
13 1.000 0.000 0.000
14 1.000 0.000 0.000
15 1.000 0.000 0.000
16 1.000 0.000 0.000
17 1.000 0.000 0.000
18 1.000 0.000 0.000
19 1.000 0.000 0.000
20 1.000 0.000 0.000
21 1.000 0.000 0.000
22 1.000 0.000 0.000
23 1.000 0.000 0.000
24 1.000 0.000 0.000
25 1.000 0.000 0.000
26 1.000 0.000 0.000
27 1.000 0.000 0.000
28 1.000 0.000 0.000
29 1.000 0.000 0.000
30 1.000 0.000 0.000
31 1.000 0.000 0.000
32 1.000 0.000 0.000
33 1.000 0.000 0.000
34 1.000 0.000 0.000
35 1.000 0.000 0.000
36 1.000 0.000 0.000
37 1.000 0.000 0.000
38 1.000 0.000 0.000
39 1.000 0.000 0.000
40 1.000 0.000 0.000
41 1.000 0.000 0.000
42 1.000 0.000 0.000
43 1.000 0.000 0.000
44 1.000 0.000 0.000
45 1.000 0.000 0.000
46 1.000 0.000 0.000
47 1.000 0.000 0.000
48 1.000 0.000 0.000
49 1.000 0.000 0.000
50 1.000 0.000 0.000
51 0.000 1.000 0.000
52 0.000 0.999 0.001
53 0.000 0.996 0.004
54 0.000 1.000 0.000
55 0.000 0.996 0.004
56 0.000 0.999 0.001
57 0.000 0.986 0.014
58 0.000 1.000 0.000
59 0.000 1.000 0.000
60 0.000 1.000 0.000
61 0.000 1.000 0.000
62 0.000 0.999 0.001
63 0.000 1.000 0.000
64 0.000 0.994 0.006
65 0.000 1.000 0.000
66 0.000 1.000 0.000
67 0.000 0.981 0.019
68 0.000 1.000 0.000
69 0.000 0.960 0.040
70 0.000 1.000 0.000
71 0.000 0.253 0.747
72 0.000 1.000 0.000
73 0.000 0.816 0.184
74 0.000 1.000 0.000
75 0.000 1.000 0.000
76 0.000 1.000 0.000
77 0.000 0.998 0.002
78 0.000 0.689 0.311
79 0.000 0.993 0.007
80 0.000 1.000 0.000
81 0.000 1.000 0.000
82 0.000 1.000 0.000
83 0.000 1.000 0.000
84 0.000 0.143 0.857
85 0.000 0.964 0.036
86 0.000 0.994 0.006
87 0.000 0.998 0.002
88 0.000 0.999 0.001
89 0.000 1.000 0.000
90 0.000 1.000 0.000
91 0.000 0.999 0.001
92 0.000 0.998 0.002
93 0.000 1.000 0.000
94 0.000 1.000 0.000
95 0.000 1.000 0.000
96 0.000 1.000 0.000
97 0.000 1.000 0.000
98 0.000 1.000 0.000
99 0.000 1.000 0.000
100 0.000 1.000 0.000
101 0.000 0.000 1.000
102 0.000 0.001 0.999
103 0.000 0.000 1.000
104 0.000 0.001 0.999
105 0.000 0.000 1.000
106 0.000 0.000 1.000
107 0.000 0.049 0.951
108 0.000 0.000 1.000
109 0.000 0.000 1.000
110 0.000 0.000 1.000
111 0.000 0.013 0.987
112 0.000 0.002 0.998
113 0.000 0.000 1.000
114 0.000 0.000 1.000
115 0.000 0.000 1.000
116 0.000 0.000 1.000
117 0.000 0.006 0.994
118 0.000 0.000 1.000
119 0.000 0.000 1.000
120 0.000 0.221 0.779
121 0.000 0.000 1.000
122 0.000 0.001 0.999
123 0.000 0.000 1.000
124 0.000 0.097 0.903
125 0.000 0.000 1.000
126 0.000 0.003 0.997
127 0.000 0.188 0.812
128 0.000 0.134 0.866
129 0.000 0.000 1.000
130 0.000 0.104 0.896
131 0.000 0.000 1.000
132 0.000 0.001 0.999
133 0.000 0.000 1.000
134
nrmiss = 0
0.000 0.729 0.271
135 0.000 0.066 0.934
136 0.000 0.000 1.000
137 0.000 0.000 1.000
138 0.000 0.006 0.994
139 0.000 0.193 0.807
140 0.000 0.001 0.999
141 0.000 0.000 1.000
142 0.000 0.000 1.000
143 0.000 0.001 0.999
144 0.000 0.000 1.000
145 0.000 0.000 1.000
146 0.000 0.000 1.000
147 0.000 0.006 0.994
148 0.000 0.003 0.997
149 0.000 0.000 1.000
150 0.000 0.018 0.982
D2
1 2 3
1 0.0 89.9 179.4
2 89.9 0.0 17.2
3 179.4 17.2 0.0
Covariance
1 2 3 4
1 0.2650 0.0927 0.1675 0.0384
2 0.0927 0.1154 0.0552 0.0327
3 0.1675 0.0552 0.1852 0.0427
4 0.0384 0.0327 0.0427 0.0419
Prior OUT
1 2 3
0.3333 0.3333 0.3333
Example 2¶
Continuing with Fisher’s Iris data, the example below computes the quadratic
discriminant functions using values of ido
greater than 0. In the first
loop, all observations are added to the functions, one at a time. In the
second loop, each of the observations is classified, one by one, using the
leaving-out-one method.
from __future__ import print_function
from numpy import *
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.discriminantAnalysis import discriminantAnalysis
from pyimsl.stat.permuteMatrix import permuteMatrix, PERMUTE_COLUMNS
from pyimsl.stat.writeMatrix import writeMatrix
perm = [1, 2, 3, 4, 0]
nrow = []
ncol = []
n_groups = 3
# Retrieve the Fisher Iris Data Set
xtemp = dataSets(3, nObservations=nrow, nVariables=ncol)
nrow = nrow[0]
ncol = ncol[0]
nvar = ncol - 1
# Move the group column to end of the the matrix
x = permuteMatrix(xtemp, perm, PERMUTE_COLUMNS)
# Initialize Analysis
discriminantAnalysis(0, nvar, x, n_groups, ido=1, method=2)
# Add In Each Observation
for i in range(0, nrow):
discriminantAnalysis(1, nvar, x[i:i + 1, :], n_groups, ido=2)
# Remove observation 0 from the analysis
discriminantAnalysis(1, nvar, x, n_groups, rowsDelete=True, ido=2)
# Add observation 0 back into the analysis
discriminantAnalysis(1, nvar, x, n_groups, ido=2)
# Compute statistics
prior_out = []
discriminantAnalysis(0, nvar, x, n_groups, priorProportional=True,
priorOutput=prior_out, ido=3)
writeMatrix("Prior OUT", prior_out)
# Classify One observation at a time, using proportional priors
cm = []
prob = []
for i in range(0, nrow):
cm_temp = []
prob_temp = []
discriminantAnalysis(1, nvar, x[i:i + 1, :], n_groups, ido=4,
classMembership=cm_temp, prob=prob_temp)
prob.append(prob_temp[0])
cm.append(cm_temp[0])
# Compute covariance matrices and release internal workspace
cov = []
group_counts = []
coef = []
means = []
stats = []
table = []
d2 = []
nrmiss = []
discriminantAnalysis(0, nvar, x, n_groups, ido=5, cov=cov,
groupCounts=group_counts, coef=coef,
means=means, stats=stats, classTable=table,
mahalanobis=d2, nRowsMissing=nrmiss)
writeMatrix("Counts", group_counts)
writeMatrix("Coef", coef)
writeMatrix("Means", means)
writeMatrix("Stats", stats, column=True)
writeMatrix("Membership", cm, writeFormat="%2i")
writeMatrix("Table", table)
writeMatrix("Prob", prob)
writeMatrix("D2", d2)
writeMatrix("Covariance", cov[0])
print("\nnrmiss = %3d\n" % nrmiss[0])
Output¶
Prior OUT
1 2 3
0.3333 0.3333 0.3333
Counts
1 2 3
50 50 50
Coef
1 2 3 4 5
1 -86.3 23.5 23.6 -16.4 -17.4
2 -72.9 15.7 7.1 5.2 6.4
3 -104.4 12.4 3.7 12.8 21.1
Means
1 2 3 4
1 5.006 3.428 1.462 0.246
2 5.936 2.770 4.260 1.326
3 6.588 2.974 5.552 2.026
Stats
1 147.0
2 143.8
3 20.0
4 0.0
5 -13.1
6 -10.9
7 -8.9
8 -10.0
9 50.0
10 50.0
11 50.0
12 150.0
Membership
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115
2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147
3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3
148 149 150
3 3 3
Table
1 2 3
1 50 0 0
2 0 48 2
3 0 1 49
Prob
1 2 3
1 1.000 0.000 0.000
2 1.000 0.000 0.000
3 1.000 0.000 0.000
4 1.000 0.000 0.000
5 1.000 0.000 0.000
6 1.000 0.000 0.000
7 1.000 0.000 0.000
8 1.000 0.000 0.000
9 1.000 0.000 0.000
10 1.000 0.000 0.000
11 1.000 0.000 0.000
12 1.000 0.000 0.000
13 1.000 0.000 0.000
14 1.000 0.000 0.000
15 1.000 0.000 0.000
16 1.000 0.000 0.000
17 1.000 0.000 0.000
18 1.000 0.000 0.000
19 1.000 0.000 0.000
20 1.000 0.000 0.000
21 1.000 0.000 0.000
22 1.000 0.000 0.000
23 1.000 0.000 0.000
24 1.000 0.000 0.000
25 1.000 0.000 0.000
26 1.000 0.000 0.000
27 1.000 0.000 0.000
28 1.000 0.000 0.000
29 1.000 0.000 0.000
30 1.000 0.000 0.000
31 1.000 0.000 0.000
32 1.000 0.000 0.000
33 1.000 0.000 0.000
34 1.000 0.000 0.000
35 1.000 0.000 0.000
36 1.000 0.000 0.000
37 1.000 0.000 0.000
38 1.000 0.000 0.000
39 1.000 0.000 0.000
40 1.000 0.000 0.000
41 1.000 0.000 0.000
42 1.000 0.000 0.000
43 1.000 0.000 0.000
44 1.000 0.000 0.000
45 1.000 0.000 0.000
46 1.000 0.000 0.000
47 1.000 0.000 0.000
48 1.000 0.000 0.000
49 1.000 0.000 0.000
50 1.000 0.000 0.000
51 0.000 1.000 0.000
52 0.000 1.000 0.000
53 0.000 0.998 0.002
54 0.000 0.997 0.003
55 0.000 0.997 0.003
56 0.000 0.989 0.011
57 0.000 0.995 0.005
58 0.000 1.000 0.000
59 0.000 1.000 0.000
60 0.000 0.994 0.006
61 0.000 1.000 0.000
62 0.000 0.999 0.001
63 0.000 1.000 0.000
64 0.000 0.988 0.012
65 0.000 1.000 0.000
66 0.000 1.000 0.000
67 0.000 0.973 0.027
68 0.000 1.000 0.000
69 0.000 0.813 0.187
70 0.000 1.000 0.000
71 0.000 0.336 0.664
72 0.000 1.000 0.000
73 0.000 0.699 0.301
74 0.000 0.972 0.028
75 0.000 1.000 0.000
76 0.000 1.000 0.000
77 0.000 0.998 0.002
78 0.000 0.861 0.139
79 0.000 0.992 0.008
80 0.000 1.000 0.000
81 0.000 1.000 0.000
82 0.000 1.000 0.000
83 0.000 1.000 0.000
84 0.000 0.154 0.846
85 0.000 0.943 0.057
86 0.000 0.996 0.004
87 0.000 0.999 0.001
88 0.000 0.999 0.001
89 0.000 1.000 0.000
90 0.000 0.999 0.001
91 0.000 0.981 0.019
92 0.000 0.997 0.003
93 0.000 1.000 0.000
94 0.000 1.000 0.000
95 0.000 0.999 0.001
96 0.000 1.000 0.000
97 0.000 1.000 0.000
98 0.000 1.000 0.000
99 0.000 1.000 0.000
100 0.000 1.000 0.000
101 0.000 0.000 1.000
102 0.000 0.000 1.000
103 0.000 0.000 1.000
104 0.000 0.006 0.994
105 0.000 0.000 1.000
106 0.000 0.000 1.000
107 0.000 0.004 0.996
108 0.000 0.000 1.000
109 0.000 0.000 1.000
110 0.000 0.000 1.000
111 0.000 0.006 0.994
112 0.000 0.001 0.999
113 0.000 0.000 1.000
114 0.000 0.000 1.000
115 0.000 0.000 1.000
116 0.000 0.000 1.000
117 0.000 0.033 0.967
118 0.000 0.000 1.000
119 0.000 0.000 1.000
120 0.000 0.041 0.959
121 0.000 0.000 1.000
122 0.000 0.000 1.000
123 0.000 0.000 1.000
124 0.000 0.028 0.972
125 0.000 0.001 0.999
126 0.000 0.007 0.993
127 0.000 0.057 0.943
128 0.000 0.151 0.849
129 0.000 0.000 1.000
130 0.000 0.020 0.980
131 0.000 0.000
nrmiss = 0
1.000
132 0.000 0.009 0.991
133 0.000 0.000 1.000
134 0.000 0.605 0.395
135 0.000 0.000 1.000
136 0.000 0.000 1.000
137 0.000 0.000 1.000
138 0.000 0.050 0.950
139 0.000 0.141 0.859
140 0.000 0.000 1.000
141 0.000 0.000 1.000
142 0.000 0.000 1.000
143 0.000 0.000 1.000
144 0.000 0.000 1.000
145 0.000 0.000 1.000
146 0.000 0.000 1.000
147 0.000 0.000 1.000
148 0.000 0.001 0.999
149 0.000 0.000 1.000
150 0.000 0.061 0.939
D2
1 2 3
1 0.0 323.1 706.1
2 103.2 0.0 17.9
3 168.8 13.8 0.0
Covariance
1 2 3 4
1 0.1242 0.0992 0.0164 0.0103
2 0.0992 0.1437 0.0117 0.0093
3 0.0164 0.0117 0.0302 0.0061
4 0.0103 0.0093 0.0061 0.0111
Warning Errors¶
IMSLS_BAD_OBS_1 |
In call #, row # of the data matrix,
“x ”, has group number = #. The
group number must be an integer between
1.0 and “nGroups ” = #,
inclusively. This observation will be
ignored. |
IMSLS_BAD_OBS_2 |
The leaving out one method is specified but this observation does not have a valid group number (Its group number is #.). This observation (row #) is ignored. |
IMSLS_BAD_OBS_3 |
The leaving out one method is specified but this observation does not have a valid weight or it does not have a valid frequency. This observation (row #) is ignored. |
IMSLS_COV_SINGULAR_3 |
The group # covariance matrix is singular.
“stats[1] ” cannot be computed.
“stats[1] ” and “stats[3] ”
are set to the missing value code (NaN). |
Fatal Errors¶
IMSLS_BAD_IDO_1 |
“ido ” = #. Initial allocations
must be performed by making a call to
discriminantAnalysis with “ido ” =
1. |
IMSLS_BAD_IDO_2 |
“ido ” = #. A new analysis may not
begin until the previous analysis is
terminated with “ido ” equal to 5
or 6. |
IMSLS_COV_SINGULAR_1 |
The variance-covariance matrix for population number # is singular. The computations cannot continue. |
IMSLS_COV_SINGULAR_2 |
The pooled variance-covariance matrix is singular. The computations cannot continue. |
IMSLS_COV_SINGULAR_4 |
A variance-covariance matrix is singular. The index of the first zero element is equal to #. |