public class DiscriminantAnalysis extends Object
DiscriminantAnalysis allows linear or quadratic discrimination and the use of several classification rules, such as reclassification, split sample, or leave-out-one methods. One or more observations can be added to the rule during each invocation of the update method.

DiscriminantAnalysis produces a measure of distance between the groups (see the getMahalanobis method), a table summarizing the classification results (see getClassTable), a matrix containing the posterior probabilities of group membership for each classified observation (see getProbability), the within-sample means (see getMeans), and the covariance matrices computed from their LU factorizations (see getCovariance). The linear discriminant function coefficients are also computed (see the getCoefficients method).
All observations can be input during one call to the update method; this has the advantage of simplicity. Alternatively, one or more rows of observations can be input during separate calls to update. This does not require that all observations be memory resident, a significant advantage with large data sets. Note, however, that classifying the same data set requires a second pass of the data through the classify method. During the first pass, the update method computes the discriminant functions; during the second pass, the classify method classifies the observations. When known groups are available, getClassTable is useful for assessing how well the algorithm classifies. Multiple calls to the classify method are also allowed. The class table returned by getClassTable accumulates all observations classified. The class membership and probabilities, returned by getClassMembership and getProbability, contain the results for each observation from the most recent invocation of the classify method.
Pooled-only and pooled-with-group covariance computations cannot be mixed. By default, both pooled and group covariance matrices are computed. An IllegalStateException is thrown if an attempt is made to change the covariance computation after the first call to the update method. See the setCovarianceComputation method for more details on specifying the covariance computation.
The within-group means are updated for all valid observations in x. Observations with invalid group numbers are ignored, as are observations with missing values (Double.NaN). The LU factorizations of the covariance matrices are updated by adding (or deleting) observations via Givens rotations. See the downdate method to delete observations.
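The idea of absorbing one observation into a triangular factor with Givens rotations can be sketched in plain Java. This is only an illustration of the general technique, not the library's implementation: the class `GivensUpdate`, its method names, and the choice of an upper-triangular factor `r` with `r^T r` equal to the accumulated cross-product matrix are all assumptions made for the example.

```java
/** Illustrative rank-one update of an upper-triangular factor via Givens rotations. */
final class GivensUpdate {

    /**
     * Absorbs the row v into the upper-triangular factor r in place, so that
     * afterwards r^T r equals the previous r^T r plus v v^T.
     * The array v is overwritten (it ends up all zeros).
     */
    static void addRow(double[][] r, double[] v) {
        int p = v.length;
        for (int k = 0; k < p; k++) {
            double h = Math.hypot(r[k][k], v[k]);
            if (h == 0.0) {
                continue; // nothing to rotate in this column
            }
            double c = r[k][k] / h;
            double s = v[k] / h;
            // Rotate row k of r against v so that v[k] becomes zero.
            for (int j = k; j < p; j++) {
                double rkj = r[k][j];
                r[k][j] = c * rkj + s * v[j];
                v[j] = -s * rkj + c * v[j];
            }
        }
    }

    /** Returns r^T r; used here only to check the result of an update. */
    static double[][] gram(double[][] r) {
        int p = r.length;
        double[][] g = new double[p][p];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < p; k++) {
                    g[i][j] += r[k][i] * r[k][j];
                }
            }
        }
        return g;
    }
}
```

Each rotation touches only one row of the factor, which is why observations can be added one at a time without keeping the whole data set in memory. Removing an observation needs a different (hyperbolic-style) rotation and is not shown.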
During the algorithm's training process, that is, during each invocation of the update method, each observation in x is added to the means and the factorizations of the covariance matrices. Statistics of interest are computed: the linear discriminant functions, the prior probabilities, the log of the determinant of each of the covariance matrices, and a test statistic for testing that all of the within-group covariance matrices are equal. The matrix of Mahalanobis distances, which consists of the distances between the groups, is computed via the pooled covariance matrix when linear discrimination is specified. When the discrimination is quadratic, the within-group covariance matrix of the row's group is used instead.
Covariance matrices are defined as follows. Let \(N_i\)
denote the sum of the frequencies of the observations in group i, and
let \(M_i\) denote the number of observations in group
i. Then, if \(S_i\) denotes the within-group
i covariance matrix,
$$S_i = \frac{1}{N_i - 1} \sum_{j=1}^{M_i} w_j f_j (x_j - \overline{x})(x_j - \overline{x})^T$$
where \(w_j\) is the weight of the j-th observation
in group i, \(f_j\) is its frequency,
\(x_j\) is the j-th observation column vector (in
group i), and \(\overline{x}\) denotes the mean
vector of the observations in group i. The mean vectors are computed as
$$\overline{x} = \frac{1}{W_i} \sum_{j=1}^{M_i} w_j f_j x_j$$
where
$$W_i = \sum_{j=1}^{M_i} w_j f_j$$
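The definitions above translate directly into code for a single group. The following is a minimal sketch, assuming the rows of `x` all belong to one group and that `w` and `f` hold the weights and integer frequencies; the class name `GroupMoments` and its methods are hypothetical and are not part of the DiscriminantAnalysis API.

```java
/** Illustrative weighted mean and covariance for the observations of one group. */
final class GroupMoments {

    /** Mean vector: (1 / W_i) * sum_j w_j f_j x_j, with W_i = sum_j w_j f_j. */
    static double[] mean(double[][] x, double[] w, int[] f) {
        int p = x[0].length;
        double[] m = new double[p];
        double sumWf = 0.0;
        for (int j = 0; j < x.length; j++) {
            double wf = w[j] * f[j];
            sumWf += wf;
            for (int c = 0; c < p; c++) {
                m[c] += wf * x[j][c];
            }
        }
        for (int c = 0; c < p; c++) {
            m[c] /= sumWf;
        }
        return m;
    }

    /** Covariance: (1 / (N_i - 1)) * sum_j w_j f_j (x_j - mean)(x_j - mean)^T. */
    static double[][] covariance(double[][] x, double[] w, int[] f) {
        int p = x[0].length;
        double[] m = mean(x, w, f);
        double[][] s = new double[p][p];
        long n = 0; // N_i, the sum of the frequencies
        for (int j = 0; j < x.length; j++) {
            n += f[j];
            double wf = w[j] * f[j];
            for (int a = 0; a < p; a++) {
                for (int b = 0; b < p; b++) {
                    s[a][b] += wf * (x[j][a] - m[a]) * (x[j][b] - m[b]);
                }
            }
        }
        for (int a = 0; a < p; a++) {
            for (int b = 0; b < p; b++) {
                s[a][b] /= (n - 1);
            }
        }
        return s;
    }
}
```

Note that the divisor of the mean is the weighted sum \(W_i\), while the divisor of the covariance is \(N_i - 1\), which depends on the frequencies only.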
Given the means and the covariance matrices, the linear discriminant
function for group i is computed as:
$$z_i = \ln(p_i)-0.5\overline{x_i}^T S_{p}^{-1} \overline{x_i} + x^T S_{p}^{-1} \overline{x_i}$$
where \(\ln(p_i)\) is the natural log of the prior
probability for the i-th group, x is the observation to be
classified, and \(S_p\) denotes the pooled covariance
matrix.
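As a worked illustration of the formula for \(z_i\), the sketch below evaluates the linear discriminant function for one group. It assumes the caller already has the inverse of the pooled covariance matrix; the class `LinearDiscriminant` and the parameter names are hypothetical and chosen only for this example.

```java
/** Illustrative evaluation of z_i = ln(p_i) - 0.5 m^T S^{-1} m + x^T S^{-1} m. */
final class LinearDiscriminant {

    /**
     * @param prior prior probability p_i of the group
     * @param mean  mean vector of the group
     * @param sInv  inverse of the pooled covariance matrix (assumed precomputed)
     * @param x     observation to be scored
     */
    static double score(double prior, double[] mean, double[][] sInv, double[] x) {
        int p = mean.length;
        double[] sInvMean = new double[p]; // S_p^{-1} * mean
        for (int a = 0; a < p; a++) {
            for (int b = 0; b < p; b++) {
                sInvMean[a] += sInv[a][b] * mean[b];
            }
        }
        double meanTerm = 0.0; // mean^T S_p^{-1} mean
        double xTerm = 0.0;    // x^T S_p^{-1} mean
        for (int a = 0; a < p; a++) {
            meanTerm += mean[a] * sInvMean[a];
            xTerm += x[a] * sInvMean[a];
        }
        return Math.log(prior) - 0.5 * meanTerm + xTerm;
    }
}
```

An observation is assigned to the group with the largest score, so only the vector \(S_p^{-1} \overline{x_i}\) and the constant term need to be stored per group.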
Let S denote either the pooled covariance matrix or one of the within-group covariance matrices \(S_i\). (S will be the pooled covariance matrix in linear discrimination, and \(S_i\) otherwise.) The Mahalanobis distance between group i and group j is computed as: $$D_{ij}^{2} = (\overline{x_i} - \overline{x_j})^T S^{-1} (\overline{x_i} - \overline{x_j})$$
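The distance \(D_{ij}^{2}\) is a quadratic form in the difference of two mean vectors. A minimal sketch follows, again assuming the inverse of the relevant covariance matrix is supplied by the caller; `GroupDistance` is a hypothetical helper, not a library class.

```java
/** Illustrative squared Mahalanobis distance between two group means. */
final class GroupDistance {

    /** Returns (mi - mj)^T sInv (mi - mj), where sInv is the inverse covariance matrix. */
    static double mahalanobisSquared(double[] mi, double[] mj, double[][] sInv) {
        int p = mi.length;
        double[] d = new double[p];
        for (int a = 0; a < p; a++) {
            d[a] = mi[a] - mj[a];
        }
        double sum = 0.0;
        for (int a = 0; a < p; a++) {
            for (int b = 0; b < p; b++) {
                sum += d[a] * sInv[a][b] * d[b];
            }
        }
        return sum;
    }
}
```

With the pooled matrix the resulting distance table is symmetric; with a group-specific matrix \(S_i\) the value for the pair (i, j) generally differs from the value for (j, i).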
Finally, the asymptotic chi-squared test for the equality of covariance matrices is computed as follows (Morrison 1976, page 252): $$\gamma = C^{-1} \sum_{i=1}^{k} n_i \{ \ln( \left| S_p \right| ) - \ln( \left| S_i \right| ) \}$$ where \(n_i\) is the number of degrees of freedom in the i-th sample covariance matrix, \(k\) is the number of groups, and $$C^{-1} = 1 - \frac{2p^2 + 3p - 1}{6(p + 1)(k - 1)} \left(\sum_{i=1}^{k} \frac{1}{n_i} - \frac{1}{\sum_{j}n_j} \right)$$ where \(p\) is the number of variables.
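The arithmetic of the test statistic can be sketched as follows. The example assumes the usual form of the correction factor for this test, \(C^{-1} = 1 - \frac{2p^2 + 3p - 1}{6(p + 1)(k - 1)} \left(\sum_i \frac{1}{n_i} - \frac{1}{\sum_j n_j}\right)\), and takes the log determinants as inputs; the class `CovarianceEqualityTest` and its parameter names are hypothetical.

```java
/** Illustrative computation of the chi-squared statistic for equal covariance matrices. */
final class CovarianceEqualityTest {

    /**
     * @param lnDetPooled ln|S_p|, log determinant of the pooled covariance matrix
     * @param lnDetGroup  ln|S_i| for each group
     * @param n           degrees of freedom n_i of each group covariance matrix
     * @param p           number of variables
     */
    static double statistic(double lnDetPooled, double[] lnDetGroup, double[] n, int p) {
        int k = n.length;
        double sumInverse = 0.0; // sum of 1 / n_i
        double sumN = 0.0;       // sum of n_j
        double weighted = 0.0;   // sum of n_i * (ln|S_p| - ln|S_i|)
        for (int i = 0; i < k; i++) {
            sumInverse += 1.0 / n[i];
            sumN += n[i];
            weighted += n[i] * (lnDetPooled - lnDetGroup[i]);
        }
        double factor = (2.0 * p * p + 3.0 * p - 1.0) / (6.0 * (p + 1.0) * (k - 1.0));
        double cInv = 1.0 - factor * (sumInverse - 1.0 / sumN);
        return cInv * weighted;
    }
}
```

For two groups with 10 degrees of freedom each, two variables, \(\ln|S_p| = 1\) and \(\ln|S_i| = 0.5\), the weighted sum is 10 and the correction factor is \(1 - \frac{13}{18} \cdot 0.15\), giving a statistic of roughly 8.92.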
The estimated posterior probability of each observation x belonging to group i is computed using the prior probabilities and the sample mean vectors and estimated covariance matrices under a multivariate normal assumption. Under quadratic discrimination, the within-group covariance matrices are used to compute the estimated posterior probabilities. The estimated posterior probability of an observation x belonging to group i is $$\hat{q_i}(x) = \frac{e^{-\frac{1}{2}D_{i}^{2}(x)}}{\sum_{j=1}^{k} e^{-\frac{1}{2}D_{j}^{2}(x)}}$$ where $$D_{i}^{2}(x) = \left\{ \begin{array}{ll} (x - \overline{x_i})^T S_{i}^{-1} (x - \overline{x_i}) + ln \left|S_i \right| - 2 ln(p_i) & \mbox{Linear or Quadratic, pooled, group} \\ (x - \overline{x_i})^T S_{p}^{-1} (x - \overline{x_i}) - 2 ln(p_i) & \mbox{Linear, Pooled} \end{array} \right. $$
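Given the values \(D_{i}^{2}(x)\) for all groups, the posterior probabilities are a normalized exponential. The sketch below is a minimal illustration; subtracting the smallest distance before exponentiating is a numerical-stability choice made for this example and is not taken from the library. The class name `Posterior` is hypothetical.

```java
/** Illustrative posterior probabilities q_i(x) from the generalized distances D_i^2(x). */
final class Posterior {

    /** Returns exp(-0.5 * d2[i]) / sum_j exp(-0.5 * d2[j]) for each group i. */
    static double[] fromDistances(double[] d2) {
        int k = d2.length;
        double min = d2[0];
        for (int i = 1; i < k; i++) {
            min = Math.min(min, d2[i]);
        }
        double[] q = new double[k];
        double sum = 0.0;
        for (int i = 0; i < k; i++) {
            // Shifting by the minimum leaves the ratios unchanged but avoids underflow.
            q[i] = Math.exp(-0.5 * (d2[i] - min));
            sum += q[i];
        }
        for (int i = 0; i < k; i++) {
            q[i] /= sum;
        }
        return q;
    }
}
```

For example, distances of 0 and \(2 \ln 3\) give unnormalized values 1 and 1/3, so the posterior probabilities are 0.75 and 0.25.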
For the leave-out-one method of classification, the sample mean vector and sample covariance matrices in the formula for \(D_{i}^{2}(x)\) are adjusted so as to remove the observation x from their computation. For linear discrimination, the linear discriminant function coefficients are actually used to compute the same posterior probabilities.
Using the posterior probabilities, each observation in x is classified into a group; the result is tabulated in the matrix returned by getClassTable and saved in the vector returned by getClassMembership. If a group variable is provided and the group number is out of range, the classification table is not altered at this stage. If the reclassification method is specified, then all observations with no missing values are classified. When the leave-out-one method is used, observations with invalid group numbers, weights, frequencies, or classification variables are not classified. Regardless of the frequency, 1 is added to (or subtracted from) the classification table for each row of x that is classified and contains a valid group number.
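The tabulation rule described above can be sketched as follows. The orientation of the table (rows for the known group, columns for the predicted group) is an assumption made for the example, as are the class name `ClassTableSketch` and its methods; the text does not specify the layout of the matrix returned by getClassTable.

```java
/** Illustrative classification and tabulation of observations with known groups. */
final class ClassTableSketch {

    /** Returns the 1-based number of the group with the largest posterior probability. */
    static int mostLikelyGroup(double[] posterior) {
        int best = 0;
        for (int i = 1; i < posterior.length; i++) {
            if (posterior[i] > posterior[best]) {
                best = i;
            }
        }
        return best + 1;
    }

    /**
     * Adds 1 to table[known - 1][predicted - 1] for each observation whose known
     * group number lies in 1..nGroups. Rows with an out-of-range group are skipped,
     * so they leave the table unchanged.
     */
    static void tabulate(double[][] table, int[] known, int[] predicted) {
        int nGroups = table.length;
        for (int i = 0; i < known.length; i++) {
            int g = known[i];
            if (g < 1 || g > nGroups) {
                continue; // invalid group number: classified but not tabulated
            }
            table[g - 1][predicted[i] - 1] += 1.0;
        }
    }
}
```

Because the table is only ever incremented, calling `tabulate` once per batch accumulates counts across batches, in the same spirit as repeated calls to classify.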
When the leave-out-one method is used, an adjustment is made to the posterior probabilities to remove the effect of the observation on the classification rule. In this adjustment, each observation is presumed to have a weight of \(w_j\) and a frequency of 1.0. See Lachenbruch (1975, page 36) for the required adjustment.

Modifier and Type | Class and Description |
---|---|
static class | DiscriminantAnalysis.CovarianceSingularException - The variance-covariance matrix is singular. |
static class | DiscriminantAnalysis.EmptyGroupException - There are no observations in a group. |
static class | DiscriminantAnalysis.SumOfWeightsNegException - The sum of the weights has become negative. |

Modifier and Type | Field and Description |
---|---|
static int | LEAVE_OUT_ONE - Indicates leave-out-one classification method. |
static int | LINEAR - Indicates a linear discrimination method. |
static int | POOLED - Indicates pooled covariances computation. |
static int | POOLED_GROUP - Indicates pooled, group covariances computation. |
static int | PRIOR_EQUAL - Indicates prior equal probabilities. |
static int | PRIOR_PROPORTIONAL - Indicates prior proportional probabilities. |
static int | QUADRATIC - Indicates a quadratic discrimination method. |
static int | RECLASSIFICATION - Indicates reclassification classification method. |

Constructor and Description |
---|
DiscriminantAnalysis(int nVariables, int nGroups) - Constructs a DiscriminantAnalysis. |

Modifier and Type | Method and Description |
---|---|
void | classify(double[][] x) - Classify a set of observations using the linear or quadratic discriminant functions generated during the training process. |
void | classify(double[][] x, int[] varIndex) - Classify a set of observations using the linear or quadratic discriminant functions generated during the training process. |
void | classify(double[][] x, int[] frequencies, double[] weights) - Classify a set of observations and associated frequencies and weights using the linear or quadratic discriminant functions generated during the training process. |
void | classify(double[][] x, int[] group, int[] varIndex) - Classify a set of observations and compare against known groups using the linear or quadratic discriminant functions generated during the training process. |
void | classify(double[][] x, int[] varIndex, int[] frequencies, double[] weights) - Classify a set of observations and associated frequencies and weights using the linear or quadratic discriminant functions generated during the training process. |
void | classify(double[][] x, int[] group, int[] varIndex, int[] frequencies, double[] weights) - Classify a set of observations, associated frequencies and weights, and compare against known groups using the linear or quadratic discriminant functions generated during the training process. |
void | downdate(double[][] x, int[] group) - Removes a set of observations from the discriminant functions. |
void | downdate(double[][] x, int[] group, int[] varIndex) - Removes a set of observations from the discriminant functions. |
void | downdate(double[][] x, int[] group, int[] frequencies, double[] weights) - Removes a set of observations and associated frequencies and weights from the discriminant functions. |
void | downdate(double[][] x, int[] group, int[] varIndex, int[] frequencies, double[] weights) - Removes a set of observations and associated frequencies and weights from the discriminant functions. |
int[] | getClassMembership() - Returns the group number to which the observation was classified. |
double[][] | getClassTable() - Returns the classification table. |
double[][] | getCoefficients() - Returns the linear discriminant function coefficients. |
double[][][] | getCovariance() - Returns the array of covariances. |
int[] | getGroupCounts() - Returns the group counts. |
double[][] | getMahalanobis() - Returns the Mahalanobis distances between the group means. |
double[][] | getMeans() - Returns the variable means. |
int | getNRowsMissing() - Deprecated. Use DiscriminantAnalysis.getNumberOfRowsMissing() instead. |
int | getNumberOfRowsMissing() - Returns the number of rows of data encountered containing missing values (Double.NaN). |
double[] | getPrior() - Returns the prior probabilities. |
double[][] | getProbability() - Returns the posterior probabilities for each observation. |
double[] | getStatistics() - Returns statistics. |
void | setClassificationMethod(int method) - Specifies the classification method to be either reclassification or leave-out-one. |
void | setCovarianceComputation(int type) - Specifies the covariance matrix computation to be either pooled or pooled, group. |
void | setDiscriminationMethod(int method) - Specifies the discrimination method used to be either linear or quadratic discrimination. |
void | setPrior(double[] prior) - Specifies user supplied prior probabilities. |
void | setPrior(int prior) - Specifies the prior probabilities to be calculated as either equal or proportional priors. |
void | update(double[][] x) - Deprecated. Use DiscriminantAnalysis.update(double[][], int[]) instead. |
void | update(double[][] x, double[] frequencies, double[] weights) - Deprecated. |
void | update(double[][] x, int groupIndex) - Deprecated. Use DiscriminantAnalysis.update(double[][], int[]) instead. |
void | update(double[][] x, int[] group) - Trains a set of observations and associated frequencies and weights by performing a linear or quadratic discriminant function analysis among several known groups. |
void | update(double[][] x, int[] varIndex, double[] frequencies, double[] weights) - Deprecated. |
void | update(double[][] x, int[] group, int[] varIndex) - Trains a set of observations and associated frequencies and weights by performing a linear or quadratic discriminant function analysis among several known groups. |
void | update(double[][] x, int[] group, int[] frequencies, double[] weights) - Trains a set of observations and associated frequencies and weights by performing a linear or quadratic discriminant function analysis among several known groups. |
void | update(double[][] x, int[] group, int[] varIndex, int[] frequencies, double[] weights) - Trains a set of observations and associated frequencies and weights by performing a linear or quadratic discriminant function analysis among several known groups. |
void | update(double[][] x, int groupIndex, double[] frequencies, double[] weights) - Deprecated. |
void | update(double[][] x, int groupIndex, int[] varIndex) - Deprecated. |
void | update(double[][] x, int groupIndex, int[] varIndex, double[] frequencies, double[] weights) - Deprecated. |

public static final int LINEAR
public static final int QUADRATIC
public static final int POOLED
public static final int POOLED_GROUP
public static final int RECLASSIFICATION
public static final int LEAVE_OUT_ONE
public static final int PRIOR_PROPORTIONAL
public static final int PRIOR_EQUAL
public DiscriminantAnalysis(int nVariables, int nGroups)
Constructs a DiscriminantAnalysis.
nVariables - an int representing the number of variables to be used in the discrimination
nGroups - an int representing the number of groups in the data
public void update(double[][] x) throws DiscriminantAnalysis.SumOfWeightsNegException
Deprecated. Use DiscriminantAnalysis.update(double[][], int[]) instead.
x - a double matrix containing the observations with at least nVariables + 1 columns. The first nVariables columns correspond to the variables, and the last column (column nVariables) contains the group numbers. The groups must be numbered 1,2, ..., nGroups. Any additional columns will be ignored.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
public void update(double[][] x, int groupIndex) throws DiscriminantAnalysis.SumOfWeightsNegException
Deprecated. Use DiscriminantAnalysis.update(double[][], int[]) instead.
x - a double matrix containing the observations with at least nVariables + 1 columns. The first nVariables columns, excluding the groupIndex column, correspond to the variables. The groupIndex column contains the group numbers. Any additional columns will be ignored.
groupIndex - an int containing the column index of x in which the group numbers are stored. The groups must be numbered 1,2, ..., nGroups. Any observations with a group number outside of this range will be skipped.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
public void update(double[][] x, int groupIndex, int[] varIndex) throws DiscriminantAnalysis.SumOfWeightsNegException
Deprecated. Use DiscriminantAnalysis.update(double[][], int[], int[]) instead.
x - a double matrix containing the observations with at least nVariables + 1 columns. The columns indicated in varIndex correspond to the variables, and the groupIndex column contains the group numbers. Any additional columns will be ignored.
groupIndex - an int containing the column index of x in which the group numbers are stored. The groups must be numbered 1,2, ..., nGroups.
varIndex - an int array containing the column indices in x that correspond to the variables to be used in the analysis.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
public void update(double[][] x, double[] frequencies, double[] weights) throws DiscriminantAnalysis.SumOfWeightsNegException
Deprecated. Use DiscriminantAnalysis.update(double[][], int[], int[], double[]) instead.
x - a double matrix containing the observations with at least nVariables + 1 columns. The first nVariables columns correspond to the variables, and the last column (column nVariables) contains the group numbers. The groups must be numbered 1,2, ..., nGroups.
frequencies - a double array containing the associated frequencies for each observation.
weights - a double array containing the associated weights for each observation.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.
public void update(double[][] x, int groupIndex, double[] frequencies, double[] weights) throws DiscriminantAnalysis.SumOfWeightsNegException
Deprecated. Use DiscriminantAnalysis.update(double[][], int[], int[], double[]) instead.
x - a double matrix containing the observations with at least nVariables + 1 columns. The first nVariables columns, excluding the groupIndex column, correspond to the variables.
groupIndex - an int containing the column index of x in which the group numbers are stored. The groups must be numbered 1,2, ..., nGroups.
frequencies - a double array containing the associated frequencies for each observation.
weights - a double array containing the associated weights for each observation.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.
public void update(double[][] x, int[] varIndex, double[] frequencies, double[] weights) throws DiscriminantAnalysis.SumOfWeightsNegException
Deprecated. Use DiscriminantAnalysis.update(double[][], int[], int[], int[], double[]) instead.
x - a double matrix containing the observations with at least nVariables + 1 columns. The columns indicated in varIndex correspond to the variables, and the last column (column nVariables) contains the group numbers. The groups must be numbered 1,2, ..., nGroups.
varIndex - an int array containing the column indices in x that correspond to the variables to be used in the analysis.
frequencies - a double array containing the associated frequencies for each observation.
weights - a double array containing the associated weights for each observation.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.
public void update(double[][] x, int groupIndex, int[] varIndex, double[] frequencies, double[] weights) throws DiscriminantAnalysis.SumOfWeightsNegException
Deprecated. Use DiscriminantAnalysis.update(double[][], int[], int[], int[], double[]) instead.
x - a double matrix containing the observations with at least nVariables + 1 columns. The columns indicated in varIndex correspond to the variables, and the groupIndex column contains the group numbers.
groupIndex - an int containing the column index of x in which the group numbers are stored. The groups must be numbered 1,2, ..., nGroups.
varIndex - an int array containing the column indices in x that correspond to the variables to be used in the analysis.
frequencies - a double array containing the associated frequencies for each observation.
weights - a double array containing the associated weights for each observation.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
public void update(double[][] x, int[] group) throws DiscriminantAnalysis.SumOfWeightsNegException
x - a double matrix containing the observations with at least nVariables columns. The first nVariables columns correspond to the variables. Any additional columns will be ignored.
group - an int array containing the group number for each observation. The groups must be numbered 1,2, ..., nGroups.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
public void update(double[][] x, int[] group, int[] varIndex) throws DiscriminantAnalysis.SumOfWeightsNegException
x - a double matrix containing the observations with at least nVariables columns. The columns indicated in varIndex correspond to the variables. Any additional columns will be ignored.
group - an int array containing the group number for each observation. The groups must be numbered 1,2, ..., nGroups.
varIndex - an int array containing the column indices in x that correspond to the variables to be used in the analysis.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
public void update(double[][] x, int[] group, int[] frequencies, double[] weights) throws DiscriminantAnalysis.SumOfWeightsNegException
x - a double matrix containing the observations with at least nVariables columns. The first nVariables columns correspond to the variables. Any additional columns will be ignored.
group - an int array containing the group number for each observation. The groups must be numbered 1,2, ..., nGroups.
frequencies - an int array containing the associated frequencies for each observation.
weights - a double array containing the associated weights for each observation.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
public void update(double[][] x, int[] group, int[] varIndex, int[] frequencies, double[] weights) throws DiscriminantAnalysis.SumOfWeightsNegException
x - a double matrix containing the observations with at least nVariables columns. The columns indicated in varIndex correspond to the variables.
group - an int array containing the group number for each observation. The groups must be numbered 1,2, ..., nGroups.
varIndex - an int array containing the column indices in x that correspond to the variables to be used in the analysis.
frequencies - an int array containing the associated frequencies for each observation.
weights - a double array containing the associated weights for each observation.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
public void downdate(double[][] x, int[] group) throws DiscriminantAnalysis.SumOfWeightsNegException
x - a double matrix containing the observations to be removed, with at least nVariables columns. The first nVariables columns correspond to the variables. Any additional columns will be ignored.
group - an int array containing the group number for each observation. The groups must be numbered 1,2, ..., nGroups.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
public void downdate(double[][] x, int[] group, int[] varIndex) throws DiscriminantAnalysis.SumOfWeightsNegException
x - a double matrix containing the observations to be removed, with at least nVariables columns. The columns indicated in varIndex correspond to the variables.
group - an int array containing the group number for each observation. The groups must be numbered 1,2, ..., nGroups.
varIndex - an int array containing the column indices in x that correspond to the variables to be used in the analysis.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
public void downdate(double[][] x, int[] group, int[] frequencies, double[] weights) throws DiscriminantAnalysis.SumOfWeightsNegException
x - a double matrix containing the observations to be removed, with at least nVariables columns. The first nVariables columns correspond to the variables.
group - an int array containing the group number for each observation. The groups must be numbered 1,2, ..., nGroups.
frequencies - an int array containing the associated frequencies for each observation.
weights - a double array containing the associated weights for each observation.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
public void downdate(double[][] x, int[] group, int[] varIndex, int[] frequencies, double[] weights) throws DiscriminantAnalysis.SumOfWeightsNegException
x - a double matrix containing the observations to be removed, with at least nVariables columns. The columns indicated in varIndex correspond to the variables.
group - an int array containing the group number for each observation. The groups must be numbered 1,2, ..., nGroups.
varIndex - an int array containing the column indices in x that correspond to the variables to be used in the analysis.
frequencies - an int array containing the associated frequencies for each observation.
weights - a double array containing the associated weights for each observation.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
public void classify(double[][] x) throws DiscriminantAnalysis.SumOfWeightsNegException, DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
x - a double matrix containing the observations with at least nVariables columns. The first nVariables columns correspond to the variables. Reclassification does not require that group numbers be present. Any additional columns will be ignored.
IllegalStateException - is thrown if the leave-out-one classification method is chosen.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.
public void classify(double[][] x, int[] varIndex) throws DiscriminantAnalysis.SumOfWeightsNegException, DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
x - a double matrix containing the observations with at least nVariables columns. The columns indicated in varIndex correspond to the variables. Reclassification does not require that group numbers be present. Additional columns will be ignored.
varIndex - an int array containing the column indices in x that correspond to the variables to be used in the analysis.
IllegalStateException - is thrown if the leave-out-one classification method is chosen.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.
public void classify(double[][] x, int[] frequencies, double[] weights) throws DiscriminantAnalysis.SumOfWeightsNegException, DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
x - a double matrix containing the observations with at least nVariables columns. The first nVariables columns correspond to the variables. Reclassification does not require that group numbers be present. Any additional columns will be ignored.
frequencies - an int array containing the associated frequencies for each observation.
weights - a double array containing the associated weights for each observation.
IllegalStateException - is thrown if the leave-out-one classification method is chosen.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.
public void classify(double[][] x, int[] varIndex, int[] frequencies, double[] weights) throws DiscriminantAnalysis.SumOfWeightsNegException, DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
x - a double matrix containing the observations with at least nVariables columns. The columns indicated in varIndex correspond to the variables. Reclassification does not require that group numbers be present. Additional columns in x will be ignored.
varIndex - an int array containing the column indices in x that correspond to the variables to be used in the analysis.
frequencies - an int array containing the associated frequencies for each observation.
weights - a double array containing the associated weights for each observation.
IllegalStateException - is thrown if the leave-out-one classification method is chosen.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.
public void classify(double[][] x, int[] group, int[] varIndex) throws DiscriminantAnalysis.SumOfWeightsNegException, DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
x - a double matrix containing the observations with at least nVariables columns. The columns indicated in varIndex correspond to the variables. Any additional columns will be ignored.
group - an int array containing the group number for each observation. The groups must be numbered 1,2, ..., nGroups.
varIndex - an int array containing the column indices in x that correspond to the variables to be used in the analysis.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.
public void classify(double[][] x, int[] group, int[] varIndex, int[] frequencies, double[] weights) throws DiscriminantAnalysis.SumOfWeightsNegException, DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
x - a double matrix containing the observations with at least nVariables columns. The columns indicated in varIndex correspond to the variables. Additional columns are ignored.
group - an int array containing the group number for each observation. The groups must be numbered 1,2, ..., nGroups.
varIndex - an int array containing the column indices in x that correspond to the variables to be used in the analysis.
frequencies - an int array containing the associated frequencies for each observation.
weights - a double array containing the associated weights for each observation.
DiscriminantAnalysis.SumOfWeightsNegException - is thrown when the sum of the weights has become negative.
DiscriminantAnalysis.EmptyGroupException - is thrown when there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException - is thrown when the variance-covariance matrix is singular.
public void setDiscriminationMethod(int method)
method - an int scalar indicating the method of discrimination. Use class member LINEAR or QUADRATIC. By default, the LINEAR method is used.
public void setCovarianceComputation(int type)
type - an int scalar indicating the type of covariance matrices to be computed. Use class member POOLED or POOLED_GROUP. By default, POOLED_GROUP is used.
public void setClassificationMethod(int method)
method
- an int
indicating the method of
classification. Use class member
RECLASSIFICATION
or
LEAVE_OUT_ONE
. By default, the
RECLASSIFICATION
method is used.
public void setPrior(int prior)
prior
- an int
specifying how to calculate prior
probabilities as either equal or proportional prior
probabilities. Use class member
PRIOR_EQUAL
to set equal prior
probabilities, calculated as 1.0/nGroups
.
Use class member PRIOR_PROPORTIONAL
to
calculate the priors to be proportional to the sample
size in each group. The sum of all prior probabilities
is equal to 1.0. If any calculated prior is less than
1.0e-20, it will be converted to
Math.log(1.0e-20)
. Prior probabilities are used in
calculating statistics, coefficients, Mahalanobis,
and classification probabilities.
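The two prior schemes described above can be sketched in plain Java. This is only an illustration and does not use or reproduce the library's internals: the class name PriorSketch and both method names are hypothetical, and the 1.0e-20 handling mentioned above is not modeled.

```java
import java.util.Arrays;

// Hypothetical helper, not part of DiscriminantAnalysis.
final class PriorSketch {

    // Equal priors: every group receives 1.0 / nGroups.
    static double[] equalPriors(int nGroups) {
        if (nGroups <= 0) {
            throw new IllegalArgumentException("nGroups must be positive");
        }
        double[] prior = new double[nGroups];
        Arrays.fill(prior, 1.0 / nGroups);
        return prior;
    }

    // Proportional priors: group i receives groupCounts[i] / total,
    // so the priors sum to 1.0.
    static double[] proportionalPriors(int[] groupCounts) {
        long total = 0;
        for (int c : groupCounts) {
            total += c;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("no observations");
        }
        double[] prior = new double[groupCounts.length];
        for (int i = 0; i < groupCounts.length; i++) {
            prior[i] = (double) groupCounts[i] / total;
        }
        return prior;
    }
}
```

With group counts of 10, 30 and 60, the proportional scheme gives priors of 0.1, 0.3 and 0.6, while the equal scheme gives 1/3 for each group.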
By default, PRIOR_EQUAL
is used.
public void setPrior(double[] prior)
prior
- a double
vector of length
nGroups
containing the prior probabilities
for each group. The elements of prior
should
sum to 1.0. If any element of prior
is less than
1.0e-20, it will be converted to Math.log(1.0e-20)
. By
default, the prior probabilities are calculated to be
equal; see setPrior(int)
.
public double[] getPrior()
double
array of length
nGroups
containing the prior probabilities
for each group.
public int[] getGroupCounts()
int
array of length nGroups
containing the number of observations in each group. If an
update has not preceded the invocation of this method, an
array of all zeros will be returned.
public double[][] getMeans() throws DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
nGroups
by nVariables
double
matrix containing the variable means.
The i-th row contains the variable means for
group i.
If this method is invoked before classification, the unscaled means will be returned.
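As an illustration of the layout of the returned matrix (row i holds the means for group i), the following plain-Java sketch computes unweighted group means. It is not the library's implementation: the class GroupMeansSketch and its method are hypothetical, frequencies, weights and missing values are ignored, and a standard IllegalArgumentException stands in for the empty-group condition.

```java
// Hypothetical helper, not part of DiscriminantAnalysis.
final class GroupMeansSketch {

    // Returns an nGroups by varIndex.length matrix of unweighted means.
    // Group numbers in 'group' are assumed to run from 1 to nGroups.
    static double[][] groupMeans(double[][] x, int[] group,
                                 int[] varIndex, int nGroups) {
        double[][] means = new double[nGroups][varIndex.length];
        int[] count = new int[nGroups];
        for (int r = 0; r < x.length; r++) {
            int g = group[r] - 1;          // convert 1-based group to row index
            count[g]++;
            for (int j = 0; j < varIndex.length; j++) {
                means[g][j] += x[r][varIndex[j]];  // only the selected columns
            }
        }
        for (int g = 0; g < nGroups; g++) {
            if (count[g] == 0) {
                throw new IllegalArgumentException(
                    "no observations in group " + (g + 1));
            }
            for (int j = 0; j < varIndex.length; j++) {
                means[g][j] /= count[g];
            }
        }
        return means;
    }
}
```

Columns of x that are not listed in varIndex do not contribute to the result.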
DiscriminantAnalysis.EmptyGroupException
- is thrown when
there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException
- is thrown when
the variance-covariance matrix is singular.
public double[][][] getCovariance() throws DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
g by nVariables
by nVariables
double
array containing the
covariances, where g = nGroups
+ 1 if
pooled and group covariance computation is specified, or
g = 1 if pooled only covariance computation is specified.
When pooled only covariance matrices are computed, the
within-group covariance matrices are not computed. The
pooled covariance matrix is always computed and is returned
as the g-th covariance matrix.
If this method is invoked before classification, the unscaled covariance matrix will be returned.
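The ordering described above, with the within-group matrices first and the pooled matrix last, can be sketched in plain Java for the pooled and group case. This is a textbook computation under stated assumptions, not the library's LU-based implementation: the class CovarianceSketch is hypothetical, observations are unweighted, every column of x is used, and each group is assumed to have at least two observations.

```java
// Hypothetical helper, not part of DiscriminantAnalysis.
final class CovarianceSketch {

    // Returns an (nGroups + 1) by p by p array: entries 0..nGroups-1 are the
    // within-group covariance matrices, entry nGroups is the pooled matrix.
    static double[][][] covariances(double[][] x, int[] group, int nGroups) {
        int p = x[0].length;
        int[] count = new int[nGroups];
        double[][] mean = new double[nGroups][p];
        for (int r = 0; r < x.length; r++) {
            int g = group[r] - 1;
            count[g]++;
            for (int j = 0; j < p; j++) {
                mean[g][j] += x[r][j];
            }
        }
        for (int g = 0; g < nGroups; g++) {
            if (count[g] < 2) {
                throw new IllegalArgumentException(
                    "group " + (g + 1) + " needs at least 2 observations");
            }
            for (int j = 0; j < p; j++) {
                mean[g][j] /= count[g];
            }
        }
        double[][][] cov = new double[nGroups + 1][p][p];
        // Accumulate sums of squares and cross-products about the group means.
        for (int r = 0; r < x.length; r++) {
            int g = group[r] - 1;
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < p; k++) {
                    double d = (x[r][j] - mean[g][j]) * (x[r][k] - mean[g][k]);
                    cov[g][j][k] += d;
                    cov[nGroups][j][k] += d;
                }
            }
        }
        int pooledDf = x.length - nGroups;  // pooled degrees of freedom
        for (int j = 0; j < p; j++) {
            for (int k = 0; k < p; k++) {
                for (int g = 0; g < nGroups; g++) {
                    cov[g][j][k] /= (count[g] - 1);
                }
                cov[nGroups][j][k] /= pooledDf;
            }
        }
        return cov;
    }
}
```

The pooled matrix divides the combined cross-products by the total number of observations minus nGroups, so it is a degrees-of-freedom weighted average of the group matrices.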
DiscriminantAnalysis.EmptyGroupException
- is thrown when
there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException
- is thrown when
the variance-covariance matrix is singular.
public double[][] getCoefficients() throws DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
nGroups
by (nVariables + 1)
double
matrix containing the linear
discriminant function coefficients. The first column of the
matrix contains the constant term, and the remaining columns
contain the variable coefficients. The i-th
row of the returned matrix corresponds to group i. The
coefficients are always computed as linear discriminant
function coefficients even when quadratic discrimination is
specified.
DiscriminantAnalysis.EmptyGroupException
- is thrown when
there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException
- is thrown when
the variance-covariance matrix is singular.
public double[][] getClassTable()
nGroups
by nGroups
double
matrix containing the classification
table. The accumulation of each observation that is
classified and has a group number equal to 1, 2, ...,
nGroups
is entered into the table. If a known
group is provided, the rows of the table correspond to the
known group membership. The columns refer to the group to
which the observation was classified. If a known group is
not provided, the table will only contain the accumulated
classified groups in the column corresponding to the group
to which the observation was classified.
IllegalStateException
- is thrown if no data has been classified.
public double[][] getMahalanobis() throws DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
nGroups
by nGroups
 
double
matrix containing the Mahalanobis
distances between the group means. For linear
discrimination, the Mahalanobis distance is computed using
the pooled covariance matrix. Otherwise, the Mahalanobis distance
$$D_{ij}^2(x)$$ between group
means i and j is computed using the within
covariance matrix for group i in place of the pooled
covariance matrix.
DiscriminantAnalysis.EmptyGroupException
- is thrown when
there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException
- is thrown when
the variance-covariance matrix is singular.
public double[] getStatistics() throws DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
double
array containing output
statistics.
Index | Description |
0 | Sum of the degrees of freedom for the within-covariance matrices. |
1 | Chi-squared statistic. |
2 | The degrees of freedom in the chi-squared statistic. |
3 | Probability of a greater chi-squared in a test of the homogeneity of the within-covariance matrices. (Not computed when only the pooled covariance matrix is computed.) |
4 through (4 + nGroups) | Log of the determinant of each group's covariance matrix (not computed when only the pooled covariance matrix is computed) and of the pooled covariance matrix. |
Last (nGroups + 1) elements | Sum of the weights within each group, followed by the total in the last element. |
Last element | Sum of the weights in all groups. |
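Because several entries in the table are positioned relative to nGroups, a small index helper can make the layout easier to work with. The sketch below assumes the array holds exactly the entries listed in the table with no gaps, which gives a length of 2 * nGroups + 6. That length is inferred from the table and is not stated by the documentation. The class StatisticsLayoutSketch and its method names are hypothetical.

```java
// Hypothetical index helper for the array returned by getStatistics().
// Assumes the array contains only the entries listed in the table above.
final class StatisticsLayoutSketch {

    // 4 leading scalars, nGroups + 1 log-determinants, nGroups + 1 weight sums.
    static int length(int nGroups) {
        return 2 * nGroups + 6;
    }

    // Index of the log-determinant for group i (i runs from 1 to nGroups).
    static int logDetGroup(int i) {
        return 4 + (i - 1);
    }

    // Index of the log-determinant of the pooled covariance matrix.
    static int logDetPooled(int nGroups) {
        return 4 + nGroups;
    }

    // Index of the sum of weights for group i (i runs from 1 to nGroups).
    static int weightGroup(int nGroups, int i) {
        return 5 + nGroups + (i - 1);
    }

    // Index of the sum of weights over all groups (the last element).
    static int weightTotal(int nGroups) {
        return length(nGroups) - 1;
    }
}
```

For nGroups = 3 this places the log-determinants at indices 4 through 7, the per-group weight sums at 8 through 10, and the overall weight sum at 11.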
DiscriminantAnalysis.EmptyGroupException
- is thrown when
there are no observations in a group.
DiscriminantAnalysis.CovarianceSingularException
- is thrown when
the variance-covariance matrix is singular.
public int[] getClassMembership()
int
array containing the group to which the
observation was classified. If an observation has an invalid
group number, frequency, or weight when the leaving-out-one
method has been specified, then the observation is not
classified and the corresponding elements of the array
are set to zero. Note that this returns the class membership
of the last set of observations classified.
IllegalStateException
- is thrown if no data has been classified.
public double[][] getProbability()
x.length
by nGroups
 
double
matrix containing the posterior
probabilities for each observation. Note this will return
the probabilities of the last set of observations classified.
IllegalStateException
- is thrown if no data has been classified.
public int getNRowsMissing()
Deprecated. Use DiscriminantAnalysis.getNumberOfRowsMissing()
instead.
int
representing the number of rows of
data encountered containing missing values (NaN) for the
classification, group, weight, and/or frequency variables.
If a row of data contains a missing value (NaN) for any of
these variables, that row is excluded from the
computations.
public int getNumberOfRowsMissing()
Returns the number of rows of data containing missing values (Double.NaN
).
int
representing the number of rows of data
encountered containing missing values (Double.NaN
) for the
classification, group, weight, and/or frequency variables.
If a row of data contains a missing value (Double.NaN
) for
any of these variables, that row is excluded from the
computations.
Copyright © 2020 Rogue Wave Software. All rights reserved.