public class ClusterKNN extends Object implements Serializable, Cloneable
Perform a k-Nearest Neighbor classification.
ClusterKNN implements an algorithm to classify objects based on a training set. It is among the simpler classification algorithms: a new object is classified essentially by a majority vote of its k closest neighbors. k must be a positive integer and is typically small and odd (an odd k avoids tied votes when there are two classes). The method is straightforward: the distance from the new point to every point in the training set is computed, and the distances are sorted. The k closest points are examined, and the new object is assigned to the class that is most common in that set. For the case k = 1, the object is assigned to the class of its nearest neighbor.
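The procedure described above can be sketched in a few lines of plain Java. This is an illustrative sketch only, not the ClusterKNN implementation: the class name KnnSketch, its method names, and the tie-breaking rule are assumptions made for the example, and only the Euclidean distance is shown.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

/** Illustrative k-NN sketch (hypothetical class, not the ClusterKNN source). */
final class KnnSketch {

    /** Euclidean distance between two points of equal length. */
    static double euclidean(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double d = q[i] - p[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Classifies one observation by majority vote among its k nearest training points. */
    static int classify(double[][] x, int[] c, double[] value, int k) {
        if (k < 1 || k > x.length) {
            throw new IllegalArgumentException("k must be between 1 and x.length");
        }
        // Compute the distance from the new point to every training point.
        final double[] dist = new double[x.length];
        Integer[] order = new Integer[x.length];
        for (int i = 0; i < x.length; i++) {
            dist[i] = euclidean(x[i], value);
            order[i] = i;
        }
        // Sort the training indices by increasing distance.
        Arrays.sort(order, Comparator.comparingDouble(i -> dist[i]));

        // Count the class labels among the k closest points.
        Map<Integer, Integer> votes = new HashMap<>();
        int best = c[order[0]];
        int bestCount = 0;
        for (int j = 0; j < k; j++) {
            int label = c[order[j]];
            int count = votes.merge(label, 1, Integer::sum);
            // Assumption: on a tie, the label that reached the winning count first is kept.
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] x = {{0, 0}, {0, 1}, {1, 0}, {5, 5}, {5, 6}, {6, 5}};
        int[] c = {1, 1, 1, 2, 2, 2};
        System.out.println(classify(x, c, new double[] {0.5, 0.5}, 3));
        System.out.println(classify(x, c, new double[] {5.2, 5.1}, 3));
    }
}
```

The sketch sorts all training points for every query, which mirrors the description above and costs O(n log n) per classified observation.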
The default distance method is the Euclidean distance, but other options are available by using the setDistanceMethod method. The supported methods are:
method | Description |
---|---|
L2_NORM | The Euclidean distance method, \(L_2\) norm, defined as the square root of the sum of the squares of the differences of each coordinate. (Default) |
L1_NORM | The rectilinear norm or city block method, \(L_1\) norm, defined as the sum of the absolute values of the differences of each coordinate. This is most useful for integer input data. |
INFINITY_NORM | The Chebyshev distance method, \(L_{\infty}\) norm, defined as the maximum of the absolute values of the differences of each coordinate. |
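As a rough illustration of the three distance definitions, the following self-contained Java sketch computes each one for a pair of points. The class DistanceSketch and its method names are hypothetical and are not part of the ClusterKNN API.

```java
/** Illustrative distance functions (hypothetical class, not part of the library). */
final class DistanceSketch {

    /** L2 norm: square root of the sum of squared coordinate differences. */
    static double l2(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double d = q[i] - p[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** L1 norm: sum of the absolute coordinate differences. */
    static double l1(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum += Math.abs(q[i] - p[i]);
        }
        return sum;
    }

    /** Infinity norm: largest absolute coordinate difference. */
    static double lInf(double[] p, double[] q) {
        double max = 0.0;
        for (int i = 0; i < p.length; i++) {
            max = Math.max(max, Math.abs(q[i] - p[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        double[] p = {1, 2, 3};
        double[] q = {4, 6, 3};
        // Coordinate differences are 3, 4 and 0.
        System.out.println(l2(p, q));   // 5.0
        System.out.println(l1(p, q));   // 7.0
        System.out.println(lInf(p, q)); // 4.0
    }
}
```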
For cases where the data are poorly scaled, it may be necessary to normalize the input data first. For example, if in a 2D space the X values range from 0 to 1 and the Y values range from 0 to 1000, the distance calculations will be dominated by the Y coordinate unless the data are normalized.
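One common way to normalize is to rescale each variable linearly to the range [0, 1] before classification. The sketch below is one possible approach, assuming min-max scaling is appropriate for the data; the class ScalingSketch and the method minMaxScale are hypothetical and are not provided by ClusterKNN.

```java
import java.util.Arrays;

/** Illustrative min-max scaling (hypothetical helper, not part of the library). */
final class ScalingSketch {

    /** Returns a copy of x with every column rescaled linearly to [0, 1]. */
    static double[][] minMaxScale(double[][] x) {
        int n = x.length;
        int m = x[0].length;
        double[] min = new double[m];
        double[] max = new double[m];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);

        // Find the range of each variable (column).
        for (double[] row : x) {
            for (int j = 0; j < m; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }

        // Rescale; a constant column is mapped to 0 to avoid dividing by zero.
        double[][] out = new double[n][m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                double range = max[j] - min[j];
                out[i][j] = range == 0.0 ? 0.0 : (x[i][j] - min[j]) / range;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = {{0.0, 0.0}, {0.5, 500.0}, {1.0, 1000.0}};
        System.out.println(Arrays.deepToString(minMaxScale(x)));
    }
}
```

Observations to be classified would need to be rescaled with the same per-variable minimum and maximum as the training data, so that both are measured on the same scale.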
Modifier and Type | Field and Description |
---|---|
static int | INFINITY_NORM: Indicates the distance is computed using the \(L_{\infty}\) norm method. |
static int | L1_NORM: Indicates the distance is computed using the \(L_1\) norm method. |
static int | L2_NORM: Indicates the distance is computed using the \(L_2\) norm, or Euclidean distance measurement. |
Constructor and Description |
---|
ClusterKNN(double[][] x, int[] c): Constructor for ClusterKNN. |
Modifier and Type | Method and Description |
---|---|
int[] | classify(double[][] value, int k): Classify a set of observations using k nearest neighbors. |
int | classify(double[] value, int k): Classify an observation using k nearest neighbors. |
void | setDistanceMethod(int method): Sets the distance calculation method to be used. |
public static final int L2_NORM
public static final int L1_NORM
public static final int INFINITY_NORM
public ClusterKNN(double[][] x, int[] c)
Constructor for ClusterKNN.
Parameters:
x - a double matrix containing the known x.length observations of x[0].length variables
c - an int array containing the categories for the x.length observations. All integer values are valid.
public int classify(double[] value, int k)
Classify an observation using k nearest neighbors.
Parameters:
value - a double array of x[0].length variables containing the observation to classify
k - an int containing the number of nearest neighbors to use. An odd value is recommended.
Returns:
an int containing the cluster to which the observation belongs
public int[] classify(double[][] value, int k)
Classify a set of observations using k nearest neighbors.
Parameters:
value - a double matrix of value.length observations on x[0].length variables to classify
k - an int containing the number of nearest neighbors to use. An odd value is recommended.
Returns:
an int array containing the cluster to which each of the observations belongs
public void setDistanceMethod(int method)
Sets the distance calculation method to be used.
Parameters:
method - an int identifying the distance calculation method to be used. By default, method = L2_NORM.
method | Description |
---|---|
L2_NORM | \(\mathrm{d}(\mathbf{p},\mathbf{q})=\sqrt{\sum_{i=1}^{n}(q_i-p_i)^2}\) |
L1_NORM | \(\mathrm{d}(\mathbf{p},\mathbf{q})=\sum_{i=1}^{n}\lvert q_i-p_i\rvert\) |
INFINITY_NORM | \(\mathrm{d}(\mathbf{p},\mathbf{q})=\max_i \lvert q_i-p_i\rvert\) |
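As a quick check of the three formulas, take the illustrative points \(\mathbf{p}=(0,0,0)\) and \(\mathbf{q}=(1,2,2)\), chosen here only as an example. The coordinate differences are 1, 2 and 2, so:
\(L_2\): \(\mathrm{d}(\mathbf{p},\mathbf{q})=\sqrt{1^2+2^2+2^2}=\sqrt{9}=3\)
\(L_1\): \(\mathrm{d}(\mathbf{p},\mathbf{q})=\lvert 1\rvert+\lvert 2\rvert+\lvert 2\rvert=5\)
\(L_{\infty}\): \(\mathrm{d}(\mathbf{p},\mathbf{q})=\max(\lvert 1\rvert,\lvert 2\rvert,\lvert 2\rvert)=2\)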
Copyright © 2020 Rogue Wave Software. All rights reserved.