public class ClusterKNN extends Object implements Serializable, Cloneable
Perform a k-Nearest Neighbor classification.
ClusterKNN implements an algorithm to classify objects based
on a training set. It is among the simpler classification algorithms:
a new object is classified by a majority vote of its
k closest neighbors. k must be a positive integer and is
typically small and odd. The method is straightforward in that the distance
from the new point to every point in the training set is computed and
sorted. The k closest points are examined and the new object is
assigned to the class that is most common in that set. For the case
k = 1 the object is assigned to the class of its nearest
neighbor.
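The procedure described above (compute every distance, sort, then take a majority vote among the k closest points) can be sketched in plain Java. This is only an illustration and not the ClusterKNN implementation: the class name KnnSketch, its method names, and the tie-breaking rule are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names and tie-breaking are assumptions,
// not part of the ClusterKNN API.
class KnnSketch {

    // Euclidean distance between two points of equal length.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Classify 'value' by majority vote among its k nearest training points.
    static int classify(double[][] x, int[] c, double[] value, int k) {
        if (k < 1 || k > x.length) {
            throw new IllegalArgumentException("k must be in [1, x.length]");
        }
        // Distance from the new point to every training point.
        double[] dist = new double[x.length];
        Integer[] order = new Integer[x.length];
        for (int i = 0; i < x.length; i++) {
            dist[i] = euclidean(x[i], value);
            order[i] = i;
        }
        // Sort training indices by increasing distance.
        Arrays.sort(order, Comparator.comparingDouble(i -> dist[i]));

        // Count class labels among the k closest points.
        Map<Integer, Integer> votes = new HashMap<>();
        int best = c[order[0]];
        int bestCount = 0;
        for (int j = 0; j < k; j++) {
            int label = c[order[j]];
            int count = votes.merge(label, 1, Integer::sum);
            // On ties, the label that reached the top count first wins
            // (an assumption of this sketch).
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] x = {{0, 0}, {0, 1}, {1, 0}, {5, 5}, {5, 6}, {6, 5}};
        int[] c = {0, 0, 0, 1, 1, 1};
        System.out.println(classify(x, c, new double[] {0.2, 0.1}, 3));
    }
}
```

With an odd k and two classes a tie cannot occur, which is one reason an odd value is typically chosen.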
The default distance method is the Euclidean distance, but other options
are available by using the setDistanceMethod method. The
supported methods are:
| Method | Description |
|---|---|
| L2_NORM | The Euclidean distance method |
| L1_NORM | The rectilinear norm or city block method |
| INFINITY_NORM | The Chebyshev distance method |
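For reference, the three distances named in the table follow their standard textbook definitions, sketched below. The class DistanceSketch and its method names are hypothetical and are not part of ClusterKNN; the library's internal computation is not shown in this documentation.

```java
// Illustrative sketch of the standard definitions; not the library's code.
class DistanceSketch {

    // L2 (Euclidean): square root of the sum of squared differences.
    static double l2(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // L1 (rectilinear, city block): sum of absolute differences.
    static double l1(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Infinity norm (Chebyshev): largest absolute difference.
    static double lInf(double[] a, double[] b) {
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        double[] a = {0.0, 0.0};
        double[] b = {3.0, 4.0};
        System.out.println(l2(a, b) + " " + l1(a, b) + " " + lInf(a, b));
    }
}
```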
For poorly scaled data, it may be necessary to normalize the input first. For example, if in a 2D space the X values range from 0 to 1 and the Y values range from 0 to 1000, the distance calculations will be dominated by the Y coordinate unless both coordinates are normalized.
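One common way to normalize is min-max scaling, which maps each variable (column) to the range 0 to 1. The sketch below is a minimal example of that technique; the ScaleSketch class and minMaxScale method are hypothetical helpers, not part of ClusterKNN, and other normalizations (such as standardizing to zero mean and unit variance) may suit some data better.

```java
import java.util.Arrays;

// Illustrative helper; not part of the ClusterKNN API.
class ScaleSketch {

    // Rescale each column of x to [0, 1] using that column's min and max.
    static double[][] minMaxScale(double[][] x) {
        int n = x.length;
        int m = x[0].length;
        double[] min = new double[m];
        double[] max = new double[m];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : x) {
            for (int j = 0; j < m; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }
        double[][] scaled = new double[n][m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                double range = max[j] - min[j];
                // A constant column carries no information; map it to 0.
                scaled[i][j] = range == 0.0 ? 0.0 : (x[i][j] - min[j]) / range;
            }
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[][] x = {{0.0, 0.0}, {0.5, 1000.0}, {1.0, 500.0}};
        System.out.println(Arrays.deepToString(minMaxScale(x)));
    }
}
```

Observations to be classified would need to be scaled with the same per-column minimum and range as the training data, so that both live in the same coordinate system.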
| Modifier and Type | Field and Description |
|---|---|
| static int | INFINITY_NORM: Indicates the distance is computed using the infinity norm (Chebyshev) method. |
| static int | L1_NORM: Indicates the distance is computed using the L1 norm (rectilinear, or city block) method. |
| static int | L2_NORM: Indicates the distance is computed using the L2 norm (Euclidean) method. |
| Constructor and Description |
|---|
| ClusterKNN(double[][] x, int[] c): Constructor for ClusterKNN. |
| Modifier and Type | Method and Description |
|---|---|
| int[] | classify(double[][] value, int k): Classify a set of observations using k nearest neighbors. |
| int | classify(double[] value, int k): Classify an observation using k nearest neighbors. |
| void | setDistanceMethod(int method): Sets the distance calculation method to be used. |
public static final int INFINITY_NORM
public static final int L1_NORM
public static final int L2_NORM
public ClusterKNN(double[][] x, int[] c)
Constructor for ClusterKNN.
Parameters:
x - A double matrix containing the known x.length observations of x[0].length variables.
c - An int array containing the categories for the x.length observations. All integer values are valid.

public int[] classify(double[][] value, int k)
Classify a set of observations using k nearest neighbors.
Parameters:
value - A double matrix of value.length observations and x[0].length variables to classify.
k - An int containing the number of nearest neighbors to use. An odd value is recommended.
Returns:
An int array containing the cluster to which each of the observations belongs.

public int classify(double[] value, int k)
Classify an observation using k nearest neighbors.
Parameters:
value - A double array of x[0].length variables containing the observation to classify.
k - An int containing the number of nearest neighbors to use. An odd value is recommended.
Returns:
An int containing the cluster to which the observation belongs.

public void setDistanceMethod(int method)
Sets the distance calculation method to be used.
Parameters:
method - An int identifying the distance calculation method to be used. By default, method = L2_NORM.
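| method | Description |
|---|---|
| L2_NORM | The Euclidean distance method |
| L1_NORM | The rectilinear norm or city block method |
| INFINITY_NORM | The Chebyshev distance method |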
Copyright © 1970-2015 Rogue Wave Software
Built March 24 2015.