Class ClusterKNN
- All Implemented Interfaces:
Serializable, Cloneable
Perform a k-Nearest Neighbor classification.
ClusterKNN implements an algorithm to classify objects based
on a training set. It is among the simpler classification algorithms:
a new object is classified by a majority vote of its k closest
neighbors. k must be a positive integer and is
typically small and odd. The method is straightforward in that the distance
from the new point to every point in the training set is computed and
sorted. The k closest points are examined and the new object is
assigned to the class that is most common in that set. For the case
k = 1 the object is assigned to the class of its nearest
neighbor.
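The procedure described above can be sketched in a few lines of Java. This is an illustrative brute-force version only, not the ClusterKNN source: the class name KnnSketch, its helper methods, and the tie-breaking rule are assumptions made for the example, and the Euclidean distance is hard-coded.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative brute-force k-nearest-neighbor vote (hypothetical class,
// not the ClusterKNN implementation).
class KnnSketch {

    // Euclidean (L2) distance between two points of equal length.
    static double euclidean(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double d = q[i] - p[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Classify `value` by majority vote among its k nearest training rows.
    static int classify(double[][] x, int[] c, double[] value, int k) {
        if (k < 1 || k > x.length) {
            throw new IllegalArgumentException("k must be in [1, x.length]");
        }
        // Distance from the new point to every training point.
        double[] dist = new double[x.length];
        Integer[] order = new Integer[x.length];
        for (int i = 0; i < x.length; i++) {
            dist[i] = euclidean(x[i], value);
            order[i] = i;
        }
        // Sort the training indices by increasing distance.
        Arrays.sort(order, (a, b) -> Double.compare(dist[a], dist[b]));

        // Count the class labels among the k closest points.
        Map<Integer, Integer> votes = new HashMap<>();
        int best = c[order[0]];
        int bestCount = 0;
        for (int j = 0; j < k; j++) {
            int label = c[order[j]];
            int count = votes.merge(label, 1, Integer::sum);
            // Assumed tie rule: the first label to reach the top count wins.
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] x = {{0, 0}, {0, 1}, {1, 0}, {5, 5}, {5, 6}, {6, 5}};
        int[] c = {1, 1, 1, 2, 2, 2};
        System.out.println(classify(x, c, new double[] {0.5, 0.5}, 3));
    }
}
```

With an even k, two classes can receive the same number of votes, which is one reason an odd k is typically preferred for two-class problems.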
The default distance method is the Euclidean distance, but other options
are available by using the setDistanceMethod method. The
supported methods are:
Method | Description
L2_NORM | The Euclidean distance method, \(L_2\) norm, defined as the square root of the sum of the squares of the differences of each coordinate. (Default)
L1_NORM | The rectilinear norm or city block method, \(L_1\) norm, defined as the sum of the absolute values of the differences of each coordinate. This is most useful for integer input data.
INFINITY_NORM | The Chebyshev distance method, \(L_{\infty}\) norm, defined as the maximum of the absolute values of the differences of each coordinate.
For cases where the data are poorly scaled, it may be necessary to normalize the input data first. For example, if in a 2D space the X values range from 0 to 1 and the Y values range from 0 to 1000, the distance calculations will be dominated by the Y coordinate unless the data are normalized.
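One common way to normalize is min-max scaling, which maps each variable linearly onto the interval [0, 1]. The sketch below is an illustration under that assumption; MinMaxScaleSketch and scaleColumns are hypothetical names and are not part of the ClusterKNN API, which does not prescribe a particular normalization.

```java
// Illustrative min-max scaling (hypothetical helper, not part of ClusterKNN).
class MinMaxScaleSketch {

    // Return a copy of x with every column rescaled linearly to [0, 1].
    static double[][] scaleColumns(double[][] x) {
        int rows = x.length;
        int cols = x[0].length;
        double[][] scaled = new double[rows][cols];
        for (int j = 0; j < cols; j++) {
            // Find the range of column j.
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < rows; i++) {
                min = Math.min(min, x[i][j]);
                max = Math.max(max, x[i][j]);
            }
            double range = max - min;
            for (int i = 0; i < rows; i++) {
                // A constant column carries no information; map it to 0.
                scaled[i][j] = (range == 0.0) ? 0.0 : (x[i][j] - min) / range;
            }
        }
        return scaled;
    }
}
```

Any observation passed to classify would need to be scaled with the same per-column minimum and range as the training data, so a fuller version would return those values alongside the scaled matrix.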
-
Field Summary
Fields
Modifier and Type | Field | Description
static final int | INFINITY_NORM | Indicates the distance is computed using the \(L_{\infty}\) norm method.
static final int | L1_NORM | Indicates the distance is computed using the \(L_1\) norm method.
static final int | L2_NORM | Indicates the distance is computed using the \(L_2\) norm, or Euclidean distance measurement.
-
Constructor Summary
Constructors
Constructor | Description
ClusterKNN(double[][] x, int[] c) | Constructor for ClusterKNN.
-
Method Summary
Modifier and Type | Method | Description
int[] | classify(double[][] value, int k) | Classify a set of observations using k nearest neighbors.
int | classify(double[] value, int k) | Classify an observation using k nearest neighbors.
void | setDistanceMethod(int method) | Sets the distance calculation method to be used.
-
Field Details
-
L2_NORM
public static final int L2_NORM
Indicates the distance is computed using the \(L_2\) norm, or Euclidean distance measurement.
-
L1_NORM
public static final int L1_NORM
Indicates the distance is computed using the \(L_1\) norm method. Also known as rectilinear distance or city block distance, it is most useful for integer input data.
-
INFINITY_NORM
public static final int INFINITY_NORM
Indicates the distance is computed using the \(L_{\infty}\) norm method. This is also known as the maximum difference or Chebyshev distance.
-
-
Constructor Details
-
ClusterKNN
public ClusterKNN(double[][] x, int[] c)
Constructor for ClusterKNN.
- Parameters:
x - a double matrix containing the known x.length observations of x[0].length variables
c - an int array containing the categories for the x.length observations. All integer values are valid.
-
-
Method Details
-
classify
public int classify(double[] value, int k)
Classify an observation using k nearest neighbors.
- Parameters:
value - a double array of x[0].length variables containing the observation to classify
k - an int containing the number of nearest neighbors to use. An odd value is recommended.
- Returns:
- an int containing the cluster to which the observation belongs
-
classify
public int[] classify(double[][] value, int k)
Classify a set of observations using k nearest neighbors.
- Parameters:
value - a double matrix of value.length observations on x[0].length variables to classify
k - an int containing the number of nearest neighbors to use. An odd value is recommended.
- Returns:
- an int array containing the cluster to which each of the observations belongs
-
setDistanceMethod
public void setDistanceMethod(int method)
Sets the distance calculation method to be used.
- Parameters:
method - an int identifying the distance calculation method to be used. By default, method = L2_NORM.
Method | Description
L2_NORM | \(\mathrm{d}(\mathbf{p},\mathbf{q})=\sqrt{\sum_{i=1}^{n}(q_i-p_i)^2}\)
L1_NORM | \(\mathrm{d}(\mathbf{p},\mathbf{q})=\sum_{i=1}^{n}\lvert q_i-p_i\rvert\)
INFINITY_NORM | \(\mathrm{d}(\mathbf{p},\mathbf{q})=\max_i \lvert q_i-p_i\rvert\)
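To make the three distance definitions concrete, the following sketch evaluates each one directly. The class DistanceSketch and its method names are hypothetical and only mirror the formulas; they are not the library's internal code. For p = (1, 2) and q = (4, 6) the coordinate differences are 3 and 4, giving distances of 5, 7, and 4 respectively.

```java
// Illustrative versions of the three distance formulas (hypothetical class).
class DistanceSketch {

    // L2 norm: square root of the sum of squared coordinate differences.
    static double l2(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double d = q[i] - p[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // L1 norm: sum of the absolute coordinate differences.
    static double l1(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum += Math.abs(q[i] - p[i]);
        }
        return sum;
    }

    // Infinity norm: largest absolute coordinate difference.
    static double lInfinity(double[] p, double[] q) {
        double max = 0.0;
        for (int i = 0; i < p.length; i++) {
            max = Math.max(max, Math.abs(q[i] - p[i]));
        }
        return max;
    }
}
```

For any pair of points the infinity-norm distance is never larger than the Euclidean distance, which in turn is never larger than the city block distance, so the choice of method can change which training points count as nearest.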
-