ClusterKNN (JMSL Numerical Library (jmsl) 2021.0.0 API)

java.lang.Object
- com.imsl.stat.ClusterKNN

All Implemented Interfaces:: Serializable, Cloneable

public class ClusterKNN
extends Object
implements Serializable, Cloneable

Perform a k-Nearest Neighbor classification.

ClusterKNN implements an algorithm to classify objects based on a training set. Among the simpler algorithms for classification, classifying a new object is essentially a majority vote of its closest k neighbors. k must be a positive integer and is typically small and odd. The method is straightforward in that the distance from the new point to every point in the training set is computed and sorted. The k closest points are examined and the new object is assigned to the class that is most common in that set. For the case k = 1 the object is assigned to the class of its nearest neighbor.

The default distance method is the Euclidean distance, but other options are available by using the setDistanceMethod method. The supported methods are:

`method`	Description
`L2_NORM`	The Euclidean distance method, $L_2$ norm, defined as the sum of the squares of the difference of each coordinate. (Default)
`L1_NORM`	The rectilinear norm or city block method, $L_1$ norm, defined as the sum of the absolute values of the difference of each coordinate. This is most useful for integer input data.
`INFINITY_NORM`	The Chebyshev distance method, $L_{\infty}$ norm, defined as the maximum of the absolute values of the difference of each coordinate.

For cases where the data are poorly scaled, it may be necessary to normalize the input data first. For example, if in a 2D space the X values range from 0 to 1 and the Y values, from 0 to 1000, the distance calculations will be dominated by the Y coordinate unless they are normalized.

See Also:: ClusterKNN Example 1, Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`static int`	`INFINITY_NORM` Indicates the distance is computed using the $L_{\infty}$ norm method.
`static int`	`L1_NORM` Indicates the distance is computed using the $L_1$ norm method.
`static int`	`L2_NORM` Indicates the distance is computed using the $L_2$ norm, or Euclidean distance measurement.

Constructor Summary

Constructors
Constructor and Description

ClusterKNN(double[][] x, int[] c)
Constructor for ClusterKNN.

Constructors
Constructor and Description
`ClusterKNN(double[][] x, int[] c)` Constructor for `ClusterKNN`.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`int[]`	`classify(double[][] value, int k)` Classify a set of observations using `k` nearest neighbors.
`int`	`classify(double[] value, int k)` Classify an observation using `k` nearest neighbors.
`void`	`setDistanceMethod(int method)` Sets the distance calculation method to be used.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail
- L2_NORM
```
public static final int L2_NORM
```
  Indicates the distance is computed using the $L_2$ norm, or Euclidean distance measurement.
  
  See Also:
  
  Constant Field Values
- L1_NORM
```
public static final int L1_NORM
```
  Indicates the distance is computed using the $L_1$ norm method. Also known as rectilinear distance or city block distance, it is most useful for integer input data.
  
  See Also:
  
  Constant Field Values
- INFINITY_NORM
```
public static final int INFINITY_NORM
```
  Indicates the distance is computed using the $L_{\infty}$ norm method. This is also known as the maximum difference or Chebyshev distance.
  
  See Also:
  
  Constant Field Values

Constructor Detail
- ClusterKNN
```
public ClusterKNN(double[][] x,
                  int[] c)
```
  Constructor for ClusterKNN.
  
  Parameters:
  
  x - a double matrix containing the known x.length observations of x[0].length variables
  
  c - an int array containing the categories for the x.length observations. All integer values are valid.

Method Detail

classify
```
public int classify(double[] value,
                    int k)
```
Classify an observation using k nearest neighbors.

Parameters:

value - a double array of x[0].length variables containing the observation to classify

k - an int containing the number of nearest neighbors to use. An odd value is recommended.

Returns:

an int containing the cluster to which the observation belongs

classify
```
public int[] classify(double[][] value,
                      int k)
```
Classify a set of observations using k nearest neighbors.

Parameters:

value - a double matrix of value.length observations on x[0].length variables to classify

k - an int containing the number of nearest neighbors to use. An odd value is recommended.

Returns:

an int array containing the cluster to which each of the observations belong

setDistanceMethod

public void setDistanceMethod(int method)

Sets the distance calculation method to be used.

Parameters:

method - an int identifying the distance calculation method to be used. By default, method = L2_NORM.

method	Description
`L2_NORM`	$\mathrm{d}(\mathbf{p},\mathbf{q})=\sqrt{ \sum_{i=1}^{n}{(q_i-p_i})^2}$
`L1_NORM`	$\mathrm{d}(\mathbf{p},\mathbf{q})=\sum_{i=1}^{n} \lvert{q_i-p_i}\rvert$
`INFINITY_NORM`	$\mathrm{d}(\mathbf{p},\mathbf{q})=\max_i(\|p_i-q_i\|)$

Class ClusterKNN

Field Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Field Detail

L2_NORM

L1_NORM

INFINITY_NORM

Constructor Detail

ClusterKNN

Method Detail

classify

classify

setDistanceMethod