com.imsl.stat.ClusterKNN

All Implemented Interfaces:: Serializable, Cloneable

public class ClusterKNN extends Object implements Serializable, Cloneable

Perform a k-Nearest Neighbor classification.

ClusterKNN implements an algorithm to classify objects based on a training set. Among the simpler algorithms for classification, classifying a new object is essentially a majority vote of its closest k neighbors. k must be a positive integer and is typically small and odd. The method is straightforward in that the distance from the new point to every point in the training set is computed and sorted. The k closest points are examined and the new object is assigned to the class that is most common in that set. For the case k = 1 the object is assigned to the class of its nearest neighbor.

The default distance method is the Euclidean distance, but other options are available by using the setDistanceMethod method. The supported methods are:

`method`	Description
`L2_NORM`	The Euclidean distance method, \( L_2\) norm, defined as the sum of the squares of the difference of each coordinate. (Default)
`L1_NORM`	The rectilinear norm or city block method, \(L_1\) norm, defined as the sum of the absolute values of the difference of each coordinate. This is most useful for integer input data.
`INFINITY_NORM`	The Chebyshev distance method, \(L_{\infty} \) norm, defined as the maximum of the absolute values of the difference of each coordinate.

For cases where the data are poorly scaled, it may be necessary to normalize the input data first. For example, if in a 2D space the X values range from 0 to 1 and the Y values, from 0 to 1000, the distance calculations will be dominated by the Y coordinate unless they are normalized.

See Also:

Field Summary

Fields

Modifier and Type

Field

Description

static final int

INFINITY_NORM

Indicates the distance is computed using the \(L_{\infty} \) norm method.

static final int

L1_NORM

Indicates the distance is computed using the \(L_1\) norm method.

static final int

L2_NORM

Indicates the distance is computed using the \(L_2\) norm, or Euclidean distance measurement.
Constructor Summary

Constructors

Constructor

Description

ClusterKNN(double[][] x, int[] c)

Constructor for ClusterKNN.
Method Summary

Modifier and Type

Method

Description

int[]

classify(double[][] value, int k)

Classify a set of observations using k nearest neighbors.

int

classify(double[] value, int k)

Classify an observation using k nearest neighbors.

void

setDistanceMethod(int method)

Sets the distance calculation method to be used.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- L2_NORM
  
  public static final int L2_NORM
  
  Indicates the distance is computed using the \(L_2\) norm, or Euclidean distance measurement.
  See Also:
  
  Constant Field Values
- L1_NORM
  
  public static final int L1_NORM
  
  Indicates the distance is computed using the \(L_1\) norm method. Also known as rectilinear distance or city block distance, it is most useful for integer input data.
  See Also:
  
  Constant Field Values
- INFINITY_NORM
  
  public static final int INFINITY_NORM
  
  Indicates the distance is computed using the \(L_{\infty} \) norm method. This is also known as the maximum difference or Chebyshev distance.
  See Also:
  
  Constant Field Values
Constructor Details
- ClusterKNN
  
  public ClusterKNN(double[][] x, int[] c)
  
  Constructor for ClusterKNN.
  
  Parameters:
  
  x - a double matrix containing the known x.length observations of x[0].length variables
  
  c - an int array containing the categories for the x.length observations. All integer values are valid.

Method Details

classify

public int classify(double[] value, int k)

Classify an observation using k nearest neighbors.

Parameters:

value - a double array of x[0].length variables containing the observation to classify

k - an int containing the number of nearest neighbors to use. An odd value is recommended.

Returns:

an int containing the cluster to which the observation belongs
classify

public int[] classify(double[][] value, int k)

Classify a set of observations using k nearest neighbors.

Parameters:

value - a double matrix of value.length observations on x[0].length variables to classify

k - an int containing the number of nearest neighbors to use. An odd value is recommended.

Returns:

an int array containing the cluster to which each of the observations belong

setDistanceMethod

public void setDistanceMethod(int method)

Sets the distance calculation method to be used.

Parameters:

method - an int identifying the distance calculation method to be used. By default, method = L2_NORM.

method	Description
`L2_NORM`	\(\mathrm{d}(\mathbf{p},\mathbf{q})=\sqrt{ \sum_{i=1}^{n}{(q_i-p_i})^2}\)
`L1_NORM`	\(\mathrm{d}(\mathbf{p},\mathbf{q})=\sum_{i=1}^{n} \lvert{q_i-p_i}\rvert\)
`INFINITY_NORM`	\(\mathrm{d}(\mathbf{p},\mathbf{q})=\max_i(\|p_i-q_i\|) \)

Class ClusterKNN

Field Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Field Details

L2_NORM

L1_NORM

INFINITY_NORM

Constructor Details

ClusterKNN

Method Details

classify

classify

setDistanceMethod