ClusterKMeans (JMSL Numerical Library (jmsl) 2021.0.0 API)

java.lang.Object
- com.imsl.stat.ClusterKMeans

All Implemented Interfaces:

Serializable, Cloneable
```
public class ClusterKMeans
extends Object
implements Serializable, Cloneable
```
Perform a K-means (centroid) cluster analysis.
ClusterKMeans is an implementation of Algorithm AS 136 by Hartigan and Wong (1979). It computes K-means (centroid) Euclidean metric clusters for an input matrix starting with initial estimates of the K cluster means. It allows for missing values (coded as NaN, not a number) and for weights and frequencies.

Let p denote the number of variables to be used in computing the Euclidean distance between observations. The idea in K-means cluster analysis is to find a clustering (or grouping) of the observations so as to minimize the total within-cluster sums of squares. In this case, the total sums of squares within each cluster is computed as the sum of the centered sum of squares over all nonmissing values of each variable. That is,
$$ \phi = \sum_{i=1}^K \sum_{j=1}^p \sum_{m=1}^{n_i} f_{\nu_{im}} w_{\nu_{im}} \delta_{\nu_{im},j} \left( x_{\nu_{im},j} - \bar x_{ij} \right)^2 $$
where $\nu_{im}$ denotes the row index of the m-th observation in the i-th cluster in the matrix X; $n_i$ is the number of rows of X assigned to group i; f denotes the frequency of the observation; w denotes its weight; d is zero if the j-th variable on observation $\nu_{im}$ is missing, otherwise $\delta$ is one; and $\bar x_{ij}$ is the average of the nonmissing observations for variable j in group i. This method sequentially processes each observation and reassigns it to another cluster if doing so results in a decrease in the total within-cluster sums of squares. See Hartigan and Wong (1979) or Hartigan (1975) for details.

An algorithm for choosing efficient initial seeds was proposed by Arthur and Vassilvitskii (2007), and is known as K-means++. This algorithm has advantages over choosing arbitrary or random initial seeds from the population of observations. Access the implementation of this algorithm in this class using the constructor with the signature ClusterKMeans(double[][], int) or by calling the setClusterCenters(Random) method.

See Also:

ClusterKMeans Example 1, ClusterKMeans Example 2 using K-means++, Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`ClusterKMeans.ClusterNoPointsException` There is a cluster with no points
`static class`	`ClusterKMeans.NoConvergenceException` Convergence did not occur within the maximum number of iterations.
`static class`	`ClusterKMeans.NonnegativeFreqException` Deprecated. No longer used, replaced with an `IllegalArgumentException`.
`static class`	`ClusterKMeans.NonnegativeWeightException` Deprecated. No longer used, replaced with an `IllegalArgumentException`.

Constructor Summary

Constructors
Constructor and Description
`ClusterKMeans(double[][] x, double[][] cs)` Constructor for `ClusterKMeans`.
`ClusterKMeans(double[][] x, int k)` Constructor for `ClusterKMeans` using the K-means++ algorithm to select the initial seeds.
`ClusterKMeans(double[][] x, int k, Random r)` Constructor for `ClusterKMeans` using the K-means++ algorithm to set the initial seeds.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`double[][]`	`compute()` Computes the cluster means.
`int[]`	`getClusterCounts()` Returns the number of observations in each cluster.
`int[]`	`getClusterMembership()` Returns the cluster membership for each observation.
`double[]`	`getClusterSSQ()` Returns the within sum of squares for each cluster.
`double[][]`	`getInitialCenters()` Returns the initial cluster centers.
`void`	`setFrequencies(double[] frequencies)` Sets the frequency for each observation.
`void`	`setMaxIterations(int iterations)` Sets the maximum number of iterations.
`void`	`setWeights(double[] weights)` Sets the weight for each observation.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - ClusterKMeans
```
public ClusterKMeans(double[][] x,
                     double[][] cs)
```
    Constructor for ClusterKMeans.
    
    Parameters:
    
    x - a double matrix containing the observations to be clustered
    
    cs - a double matrix containing the cluster seeds, i.e. estimates for the cluster centers
  - ClusterKMeans
```
public ClusterKMeans(double[][] x,
                     int k)
```
    Constructor for ClusterKMeans using the K-means++ algorithm to select the initial seeds.
    
    Parameters:
    
    x - a double matrix containing the observations to be clustered
    
    k - an int indicating the number of clusters
  - ClusterKMeans
```
public ClusterKMeans(double[][] x,
                     int k,
                     Random r)
```
    Constructor for ClusterKMeans using the K-means++ algorithm to set the initial seeds.
    
    Parameters:
    
    x - a double matrix containing the observations to be clustered
    
    k - an int indicating the number of clusters
    
    r - a Random object, the random number generator to be used in the KMeans++ algorithm
    Note that r can be initialized with a seed for repeatable results.
- Method Detail
  - getInitialCenters
```
public double[][] getInitialCenters()
```
    Returns the initial cluster centers.
    The cluster centers are those set by the K-Means++ center selection algorithm or those supplied to the constructor.
    
    Returns:
    
    a double matrix containing the cluster centers
  - compute
```
public final double[][] compute()
                         throws ClusterKMeans.NoConvergenceException,
                                ClusterKMeans.ClusterNoPointsException
```
    Computes the cluster means.
    
    Returns:
    
    a double matrix containing computed result
    
    Throws:
    
    ClusterKMeans.NonnegativeFreqException - is thrown if a frequency is negative
    
    ClusterKMeans.NonnegativeWeightException - is thrown if a weight is negative
    
    ClusterKMeans.NoConvergenceException - is thrown if convergence did not occur within the maximum number of iterations
    
    ClusterKMeans.ClusterNoPointsException - is thrown if the cluster centers yield a cluster with no points
  - setWeights
```
public void setWeights(double[] weights)
```
    Sets the weight for each observation.
    
    Parameters:
    
    weights - a double array of size x.length containing the weight for each observation
    Default: weights[] = 1
  - setFrequencies
```
public void setFrequencies(double[] frequencies)
```
    Sets the frequency for each observation.
    
    Parameters:
    
    frequencies - a double array of size x.length containing the frequency for each observation
    Default: frequencies[] = 1
  - setMaxIterations
```
public void setMaxIterations(int iterations)
```
    Sets the maximum number of iterations.
    
    Parameters:
    
    iterations - an int scalar specifying the maximum number of iterations
    Default: interations = 30
  - getClusterMembership
```
public int[] getClusterMembership()
```
    Returns the cluster membership for each observation.
    Note that the compute() method must be invoked first before invoking this method. Otherwise, the method throws a IllegalStateException exception.
    
    Returns:
    
    an int array containing the cluster membership for each observation
    Cluster membership 1 indicates the observation belongs to cluster 1, cluster membership 2 indicates the observation belongs to cluster 2, etc.
  - getClusterSSQ
```
public double[] getClusterSSQ()
```
    Returns the within sum of squares for each cluster.
    Note that the compute method must be invoked first before invoking this method. Otherwise, the method throws a IllegalStateException exception.
    
    Returns:
    
    a double array containing the within sum of squares for each cluster
  - getClusterCounts
```
public int[] getClusterCounts()
```
    Returns the number of observations in each cluster.
    Note that the compute method must be invoked first before invoking this method. Otherwise, the method throws a IllegalStateException exception.
    
    Returns:
    
    an int array containing the number of observations in each cluster

Class ClusterKMeans

Nested Class Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Constructor Detail

ClusterKMeans

ClusterKMeans

ClusterKMeans

Method Detail

getInitialCenters

compute

setWeights

setFrequencies

setMaxIterations

getClusterMembership

getClusterSSQ

getClusterCounts