public class ClusterKMeans extends Object implements Serializable, Cloneable
ClusterKMeans is an implementation of Algorithm AS 136 by Hartigan and Wong (1979). It computes K-means (centroid) Euclidean metric clusters for an input matrix, starting with initial estimates of the K cluster means. It allows for missing values (coded as NaN, not a number) and for weights and frequencies.
Let p denote the number of variables to be used in computing the Euclidean distance between observations. The idea in K-means cluster analysis is to find a clustering (or grouping) of the observations so as to minimize the total within-cluster sums of squares. In this case, the total sums of squares within each cluster is computed as the sum of the centered sum of squares over all nonmissing values of each variable. That is,

$$\phi = \sum_{i=1}^{K} \sum_{j=1}^{p} \sum_{m=1}^{n_i} f_{\nu_{im}} \, w_{\nu_{im}} \, \delta_{\nu_{im},j} \left( x_{\nu_{im},j} - \bar{x}_{ij} \right)^2$$

where $\nu_{im}$ denotes the row index of the mth observation in the ith cluster in the matrix X; $n_i$ is the number of rows of X assigned to group i; f denotes the frequency of the observation; w denotes its weight; $\delta$ is zero if the jth variable on observation $\nu_{im}$ is missing, otherwise $\delta$ is one; and $\bar{x}_{ij}$ is the average of the nonmissing observations for variable j in group i. This method sequentially processes each observation and reassigns it to another cluster if doing so results in a decrease in the total within-cluster sums of squares. See Hartigan and Wong (1979) or Hartigan (1975) for details.
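The objective described above can be illustrated with a small, self-contained sketch. It is not the library's implementation: the class and method names are hypothetical, cluster labels are taken to be 1-based, and the group average is assumed to be weighted by the product of frequency and weight (with unit frequencies and weights this reduces to the plain average of the nonmissing values).

```java
class WithinClusterSsq {
    /**
     * Sums f * w * (x - mean)^2 over all nonmissing (non-NaN) entries,
     * where the mean is taken per cluster and per variable.
     * Assumption: means are weighted by f * w; labels in membership are 1-based.
     */
    static double totalWithinSsq(double[][] x, int[] membership, int k,
                                 double[] f, double[] w) {
        int p = x[0].length;
        double[][] weightedSum = new double[k][p]; // sums of f*w*x over nonmissing entries
        double[][] weightTotal = new double[k][p]; // matching sums of f*w

        // First pass: accumulate per-cluster, per-variable weighted sums.
        for (int r = 0; r < x.length; r++) {
            int c = membership[r] - 1;
            double fw = f[r] * w[r];
            for (int j = 0; j < p; j++) {
                if (Double.isNaN(x[r][j])) {
                    continue; // missing value contributes nothing
                }
                weightedSum[c][j] += fw * x[r][j];
                weightTotal[c][j] += fw;
            }
        }

        // Second pass: accumulate weighted squared deviations from the means.
        double total = 0.0;
        for (int r = 0; r < x.length; r++) {
            int c = membership[r] - 1;
            double fw = f[r] * w[r];
            for (int j = 0; j < p; j++) {
                if (Double.isNaN(x[r][j]) || weightTotal[c][j] == 0.0) {
                    continue;
                }
                double dev = x[r][j] - weightedSum[c][j] / weightTotal[c][j];
                total += fw * dev * dev;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] x = {{1.0, 2.0}, {3.0, Double.NaN}, {10.0, 4.0}, {12.0, 8.0}};
        int[] membership = {1, 1, 2, 2};
        double[] ones = {1.0, 1.0, 1.0, 1.0};
        System.out.println(totalWithinSsq(x, membership, 2, ones, ones)); // prints 12.0
    }
}
```

In this example the missing entry in the second row is simply skipped, so the second variable in cluster 1 has a single nonmissing value and contributes zero to the total.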
Modifier and Type  Class and Description

static class  ClusterKMeans.ClusterNoPointsException
There is a cluster with no points.

static class  ClusterKMeans.NoConvergenceException
Convergence did not occur within the maximum number of iterations.

Constructor and Description

ClusterKMeans(double[][] x, double[][] cs)
Constructor for ClusterKMeans.
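The constructor expects both the observations and a matrix of initial cluster seeds. As a minimal sketch of one way to prepare such a seed matrix, the hypothetical helper below (not part of the library) copies the first k rows of x. It assumes the seed matrix has one row per cluster and the same number of columns as x.

```java
import java.util.Arrays;

class SeedSketch {
    /**
     * Hypothetical helper: uses the first k observations as initial cluster seeds.
     * Rows are cloned so later changes to x do not alter the seeds.
     */
    static double[][] firstRowsAsSeeds(double[][] x, int k) {
        if (k < 1 || k > x.length) {
            throw new IllegalArgumentException("k must be between 1 and x.length");
        }
        double[][] cs = new double[k][];
        for (int i = 0; i < k; i++) {
            cs[i] = x[i].clone();
        }
        return cs;
    }

    public static void main(String[] args) {
        double[][] x = {{1.0, 2.0}, {3.0, 4.0}, {10.0, 11.0}, {12.0, 13.0}};
        double[][] cs = firstRowsAsSeeds(x, 2);
        System.out.println(Arrays.deepToString(cs)); // prints [[1.0, 2.0], [3.0, 4.0]]
    }
}
```

The resulting arrays could then be passed as `new ClusterKMeans(x, cs)`. Taking the first rows is only a convenience; poorly separated or duplicate seeds may leave a cluster empty, which this class reports through ClusterKMeans.ClusterNoPointsException.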
Modifier and Type  Method and Description

double[][]  compute()
Computes the cluster means.

int[]  getClusterCounts()
Returns the number of observations in each cluster.

int[]  getClusterMembership()
Returns the cluster membership for each observation.

double[]  getClusterSSQ()
Returns the within sum of squares for each cluster.

void  setFrequencies(double[] frequencies)
Sets the frequency for each observation.

void  setMaxIterations(int iterations)
Sets the maximum number of iterations.

void  setWeights(double[] weights)
Sets the weight for each observation.
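The membership and count accessors describe the same partition from two angles. The sketch below is a standalone illustration, not library code: the class and method names are hypothetical, and it assumes unit frequencies and the 1-based cluster labels documented for getClusterMembership().

```java
import java.util.Arrays;

class MembershipCounts {
    /** Tallies 1-based cluster labels into per-cluster observation counts. */
    static int[] countsFromMembership(int[] membership, int k) {
        int[] counts = new int[k];
        for (int label : membership) {
            counts[label - 1]++; // labels start at 1, array indices at 0
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] membership = {1, 2, 2, 3, 1, 2};
        System.out.println(Arrays.toString(countsFromMembership(membership, 3))); // prints [2, 3, 1]
    }
}
```

The main point is the off-by-one shift: a membership value of 1 maps to index 0 of any per-cluster array.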

public ClusterKMeans(double[][] x, double[][] cs)
Constructor for ClusterKMeans.
Parameters:
x - A double matrix containing the observations to be clustered.
cs - A double matrix containing the cluster seeds, i.e. estimates for the cluster centers.

public final double[][] compute() throws ClusterKMeans.NoConvergenceException, ClusterKMeans.ClusterNoPointsException
Computes the cluster means.
Returns:
A double matrix containing the computed cluster means.
Throws:
com.imsl.stat.ClusterKMeans.NonnegativeFreqException - is thrown if a frequency is negative.
com.imsl.stat.ClusterKMeans.NonnegativeWeightException - is thrown if a weight is negative.
ClusterKMeans.NoConvergenceException - is thrown if convergence did not occur within the maximum number of iterations.
ClusterKMeans.ClusterNoPointsException - is thrown if the cluster seed yields a cluster with no points.

public int[] getClusterCounts()
Returns the number of observations in each cluster. The compute method must be invoked before this method; otherwise, this method throws a NullPointerException.
Returns:
An int array containing the number of observations in each cluster.

public int[] getClusterMembership()
Returns the cluster membership for each observation. The compute method must be invoked before this method; otherwise, this method throws a NullPointerException.
Returns:
An int array containing the cluster membership for each observation. Cluster membership 1 indicates the observation belongs to cluster 1, cluster membership 2 indicates the observation belongs to cluster 2, etc.

public double[] getClusterSSQ()
Returns the within sum of squares for each cluster. The compute method must be invoked before this method; otherwise, this method throws a NullPointerException.
Returns:
A double array containing the within sum of squares for each cluster.

public void setFrequencies(double[] frequencies)
Sets the frequency for each observation.
Parameters:
frequencies - A double array of size x.length containing the frequency for each observation. Default: frequencies[] = 1.

public void setMaxIterations(int iterations)
Sets the maximum number of iterations.
Parameters:
iterations - An int scalar specifying the maximum number of iterations. Default: iterations = 30.

public void setWeights(double[] weights)
Sets the weight for each observation.
Parameters:
weights - A double array of size x.length containing the weight for each observation. Default: weights[] = 1.

Copyright © 1970-2015 Rogue Wave Software
Built October 13 2015.