ClusterKMeans Class

Perform a K-means (centroid) cluster analysis.

Inheritance Hierarchy

System.Object
  Imsl.Stat.ClusterKMeans

Namespace: Imsl.Stat
Assembly: ImslCS (in ImslCS.dll) Version: 6.5.2.0

Syntax

C#

[SerializableAttribute]
public class ClusterKMeans

VB

<SerializableAttribute>
Public Class ClusterKMeans

C++

[SerializableAttribute]
public ref class ClusterKMeans

F#

[<SerializableAttribute>]
type ClusterKMeans = class end

The ClusterKMeans type exposes the following members.

Constructors

	Name	Description
	ClusterKMeans	Constructor for ClusterKMeans.

Methods

	Name	Description
	Compute	Computes the cluster means.
	Equals	Determines whether the specified object is equal to the current object. (Inherited from Object.)
	Finalize	Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
	GetClusterCounts	Returns the number of observations in each cluster. The Compute method must be invoked before this method; otherwise, this method throws a NullReferenceException.
	GetClusterMembership	Returns the cluster membership for each observation. The Compute method must be invoked before this method; otherwise, this method throws a NullReferenceException.
	GetClusterSSQ	Returns the within-cluster sum of squares for each cluster. The Compute method must be invoked before this method; otherwise, this method throws a NullReferenceException.
	GetHashCode	Serves as a hash function for a particular type. (Inherited from Object.)
	GetType	Gets the Type of the current instance. (Inherited from Object.)
	MemberwiseClone	Creates a shallow copy of the current Object. (Inherited from Object.)
	SetFrequencies	Sets the frequency for each observation.
	SetWeights	Sets the weight for each observation.
	ToString	Returns a string that represents the current object. (Inherited from Object.)

Properties

	Name	Description
	MaxIterations	The maximum number of iterations.

Remarks

ClusterKMeans is an implementation of Algorithm AS 136 by Hartigan and Wong (1979). It computes K-means (centroid) clusters under the Euclidean metric for an input matrix, starting from initial estimates of the K cluster means. It supports missing values (coded as NaN, Not a Number) as well as observation weights and frequencies.
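As a rough illustration of the Euclidean metric with NaN-coded missing values, the following standard C++ sketch assigns one observation to its nearest cluster mean and skips missing components. It is not the IMSL API. The helper names squaredDistance and nearestCluster are hypothetical, and skipping NaN components is an assumption chosen to match the delta term in the objective below. The library's internal handling may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not part of ImslCS): squared Euclidean distance between
// one observation and one cluster mean, summed only over the variables that are
// not missing (NaN) in the observation. Assumption: missing components are skipped.
double squaredDistance(const std::vector<double>& obs,
                       const std::vector<double>& mean) {
    assert(obs.size() == mean.size());
    double sum = 0.0;
    for (std::size_t j = 0; j < obs.size(); ++j) {
        if (std::isnan(obs[j])) continue;  // missing value: contributes nothing
        double d = obs[j] - mean[j];
        sum += d * d;
    }
    return sum;
}

// Hypothetical helper: index of the cluster mean closest to the observation.
std::size_t nearestCluster(const std::vector<double>& obs,
                           const std::vector<std::vector<double>>& means) {
    std::size_t best = 0;
    double bestDist = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < means.size(); ++i) {
        double d = squaredDistance(obs, means[i]);
        if (d < bestDist) {
            bestDist = d;
            best = i;
        }
    }
    return best;
}
```

For example, the observation (1, NaN) is compared only on its first variable. Its squared distance is 1 to the mean (0, 100) and 16 to the mean (5, 0), so it is assigned to the first cluster.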

Let p denote the number of variables used to compute the Euclidean distance between observations. The goal of K-means cluster analysis is to find a clustering (grouping) of the observations that minimizes the total within-cluster sum of squares. Here, the sum of squares within each cluster is the sum, over all p variables, of the centered sum of squares of the nonmissing values of that variable. That is,

$\phi = \sum_{i=1}^K \sum_{j=1}^p \sum_{m=1}^{n_i} f_{\nu_{im}} w_{\nu_{im}} \delta_{\nu_{im},j} \left( x_{\nu_{im},j} - \bar x_{ij} \right)^2$

where $\nu_{im}$ denotes the row index of the m-th observation in the i-th cluster in the matrix X; $n_i$ is the number of rows of X assigned to group i; f denotes the frequency of the observation; w denotes its weight; $\delta_{\nu_{im},j}$ is zero if the j-th variable on observation $\nu_{im}$ is missing and one otherwise; and $\bar x_{ij}$ is the average of the nonmissing observations for variable j in group i. The method processes the observations sequentially. It reassigns an observation to another cluster whenever doing so decreases the total within-cluster sum of squares. See Hartigan and Wong (1979) or Hartigan (1975) for details.
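The objective $\phi$ and the reassignment rule can be sketched directly. The following standard C++ code is not the IMSL API, and the names clusterMeans, phi and transferPass are hypothetical. It computes $\phi$ with frequencies, weights and NaN-coded missing values. It then makes one pass that moves each observation to whichever cluster lowers $\phi$ the most. It assumes $\bar x_{ij}$ is the mean weighted by $f \cdot w$. Because it recomputes $\phi$ from scratch for every candidate move, it illustrates the criterion only and does not reproduce the efficient updates of AS 136.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: per-cluster means xbar[i][j] over nonmissing values.
// Assumption: each value is weighted by f * w (the IMSL routine may differ).
Matrix clusterMeans(const Matrix& x, const std::vector<std::size_t>& member,
                    const std::vector<double>& f, const std::vector<double>& w,
                    std::size_t k) {
    std::size_t p = x.empty() ? 0 : x[0].size();
    Matrix sum(k, std::vector<double>(p, 0.0));
    Matrix wt(k, std::vector<double>(p, 0.0));
    for (std::size_t r = 0; r < x.size(); ++r)
        for (std::size_t j = 0; j < p; ++j)
            if (!std::isnan(x[r][j])) {  // delta = 1 only for nonmissing values
                double fw = f[r] * w[r];
                sum[member[r]][j] += fw * x[r][j];
                wt[member[r]][j] += fw;
            }
    for (std::size_t i = 0; i < k; ++i)
        for (std::size_t j = 0; j < p; ++j)
            sum[i][j] = wt[i][j] > 0.0 ? sum[i][j] / wt[i][j] : 0.0;
    return sum;
}

// phi: total within-cluster sum of squares, following the formula above.
double phi(const Matrix& x, const std::vector<std::size_t>& member,
           const std::vector<double>& f, const std::vector<double>& w,
           std::size_t k) {
    assert(member.size() == x.size());
    Matrix xbar = clusterMeans(x, member, f, w, k);
    double total = 0.0;
    for (std::size_t r = 0; r < x.size(); ++r)
        for (std::size_t j = 0; j < x[r].size(); ++j)
            if (!std::isnan(x[r][j])) {
                double d = x[r][j] - xbar[member[r]][j];
                total += f[r] * w[r] * d * d;
            }
    return total;
}

// One brute-force sequential pass: move each observation to the cluster that
// gives the smallest phi, if that is strictly smaller than the current phi.
// Returns true if any observation changed cluster.
bool transferPass(const Matrix& x, std::vector<std::size_t>& member,
                  const std::vector<double>& f, const std::vector<double>& w,
                  std::size_t k) {
    bool moved = false;
    for (std::size_t r = 0; r < x.size(); ++r) {
        std::size_t original = member[r];
        std::size_t best = original;
        double bestPhi = phi(x, member, f, w, k);
        for (std::size_t i = 0; i < k; ++i) {
            if (i == original) continue;
            member[r] = i;  // try the move
            double candidate = phi(x, member, f, w, k);
            if (candidate < bestPhi) {
                bestPhi = candidate;
                best = i;
            }
        }
        member[r] = best;  // keep the best assignment found
        if (best != original) moved = true;
    }
    return moved;
}
```

For example, take one variable with observations 0, 1, 10 and 11, unit frequencies and weights, and starting membership {0, 1, 0, 1}. Here $\phi$ starts at 100. One pass regroups the data into {0, 1} and {10, 11}, which gives $\phi = 1$.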

Reference

Imsl.Stat Namespace

Other Resources

ClusterKMeans Example