ClusterKMeans Class

Performs a K-means (centroid) cluster analysis.

For a list of all members of this type, see ClusterKMeans Members.

System.Object
Imsl.Stat.ClusterKMeans

public class ClusterKMeans

Thread Safety

Public static (Shared in Visual Basic) members of this type are safe for multithreaded operations. Instance members are not guaranteed to be thread-safe.

Remarks

ClusterKMeans is an implementation of Algorithm AS 136 by Hartigan and Wong (1979). It computes K-means (centroid) Euclidean metric clusters for an input matrix starting with initial estimates of the K cluster means. It allows for missing values (coded as NaN, not a number) and for weights and frequencies.

Let p denote the number of variables used in computing the Euclidean distance between observations. The idea in K-means cluster analysis is to find a clustering (or grouping) of the observations that minimizes the total within-cluster sum of squares. Here, the sum of squares within each cluster is computed as the sum of the centered sums of squares over all nonmissing values of each variable. That is,

$\phi = \sum_{i=1}^K \sum_{j=1}^p \sum_{m=1}^{n_i} f_{\nu_{im}} w_{\nu_{im}} \delta_{\nu_{im},j} \left( x_{\nu_{im},j} - \bar x_{ij} \right)^2$

where $\nu_{im}$ denotes the row index of the m-th observation in the i-th cluster in the matrix X; $n_i$ is the number of rows of X assigned to group i; $f_{\nu_{im}}$ denotes the frequency of the observation; $w_{\nu_{im}}$ denotes its weight; $\delta_{\nu_{im},j}$ is zero if the j-th variable on observation $\nu_{im}$ is missing, otherwise it is one; and $\bar x_{ij}$ is the average of the nonmissing observations for variable j in group i. This method sequentially processes each observation and reassigns it to another cluster if doing so results in a decrease in the total within-cluster sum of squares. See Hartigan and Wong (1979) or Hartigan (1975) for details.
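As an illustration of the criterion above, the following Python sketch evaluates $\phi$ for a given assignment of rows to clusters, skipping missing (NaN) entries. It is not part of the ImslCS API and does not reproduce the Hartigan and Wong reassignment procedure; the function name `within_cluster_ss` and its parameters are hypothetical. It also assumes that $\bar x_{ij}$ is weighted by the product of frequency and weight, which reduces to the plain average when all frequencies and weights are one.

```python
import math


def within_cluster_ss(x, labels, k, freq=None, weight=None):
    """Evaluate phi, the total within-cluster sum of squares.

    x      : list of rows, each a list of p floats (NaN marks a missing value)
    labels : cluster index (0 .. k-1) for each row of x
    freq   : optional per-row frequencies (default 1.0)
    weight : optional per-row weights (default 1.0)
    """
    n = len(x)
    p = len(x[0])
    freq = [1.0] * n if freq is None else freq
    weight = [1.0] * n if weight is None else weight

    phi = 0.0
    for i in range(k):
        rows = [r for r in range(n) if labels[r] == i]
        for j in range(p):
            # delta = 1 only for rows where variable j is present.
            present = [r for r in rows if not math.isnan(x[r][j])]
            total = sum(freq[r] * weight[r] for r in present)
            if total == 0.0:
                continue  # no usable values for this variable in this cluster
            # Assumption: the cluster mean is weighted by freq * weight.
            mean = sum(freq[r] * weight[r] * x[r][j] for r in present) / total
            phi += sum(
                freq[r] * weight[r] * (x[r][j] - mean) ** 2 for r in present
            )
    return phi


# Four observations on two variables; one value is missing.
data = [
    [1.0, 2.0],
    [3.0, float("nan")],
    [10.0, 10.0],
    [12.0, 14.0],
]

phi_good = within_cluster_ss(data, [0, 0, 1, 1], k=2)
phi_bad = within_cluster_ss(data, [0, 1, 1, 1], k=2)
```

In this small data set, moving the second row from cluster 1 back to cluster 0 lowers $\phi$, which is the kind of reassignment the method accepts.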

Requirements

Namespace: Imsl.Stat

Assembly: ImslCS (in ImslCS.dll)
