Performs a hierarchical cluster analysis from a distance matrix.
For a list of all members of this type, see ClusterHierarchical Members.
System.Object
Imsl.Stat.ClusterHierarchical
Public static (Shared in Visual Basic) members of this type are safe for multithreaded operations. Instance members are not guaranteed to be thread-safe.
Class ClusterHierarchical conducts a hierarchical cluster analysis based upon a distance matrix or, by appropriate use of the argument transform, upon a similarity matrix. Only the upper triangular part of the dist matrix is required as input.
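To illustrate what "only the upper triangular part is required" implies, the following is a minimal Python sketch, not part of the IMSL API. The helper name `mirror_upper` is hypothetical. It reads only the entries above the diagonal and mirrors them into a full symmetric matrix.

```python
def mirror_upper(dist):
    """Return a full symmetric copy of a square matrix, reading only entries with j > i."""
    n = len(dist)
    full = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Only dist[i][j] with j > i is ever read.
            full[i][j] = full[j][i] = float(dist[i][j])
    return full


# Entries on or below the diagonal are never read, so placeholders are fine there.
upper = [
    [0.0, 2.0, 6.0],
    [None, 0.0, 5.0],
    [None, None, 0.0],
]
full = mirror_upper(upper)
```

Because the lower triangle is never read, the caller may leave it unset.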
Hierarchical clustering in ClusterHierarchical proceeds as follows:
Class ClusterHierarchical allows five methods for computing the distances. The methods differ primarily in how the distance matrix is updated after two clusters have been joined: the argument method specifies how the distance from the cluster just merged to each of the remaining clusters is updated. To understand these measures, suppose in the following discussion that clusters A and B have just been joined to form cluster Z, and interest is in computing the distance of Z with another cluster called C.
method | Description |
---|---|
0 | Single linkage (minimum distance). The distance from Z to C is the minimum of the distances (A to C, B to C). |
1 | Complete linkage (maximum distance). The distance from Z to C is the maximum of the distances (A to C, B to C). |
2 | Average-distance-within-clusters method. The distance from Z to C is the average distance of all objects that would be within the cluster formed by merging clusters Z and C. This average may be computed according to formulas given by Anderberg (1973, page 139). |
3 | Average-distance-between-clusters method. The distance from Z to C is the average distance of objects within cluster Z to objects within cluster C. This average may be computed according to methods given by Anderberg (1973, page 140). |
4 | Ward's method: Clusters are formed so as to minimize the increase in the within-cluster sums of squares. The distance between two clusters is the increase in these sums of squares if the two clusters were merged. A method for computing this distance from a squared Euclidean distance matrix is given by Anderberg (1973, pages 142-145). |
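The update rules for methods 0, 1, and 3 can be sketched in a few lines. The Python below is an illustrative sketch and not the IMSL implementation. The names `updated_distance` and `agglomerate` are hypothetical, and so is the zero-based cluster numbering, in which merged clusters receive the next unused integer. Methods 2 and 4 are omitted because they need the formulas from Anderberg (1973) that this page only cites.

```python
def updated_distance(method, d_ac, d_bc, n_a=1, n_b=1):
    """Distance from Z (A merged with B) to C, given d(A, C), d(B, C) and cluster sizes."""
    if method == 0:  # single linkage: minimum distance
        return min(d_ac, d_bc)
    if method == 1:  # complete linkage: maximum distance
        return max(d_ac, d_bc)
    if method == 3:  # average distance between clusters, weighted by size
        return (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
    raise ValueError("only methods 0, 1 and 3 are sketched here")


def agglomerate(dist, method):
    """Repeatedly join the two closest clusters; return (a, b, level) per merge."""
    n = len(dist)
    # d[(i, j)] with i < j is the current distance between active clusters i and j.
    # Only the upper triangle of dist is read.
    d = {(i, j): float(dist[i][j]) for i in range(n) for j in range(i + 1, n)}
    size = {i: 1 for i in range(n)}
    merges = []
    next_id = n
    while len(size) > 1:
        (a, b), level = min(d.items(), key=lambda kv: kv[1])
        z = next_id
        next_id += 1
        for c in size:
            if c in (a, b):
                continue
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            # z is the newest id, so c < z always holds.
            d[(c, z)] = updated_distance(method, d_ac, d_bc, size[a], size[b])
        del d[(a, b)]
        size[z] = size.pop(a) + size.pop(b)
        merges.append((a, b, level))
    return merges


# Four points on a line at 0, 1, 3 and 7; only the upper triangle is filled in.
dist = [
    [0, 1, 3, 7],
    [0, 0, 2, 6],
    [0, 0, 0, 4],
    [0, 0, 0, 0],
]
single = agglomerate(dist, 0)    # [(0, 1, 1.0), (2, 4, 2.0), (3, 5, 4.0)]
complete = agglomerate(dist, 1)  # [(0, 1, 1.0), (2, 4, 3.0), (3, 5, 7.0)]
average = agglomerate(dist, 3)
```

On this small example the three methods join the same clusters in the same order and differ only in the levels at which the joins occur.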
In general, single linkage will yield long thin clusters while complete linkage will yield clusters that are more spherical. Average linkage and Ward's linkage tend to yield clusters that are similar to those obtained with complete linkage.
Class ClusterHierarchical produces a unique representation of the binary cluster tree via the following three conventions; the fact that the tree is unique should aid in interpreting the clusters. First, when two clusters are joined and each cluster contains two or more data points, the cluster that was formed at the smaller level becomes the left son. Second, when a cluster containing more than one data point is joined with a cluster containing a single data point, the cluster with the single data point becomes the right son. Third, when two clusters containing only one object are joined, the cluster with the smaller cluster number becomes the right son.
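The three conventions amount to a small ordering rule. The Python sketch below encodes them under stated assumptions: the `Cluster` record and the `order_sons` function are hypothetical names, not IMSL types, and the handling of two multi-point clusters formed at exactly the same level is a choice made here, since the text does not specify it.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    number: int   # cluster number
    size: int     # count of data points in the cluster
    level: float  # level at which the cluster was formed (0.0 for a single point)


def order_sons(a, b):
    """Return (left, right) for the node that joins clusters a and b."""
    if a.size > 1 and b.size > 1:
        # Convention 1: the cluster formed at the smaller level goes left.
        # Equal levels keep the argument order (an assumption of this sketch).
        return (a, b) if a.level <= b.level else (b, a)
    if a.size > 1 or b.size > 1:
        # Convention 2: the single-point cluster goes right.
        return (a, b) if b.size == 1 else (b, a)
    # Convention 3: of two single-point clusters, the smaller number goes right.
    return (a, b) if b.number < a.number else (b, a)


left, right = order_sons(Cluster(1, 1, 0.0), Cluster(2, 1, 0.0))
```

In the usage line, both inputs are single points, so cluster 1 (the smaller number) ends up as the right son.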
Either variables or observations may be clustered with ClusterHierarchical, since a dissimilarity matrix, not the original data, is used. Class Dissimilarities may be used to compute the matrix dist for either the variables or the observations.
Namespace: Imsl.Stat
Assembly: ImslCS (in ImslCS.dll)
ClusterHierarchical Members | Imsl.Stat Namespace | Example 1