ClusterHierarchical Class

Performs a hierarchical cluster analysis from a distance matrix.

Inheritance Hierarchy

System.Object
Imsl.Stat.ClusterHierarchical

Namespace: Imsl.Stat
Assembly: ImslCS (in ImslCS.dll) Version: 6.5.2.0

Syntax

C#

[SerializableAttribute]
public class ClusterHierarchical

Visual Basic

<SerializableAttribute>
Public Class ClusterHierarchical

Visual C++

[SerializableAttribute]
public ref class ClusterHierarchical

F#

[<SerializableAttribute>]
type ClusterHierarchical = class end

The ClusterHierarchical type exposes the following members.

Methods

	Name	Description
	Compute	Performs a hierarchical cluster analysis.
	Equals	Determines whether the specified object is equal to the current object. (Inherited from Object.)
	Finalize	Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
	GetClusterMembership	Returns the cluster membership of each observation.
	GetHashCode	Serves as a hash function for a particular type. (Inherited from Object.)
	GetObsPerCluster	Returns the number of observations in each cluster.
	GetType	Gets the Type of the current instance. (Inherited from Object.)
	MemberwiseClone	Creates a shallow copy of the current Object. (Inherited from Object.)
	ToString	Returns a string that represents the current object. (Inherited from Object.)


Properties

	Name	Description
	ClusterLeftSons	The left sons of each merged cluster.
	ClusterLevel	The level at which the clusters are joined.
	ClusterRightSons	The right sons of each merged cluster.
	Method	The clustering method.
	NumberOfProcessors	The maximum number of processors to use in the parallel calculations.
	TransformType	The type of transformation.
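The ClusterLeftSons, ClusterRightSons and ClusterLevel properties together describe the binary cluster tree, while GetClusterMembership and GetObsPerCluster summarize that tree after it has been cut into a given number of clusters. The following standard C++ sketch, written independently of the library, shows one way such a cut could be derived. It assumes that the sons arrays hold 1-based cluster numbers and that entry m (0-based) describes the merge that created cluster n + m + 1. The function names cutTree and countPerCluster are hypothetical, and the order in which the labels 1..k are assigned is an assumption that may not match the library's numbering.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Assign each of n observations to one of k clusters by replaying only the
// first n - k recorded merges. left[m] and right[m] are the 1-based cluster
// numbers joined by merge m (0-based), which creates cluster n + m + 1.
// Labels 1..k are handed out in order of first appearance (assumption).
std::vector<int> cutTree(int n, const std::vector<int>& left,
                         const std::vector<int>& right, int k) {
    assert(1 <= k && k <= n);
    assert(left.size() == right.size() && static_cast<int>(left.size()) == n - 1);

    // parent[c] for cluster numbers 1..2n-1 (index 0 unused); roots point to themselves.
    std::vector<int> parent(2 * n, 0);
    std::iota(parent.begin(), parent.end(), 0);
    for (int m = 0; m < n - k; ++m) {
        const int z = n + m + 1;  // cluster created by merge m
        parent[left[m]] = z;
        parent[right[m]] = z;
    }

    std::vector<int> label(2 * n, 0);  // 0 means "no label yet"
    std::vector<int> membership(n);
    int next = 0;
    for (int i = 1; i <= n; ++i) {
        int c = i;
        while (parent[c] != c) c = parent[c];  // climb to the top of the kept subtree
        if (label[c] == 0) label[c] = ++next;
        membership[i - 1] = label[c];
    }
    return membership;
}

// Number of observations carrying each label 1..k.
std::vector<int> countPerCluster(const std::vector<int>& membership, int k) {
    std::vector<int> counts(k, 0);
    for (int label : membership) ++counts[label - 1];
    return counts;
}
```

For example, with four observations whose merges are recorded as left sons {2, 5, 6} and right sons {1, 3, 4}, cutting at k = 2 keeps only the first two merges, giving memberships {1, 1, 1, 2} and counts {3, 1}.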


Remarks

Class ClusterHierarchical conducts a hierarchical cluster analysis based upon a distance matrix, or, by appropriate use of the transformation specified in the TransformType property, based upon a similarity matrix. Only the upper triangular part of the input matrix is used.

Hierarchical clustering in ClusterHierarchical proceeds as follows:

1. Initially, each data point is considered to be a cluster, numbered 1 to n, where n is the number of rows in the input matrix, dist. If the input matrix contains similarities, it is first transformed to a distance matrix using the transformation specified by the TransformType property. Set k = 1.
2. Search the distance matrix for the two closest clusters and merge them to form a new cluster, numbered n + k. Save the cluster numbers of the two clusters joined at this stage as the left and right sons (ClusterLeftSons and ClusterRightSons), and store the distance between the two clusters as the cluster level (ClusterLevel).
3. Update the distances in the row and column of dist corresponding to the new cluster, using the update rule of the selected clustering method.
4. Set k = k + 1. If k is less than n, go to Step 2.
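The steps above can be sketched in standard C++ as a naive O(n^3) loop. This is an illustration, not the library's implementation: it hard-codes single linkage in step 3, reads only the upper triangle of dist, and orders each pair of sons by a simplified rule (larger cluster number on the left) rather than the full conventions described below. The names Merge and agglomerate are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One recorded merge: observations are clusters 1..n and the k-th merge
// creates cluster n + k.
struct Merge {
    int left;      // cluster number recorded as the left son
    int right;     // cluster number recorded as the right son
    double level;  // distance at which the two clusters were joined
};

// Naive sketch of steps 1-4 with single-linkage updating in step 3.
// Sons use a simplified rule (larger cluster number on the left).
std::vector<Merge> agglomerate(const std::vector<std::vector<double>>& dist) {
    const std::size_t n = dist.size();
    for (const auto& row : dist) assert(row.size() == n);  // must be square

    // Step 1: each observation is its own cluster. Copy the upper triangle
    // into a symmetric working matrix indexed by "slot".
    std::vector<std::vector<double>> d(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            d[i][j] = d[j][i] = dist[i][j];
    std::vector<int> number(n);         // cluster number currently held by each slot
    std::vector<bool> active(n, true);  // false once a slot has been absorbed
    for (std::size_t i = 0; i < n; ++i) number[i] = static_cast<int>(i + 1);

    std::vector<Merge> merges;
    for (std::size_t k = 1; k < n; ++k) {  // step 4: repeat while k < n
        // Step 2: find the closest pair of active clusters (slots a < b).
        bool found = false;
        double best = 0.0;
        std::size_t a = 0, b = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (!active[i]) continue;
            for (std::size_t j = i + 1; j < n; ++j) {
                if (active[j] && (!found || d[i][j] < best)) {
                    found = true;
                    best = d[i][j];
                    a = i;
                    b = j;
                }
            }
        }
        const int na = number[a], nb = number[b];
        merges.push_back({std::max(na, nb), std::min(na, nb), best});

        // Step 3 (single linkage): d(Z, C) = min(d(A, C), d(B, C)).
        // The new cluster Z reuses slot a; slot b is retired.
        for (std::size_t c = 0; c < n; ++c) {
            if (active[c] && c != a && c != b)
                d[a][c] = d[c][a] = std::min(d[a][c], d[b][c]);
        }
        active[b] = false;
        number[a] = static_cast<int>(n + k);  // new cluster is numbered n + k
    }
    return merges;
}
```

For four points at positions 0, 1, 5 and 11 on a line, the sketch records the merges (2, 1) at level 1, (5, 3) at level 4 and (6, 4) at level 6.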

The Method property specifies how the distance between the newly merged cluster and each of the remaining clusters is updated. Class ClusterHierarchical allows five methods for computing these distances, and they differ primarily in this update step. To understand the methods, suppose in the following discussion that clusters A and B have just been joined to form cluster Z, and that the distance from Z to another cluster, C, is required.

Method	Description
Single	Single linkage (minimum distance). The distance from Z to C is the minimum of the distances A to C and B to C.
Complete	Complete linkage (maximum distance). The distance from Z to C is the maximum of the distances A to C and B to C.
AvgWithinClusters	Average-distance-within-clusters method. The distance from Z to C is the average distance of all objects that would be within the cluster formed by merging clusters Z and C. This average may be computed according to formulas given by Anderberg (1973, page 139).
AvgBetweenClusters	Average-distance-between-clusters method. The distance from Z to C is the average distance of objects within cluster Z to objects within cluster C. This average may be computed according to methods given by Anderberg (1973, page 140).
Wards	Ward's method: Clusters are formed so as to minimize the increase in the within-cluster sums of squares. The distance between two clusters is the increase in these sums of squares if the two clusters were merged. A method for computing this distance from a squared Euclidean distance matrix is given by Anderberg (1973, pages 142-145).
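Several of these updates can be written with the Lance-Williams recurrence, a standard general form for agglomerative distance updates. Anderberg's formulas cited above are not reproduced here, so treat this C++ sketch as an illustration whose scaling may differ from the class's internal computation (particularly for Ward's method). The Linkage enum and updateDistance function are hypothetical, and AvgWithinClusters is omitted because it needs within-cluster distance sums that this recurrence does not carry.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical enum covering four of the Method names above; it is not the
// library's own type.
enum class Linkage { Single, Complete, AvgBetweenClusters, Wards };

// Distance from Z (A merged with B) to another cluster C via the
// Lance-Williams recurrence. dAC, dBC, dAB are current distances and
// nA, nB, nC are cluster sizes. For Wards the starting matrix is assumed to
// hold squared Euclidean distances; the result is then proportional to the
// increase in the within-cluster sum of squares.
double updateDistance(Linkage method, double dAC, double dBC, double dAB,
                      double nA, double nB, double nC) {
    switch (method) {
    case Linkage::Single:    // minimum distance
        return std::min(dAC, dBC);
    case Linkage::Complete:  // maximum distance
        return std::max(dAC, dBC);
    case Linkage::AvgBetweenClusters:
        // Mean of d(z, c) over all pairs z in Z, c in C: a size-weighted average.
        return (nA * dAC + nB * dBC) / (nA + nB);
    case Linkage::Wards:
        return ((nA + nC) * dAC + (nB + nC) * dBC - nC * dAB) / (nA + nB + nC);
    }
    assert(false && "unknown linkage");
    return 0.0;
}
```

The size-weighted form for AvgBetweenClusters follows directly from the definition: the sum of all A-to-C and B-to-C pair distances, divided by the (nA + nB) * nC pairs, reduces to the expression shown.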

In general, single linkage will yield long thin clusters while complete linkage will yield clusters that are more spherical. Average linkage and Ward's linkage tend to yield clusters that are similar to those obtained with complete linkage.

Class ClusterHierarchical produces a unique representation of the binary cluster tree via the following three conventions; because the tree is unique, the clusters are easier to interpret. First, when two clusters are joined and each cluster contains two or more data points, the cluster that was formed at the smaller level becomes the left son. Second, when a cluster containing more than one data point is joined with a cluster containing a single data point, the cluster with the single data point becomes the right son. Third, when two clusters that each contain only one data point are joined, the cluster with the smaller cluster number becomes the right son.
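The three conventions can be expressed as a small ordering function. This is a hypothetical C++ sketch (ClusterInfo and orderSons are illustrative names, not library types). The page does not say how ties between two multi-point clusters formed at the same level are broken, so the tie-break used here is an assumption.

```cpp
#include <cassert>
#include <utility>

// Hypothetical description of a cluster about to be joined.
struct ClusterInfo {
    int number;    // 1..n for observations, n + 1 and above for merged clusters
    int size;      // number of data points in the cluster
    double level;  // level at which it was formed (ignored when size == 1)
};

// Returns {left son, right son} following the three conventions above.
std::pair<int, int> orderSons(const ClusterInfo& p, const ClusterInfo& q) {
    if (p.size > 1 && q.size > 1) {
        // First convention: the cluster formed at the smaller level is the left son.
        // Tie-break (assumption): smaller cluster number goes left.
        const bool pLeft = p.level < q.level ||
                           (p.level == q.level && p.number < q.number);
        return pLeft ? std::make_pair(p.number, q.number)
                     : std::make_pair(q.number, p.number);
    }
    // Second convention: a single-point cluster joined to a larger one is the right son.
    if (p.size > 1) return {p.number, q.number};
    if (q.size > 1) return {q.number, p.number};
    // Third convention: for two single points, the smaller number is the right son.
    return p.number < q.number ? std::make_pair(q.number, p.number)
                               : std::make_pair(p.number, q.number);
}
```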

Comments

The clusters corresponding to the original data points are numbered from 1 to n, where n is the number of rows in dist. The n - 1 clusters formed by merging clusters are numbered n + 1 to 2n - 1.
Raw correlations, if used as similarities, should be made positive and transformed to a distance measure. One such transformation can be performed by setting TransformType = ReciprocalAbs.
The user may cluster either variables or observations with ClusterHierarchical since a dissimilarity matrix, not the original data, is used. Class Dissimilarities may be used to compute the matrix dist for either the variables or observations.
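Because only a dissimilarity matrix is required, it can be prepared in whatever way is convenient. The C++ sketch below is a hypothetical stand-in for that preparation, not the API of the Dissimilarities class. rowDistances computes Euclidean distances between observations (pass the transposed data to cluster variables instead), and reciprocalAbs converts similarities such as correlations to distances by taking the reciprocal of the absolute value. That is what the ReciprocalAbs name suggests, but it is an assumption here rather than a statement from this page.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Euclidean distances between the rows (observations) of x.
Matrix rowDistances(const Matrix& x) {
    const std::size_t n = x.size();
    Matrix d(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j) {
            assert(x[i].size() == x[j].size());
            double sum = 0.0;
            for (std::size_t k = 0; k < x[i].size(); ++k) {
                const double diff = x[i][k] - x[j][k];
                sum += diff * diff;
            }
            d[i][j] = d[j][i] = std::sqrt(sum);
        }
    return d;
}

// Similarity -> distance as 1 / |s| (assumed meaning of ReciprocalAbs).
// A zero similarity maps to +infinity; the diagonal is left at 0.
Matrix reciprocalAbs(const Matrix& s) {
    const std::size_t n = s.size();
    Matrix d(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            if (i != j)
                d[i][j] = s[i][j] == 0.0 ? std::numeric_limits<double>::infinity()
                                         : 1.0 / std::fabs(s[i][j]);
    return d;
}
```

Under this transformation a correlation of -0.5 becomes a distance of 2, so strongly correlated pairs of either sign end up close together.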

Reference

Imsl.Stat Namespace

Other Resources

Example 1