Class Dissimilarities
- All Implemented Interfaces:
Serializable, Cloneable
Class Dissimilarities computes an upper triangular matrix
(excluding the diagonal) of dissimilarities (or similarities) between the
columns (or rows) of a matrix. Nine different distance measures can be
computed. For the first three measures, three different scaling options can
be employed. The distance matrix computed is generally used as input to
clustering or multidimensional scaling functions.
The following discussion assumes that the distance measure is being
computed between the columns of the matrix. If distances between the rows of
the matrix are desired, use row = true in the
setRow method.
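To make the output layout concrete, here is a minimal standalone sketch that does not call Dissimilarities. It fills only the entries above the diagonal with unscaled Euclidean distances between columns. The class name UpperTriangleSketch, its helper methods, and the n-by-n array with zeros on and below the diagonal are assumptions made for illustration; the storage actually returned by getDistanceMatrix may differ.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the documented API.
final class UpperTriangleSketch {

    // Unscaled Euclidean distance between columns i and j of x.
    static double columnDistance(double[][] x, int i, int j) {
        double sum = 0.0;
        for (double[] row : x) {
            double d = row[i] - row[j];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Fills only the entries above the diagonal; all other entries stay 0.0.
    static double[][] upperTriangle(double[][] x) {
        int n = x[0].length;
        double[][] dist = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                dist[i][j] = columnDistance(x, i, j);
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // Columns are (0,0), (3,4) and (0,1).
        double[][] x = { {0.0, 3.0, 0.0}, {0.0, 4.0, 1.0} };
        System.out.println(Arrays.deepToString(upperTriangle(x)));
    }
}
```

For p columns this produces p(p-1)/2 distinct distances, one per unordered pair of columns.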
The distance method and scaling option used by Dissimilarities can be
set via methods setDistanceMethod and setScalingOption,
respectively. For distance methods L2_NORM, L1_NORM, or
INFINITY_NORM, each row of x is first scaled
according to the value specified by the setScalingOption method.
The scaling parameter for a row is either the standard deviation of the
row or the row range; the standard deviation is computed from the unbiased
estimate of the variance. If no scaling is performed, the parameters in the
following discussion are all 1.0 (see setScalingOption). Once the scaling
values (if any) have been computed, the distance between column i and
column j is computed via the difference vector
\(z_k=\frac{(x_k-y_k)}{s_k}, k=1, \ldots, ndstm\), where \(x_k\) denotes the
k-th element in the i-th column, \(y_k\) denotes the corresponding element
in the j-th column, \(s_k\) denotes the scaling parameter for the k-th
element, and ndstm is the number of rows if differencing columns and the
number of columns if differencing rows. For given \(z_k\),
the distance methods that allow scaling are defined as:
distanceMethod | Metric |
|---|---|
L2_NORM | Euclidean distance (\(L_2 \) norm) |
L1_NORM | Sum of the absolute differences (\(L_1\) norm) |
INFINITY_NORM | Maximum difference (\(L_\infty\) norm) |
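The three scalable measures in the table above can be sketched in plain Java as follows. This is an illustrative reading of the formulas, not the library's implementation: the class ScaledNormSketch and its method names are hypothetical, the scale factors are passed in as an array s, and "maximum difference" is taken to mean the largest absolute scaled difference.

```java
// Hypothetical illustration class; not part of the documented API.
final class ScaledNormSketch {

    // Difference vector z_k = (x_k - y_k) / s_k.
    static double[] scaledDifference(double[] x, double[] y, double[] s) {
        double[] z = new double[x.length];
        for (int k = 0; k < x.length; k++) {
            z[k] = (x[k] - y[k]) / s[k];
        }
        return z;
    }

    // Euclidean (L2) norm of z.
    static double l2(double[] z) {
        double sum = 0.0;
        for (double v : z) sum += v * v;
        return Math.sqrt(sum);
    }

    // Sum of absolute values (L1 norm) of z.
    static double l1(double[] z) {
        double sum = 0.0;
        for (double v : z) sum += Math.abs(v);
        return sum;
    }

    // Largest absolute value (L-infinity norm) of z.
    static double lInf(double[] z) {
        double max = 0.0;
        for (double v : z) max = Math.max(max, Math.abs(v));
        return max;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {4.0, 0.0, 3.0};
        double[] noScaling = {1.0, 1.0, 1.0}; // all scale factors 1.0
        double[] z = scaledDifference(x, y, noScaling);
        System.out.println(l2(z) + " " + l1(z) + " " + lInf(z));
    }
}
```

Passing an array of ones for s reproduces the unscaled case described above, where all scaling parameters are 1.0.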
The remaining distance methods do not allow scaling:
distanceMethod | Metric |
|---|---|
MAHALANOBIS | Mahalanobis distance |
ABS_COSINE | Absolute value of the cosine of the angle between the vectors |
ANGLE_IN_RADIANS | Angle in radians (0, \(\pi\)) between the lines through the origin defined by the vectors |
CORRELATION_COEFFICIENT | Correlation coefficient |
ABS_CORRELATION_COEFFICIENT | Absolute value of the correlation coefficient |
EXACT_MATCHES | Number of exact matches, where \(x_i = y_i\). |
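Several of the unscaled measures in the table above have short textbook definitions, sketched below in plain Java. The class SimilaritySketch and its methods are hypothetical and are not the library's code. The sketch assumes that the angle is the arccosine of the signed cosine and that the correlation is the Pearson coefficient, and it rejects zero-norm input with an IllegalArgumentException as a stand-in for the documented zero-norm error condition. Mahalanobis distance is omitted because it requires a covariance estimate.

```java
// Hypothetical illustration class; not part of the documented API.
final class SimilaritySketch {

    static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int k = 0; k < x.length; k++) sum += x[k] * y[k];
        return sum;
    }

    // Signed cosine of the angle between x and y, clamped to [-1, 1].
    static double cosine(double[] x, double[] y) {
        double nx = Math.sqrt(dot(x, x));
        double ny = Math.sqrt(dot(y, y));
        if (nx == 0.0 || ny == 0.0) {
            throw new IllegalArgumentException("zero-norm vector");
        }
        double c = dot(x, y) / (nx * ny);
        return Math.max(-1.0, Math.min(1.0, c));
    }

    static double absCosine(double[] x, double[] y) {
        return Math.abs(cosine(x, y));
    }

    // Assumption: angle taken as acos of the signed cosine.
    static double angleInRadians(double[] x, double[] y) {
        return Math.acos(cosine(x, y));
    }

    // Subtracts the mean from every element.
    static double[] centered(double[] x) {
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= x.length;
        double[] c = new double[x.length];
        for (int k = 0; k < x.length; k++) c[k] = x[k] - mean;
        return c;
    }

    // Pearson correlation: cosine of the mean-centered vectors.
    static double correlation(double[] x, double[] y) {
        return cosine(centered(x), centered(y));
    }

    // Number of positions k where x[k] == y[k].
    static int exactMatches(double[] x, double[] y) {
        int count = 0;
        for (int k = 0; k < x.length; k++) {
            if (x[k] == y[k]) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {3.0, 2.0, 1.0};
        System.out.println(correlation(x, y) + " " + exactMatches(x, y));
    }
}
```

Note that the cosine, correlation, and exact-match measures are similarities rather than dissimilarities: larger values indicate more alike vectors.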
For the Mahalanobis distance, any variable used in computing the distance
measure that is (numerically) linearly dependent upon the previous variables
in the indexArray vector from the setIndex method is
omitted from the distance measure.
-
Nested Class Summary
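Nested Classes
Modifier and Type | Class | Description |
|---|---|---|
static class | Dissimilarities.NoPositiveVarianceException | No variable has positive variance. |
static class | Dissimilarities.ScaleFactorZeroException | The computations cannot continue because a scale factor is zero. |
static class | Dissimilarities.ZeroNormException | The computations cannot continue because the Euclidean norm of the column is equal to zero. |
-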
Field Summary
Fields
Modifier and Type | Field | Description |
|---|---|---|
static final int | ABS_CORRELATION_COEFFICIENT | Indicates the absolute value of the correlation coefficient distance method. |
static final int | ABS_COSINE | Indicates the absolute value of the cosine of the angle between the vectors distance method. |
static final int | ANGLE_IN_RADIANS | Indicates the angle in radians (0, \(\pi\)) between the lines through the origin defined by the vectors distance method. |
static final int | CORRELATION_COEFFICIENT | Indicates the correlation coefficient distance method. |
static final int | EXACT_MATCHES | Indicates the number of exact matches distance method. |
static final int | INFINITY_NORM | Indicates the maximum difference (\(L_\infty\) norm) distance method. |
static final int | L1_NORM | Indicates the sum of the absolute differences (\(L_1\) norm) distance method. |
static final int | L2_NORM | Indicates the Euclidean distance method (\(L_2\) norm). |
static final int | MAHALANOBIS | Indicates the Mahalanobis distance method. |
static final int | NO_SCALING | Indicates no scaling. |
static final int | RANGE | Indicates scaling by the range. |
static final int | STD_DEV | Indicates scaling by the standard deviation. |
-
Constructor Summary
Constructors
Constructor | Description |
|---|---|
Dissimilarities(double[][] x) | Constructor for Dissimilarities. |
Dissimilarities(double[][] x, int distanceMethod, int distanceScale, int iRow) | Deprecated. Use Dissimilarities(double[][]) instead. |
Dissimilarities(double[][] x, int distanceMethod, int distanceScale, int iRow, int[] indexArray) | Deprecated. Use Dissimilarities(double[][]) instead. |
-
Method Summary
Modifier and Type | Method | Description |
|---|---|---|
void | compute() | Computes a matrix of dissimilarities (or similarities) between the columns (or rows) of a matrix. |
final double[][] | getDistanceMatrix() | Returns the distance matrix. |
int | getDistanceMethod() | Returns the method used in computing the dissimilarities or similarities. |
int[] | getIndex() | Returns the indices of the rows (columns) used in computing the distance measure. |
boolean | getRow() | Returns a boolean indicating whether distances are computed between rows or columns of x. |
int | getScalingOption() | Returns the scaling option. |
void | setDistanceMethod(int distanceMethod) | Sets the method to be used in computing the dissimilarities or similarities. |
void | setIndex(int[] indexArray) | Sets the indices of the rows (columns). |
void | setRow(boolean row) | Identifies whether distances are computed between rows or columns of x. |
void | setScalingOption(int distanceScale) | Sets the scaling option used if the L2_NORM, L1_NORM, or INFINITY_NORM distance methods are specified. |
-
Field Details
-
L2_NORM
public static final int L2_NORM
Indicates the Euclidean distance method (\(L_2\) norm).
-
L1_NORM
public static final int L1_NORM
Indicates the sum of the absolute differences (\(L_1\) norm) distance method.
-
INFINITY_NORM
public static final int INFINITY_NORM
Indicates the maximum difference (\(L_\infty\) norm) distance method.
-
MAHALANOBIS
public static final int MAHALANOBIS
Indicates the Mahalanobis distance method.
-
ABS_COSINE
public static final int ABS_COSINE
Indicates the absolute value of the cosine of the angle between the vectors distance method.
-
ANGLE_IN_RADIANS
public static final int ANGLE_IN_RADIANS
Indicates the angle in radians (0, \(\pi\)) between the lines through the origin defined by the vectors distance method.
-
CORRELATION_COEFFICIENT
public static final int CORRELATION_COEFFICIENT
Indicates the correlation coefficient distance method.
-
ABS_CORRELATION_COEFFICIENT
public static final int ABS_CORRELATION_COEFFICIENT
Indicates the absolute value of the correlation coefficient distance method.
-
EXACT_MATCHES
public static final int EXACT_MATCHES
Indicates the number of exact matches distance method.
-
NO_SCALING
public static final int NO_SCALING
Indicates no scaling.
-
STD_DEV
public static final int STD_DEV
Indicates scaling by the standard deviation.
-
RANGE
public static final int RANGE
Indicates scaling by the range.
-
-
Constructor Details
-
Dissimilarities
public Dissimilarities(double[][] x, int distanceMethod, int distanceScale, int iRow) throws Dissimilarities.ScaleFactorZeroException, Dissimilarities.ZeroNormException, Dissimilarities.NoPositiveVarianceException
Deprecated. Use Dissimilarities(double[][]) instead.
Constructor for Dissimilarities.
- Parameters:
x - A double matrix containing the data input matrix.
distanceMethod - An int identifying the method to be used in computing the dissimilarities or similarities. Acceptable values of distanceMethod are 0, 1, 2, ..., 8. See above for a description of these methods.
distanceScale - An int containing the scaling option.
distanceScale | Method |
|---|---|
0 | No scaling is performed. |
1 | Scale each column (row if iRow = 1) by the standard deviation of the column (row). |
2 | Scale each column (row if iRow = 1) by the range of the column (row). |
iRow - An int identifying whether distances are computed between rows or columns of x. If iRow = 1, distances are computed between the rows of x. Otherwise, distances between the columns of x are computed.
- Throws:
Dissimilarities.ScaleFactorZeroException - thrown when computations cannot continue because a scale factor is zero
Dissimilarities.NoPositiveVarianceException - thrown when no variable has positive variance
Dissimilarities.ZeroNormException - thrown when the Euclidean norm of a column is equal to zero
-
Dissimilarities
public Dissimilarities(double[][] x, int distanceMethod, int distanceScale, int iRow, int[] indexArray) throws Dissimilarities.ScaleFactorZeroException, Dissimilarities.ZeroNormException, Dissimilarities.NoPositiveVarianceException
Deprecated. Use Dissimilarities(double[][]) instead.
Constructor for Dissimilarities.
- Parameters:
x - A double matrix containing the data input matrix.
distanceMethod - An int identifying the method to be used in computing the dissimilarities or similarities. Acceptable values of distanceMethod are 0, 1, 2, ..., 8. See above for a description of these methods.
distanceScale - An int containing the scaling option.
distanceScale | Method |
|---|---|
0 | No scaling is performed. |
1 | Scale each column (row if iRow = 1) by the standard deviation of the column (row). |
2 | Scale each column (row if iRow = 1) by the range of the column (row). |
iRow - An int identifying whether distances are computed between rows or columns of x. If iRow = 1, distances are computed between the rows of x. Otherwise, distances between the columns of x are computed.
indexArray - An int array containing the indices of the rows (columns if iRow is 1) to be used in computing the distance measure.
- Throws:
Dissimilarities.ScaleFactorZeroException - thrown when computations cannot continue because a scale factor is zero
Dissimilarities.NoPositiveVarianceException - thrown when no variable has positive variance
Dissimilarities.ZeroNormException - thrown when the Euclidean norm of a column is equal to zero
-
Dissimilarities
public Dissimilarities(double[][] x)
Constructor for Dissimilarities.
- Parameters:
x - A double matrix containing the data input matrix.
-
-
Method Details
-
setDistanceMethod
public void setDistanceMethod(int distanceMethod)
Sets the method to be used in computing the dissimilarities or similarities.
- Parameters:
distanceMethod - An int identifying the method to be used in computing the dissimilarities or similarities. Acceptable values of distanceMethod are:
distanceMethod | Metric |
|---|---|
L2_NORM | Euclidean distance (\(L_2\) norm) |
L1_NORM | Sum of the absolute differences (\(L_1\) norm) |
INFINITY_NORM | Maximum difference (\(L_\infty\) norm) |
MAHALANOBIS | Mahalanobis distance |
ABS_COSINE | Absolute value of the cosine of the angle between the vectors |
ANGLE_IN_RADIANS | Angle in radians (0, \(\pi\)) between the lines through the origin defined by the vectors |
CORRELATION_COEFFICIENT | Correlation coefficient |
ABS_CORRELATION_COEFFICIENT | Absolute value of the correlation coefficient |
EXACT_MATCHES | Number of exact matches, where \(x_i = y_i\). |
See class description for more details. By default, distanceMethod = L2_NORM.
-
getDistanceMethod
public int getDistanceMethod()
Returns the method used in computing the dissimilarities or similarities.
-
setScalingOption
public void setScalingOption(int distanceScale)
Sets the scaling option used if the L2_NORM, L1_NORM, or INFINITY_NORM distance methods are specified. See setDistanceMethod.
- Parameters:
distanceScale - An int containing the scaling option. By default, distanceScale = NO_SCALING.
distanceScale | Method |
|---|---|
NO_SCALING | No scaling is performed. |
STD_DEV | If setRow(false), scale each column by the standard deviation of the column. If setRow(true), scale each row by the standard deviation of the row. |
RANGE | If setRow(false), scale each column by the range of the column. If setRow(true), scale each row by the range of the row. |
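The two scale factors named by STD_DEV and RANGE can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the library's code: the class ScaleFactorSketch and its methods are hypothetical, the standard deviation uses the n - 1 divisor to match the unbiased variance estimate mentioned in the class description, and the input is simply the values of whichever row or column is being scaled.

```java
// Hypothetical illustration class; not part of the documented API.
final class ScaleFactorSketch {

    // Standard deviation from the unbiased variance estimate (divisor n - 1).
    static double stdDev(double[] values) {
        int n = values.length;
        double mean = 0.0;
        for (double v : values) mean += v;
        mean /= n;
        double sumSq = 0.0;
        for (double v : values) sumSq += (v - mean) * (v - mean);
        return Math.sqrt(sumSq / (n - 1));
    }

    // Range: largest value minus smallest value.
    static double range(double[] values) {
        double min = values[0];
        double max = values[0];
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return max - min;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 3.0};
        System.out.println(stdDev(values) + " " + range(values));
    }
}
```

Either factor is zero when all the values are equal, which corresponds to the condition that compute reports as a ScaleFactorZeroException.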
-
getScalingOption
public int getScalingOption()
Returns the scaling option.
-
setRow
public void setRow(boolean row)
Identifies whether distances are computed between rows or columns of x.
- Parameters:
row - A boolean identifying whether distances are computed between rows or columns of x. If row = true, distances are computed between the rows of x. Otherwise, distances between the columns of x are computed. By default, row = true.
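The row and column cases differ only in which axis is treated as the set of vectors being compared. The standalone sketch below illustrates this with the unscaled L1 measure: distances between the rows of x equal distances between the columns of its transpose. The class RowColumnSketch and its methods are hypothetical and do not call Dissimilarities.

```java
// Hypothetical illustration class; not part of the documented API.
final class RowColumnSketch {

    // Returns the transpose of a rectangular matrix.
    static double[][] transpose(double[][] x) {
        double[][] t = new double[x[0].length][x.length];
        for (int r = 0; r < x.length; r++) {
            for (int c = 0; c < x[0].length; c++) {
                t[c][r] = x[r][c];
            }
        }
        return t;
    }

    // Unscaled L1 distance between columns i and j of x.
    static double l1BetweenColumns(double[][] x, int i, int j) {
        double sum = 0.0;
        for (double[] row : x) sum += Math.abs(row[i] - row[j]);
        return sum;
    }

    // Unscaled L1 distance between rows i and j of x.
    static double l1BetweenRows(double[][] x, int i, int j) {
        double sum = 0.0;
        for (int c = 0; c < x[i].length; c++) sum += Math.abs(x[i][c] - x[j][c]);
        return sum;
    }

    public static void main(String[] args) {
        double[][] x = { {1.0, 2.0}, {4.0, 6.0}, {7.0, 7.0} };
        double byRows = l1BetweenRows(x, 0, 1);
        double byTransposedColumns = l1BetweenColumns(transpose(x), 0, 1);
        System.out.println(byRows + " " + byTransposedColumns);
    }
}
```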
-
getRow
public boolean getRow()
Returns a boolean indicating whether distances are computed between rows or columns of x.
-
setIndex
public void setIndex(int[] indexArray)
Sets the indices of the rows (columns).
- Parameters:
indexArray - An int array containing the indices of the columns (rows if row = false) to be used in computing the distance measure. By default, if row = true, indexArray = 0, 1, ..., x[0].length-1; if row = false, indexArray = 0, 1, ..., x.length-1. See setRow.
-
getIndex
public int[] getIndex()
Returns the indices of the rows (columns) used in computing the distance measure.
-
compute
public void compute() throws Dissimilarities.ScaleFactorZeroException, Dissimilarities.ZeroNormException, Dissimilarities.NoPositiveVarianceException
Computes a matrix of dissimilarities (or similarities) between the columns (or rows) of a matrix.
- Throws:
Dissimilarities.ScaleFactorZeroException - thrown when computations cannot continue because a scale factor is zero
Dissimilarities.NoPositiveVarianceException - thrown when no variable has positive variance
Dissimilarities.ZeroNormException - thrown when the Euclidean norm of a column is equal to zero
-
getDistanceMatrix
public final double[][] getDistanceMatrix()
Returns the distance matrix.
- Returns:
A double matrix containing the distance matrix.