public class MultidimensionalScaling extends Object implements Serializable
Overview
Class MultidimensionalScaling performs multidimensional scaling
analysis. Input to MultidimensionalScaling consists of symmetric
similarity or dissimilarity matrices measuring distances between pairs of
objects.
In multidimensional scaling, optimized (dis)similarities, generally also
called proximities, are used to configure or position the objects
within an ndim-dimensional space, where ndim is
specified by the user. Optionally, in the individual differences scaling
model, the weight assigned to each dimension for each subject may be changed.
The Input Data
The input similarity or dissimilarity data are stored in a three-dimensional
array as a sequence of symmetric matrices. Each matrix uses the same group of
objects but refers to a specific subject (or individual).
Missing values can be indicated by Double.NaN or a negative matrix entry. In
either case, missing values are estimated as the mean dissimilarity for the
subject and used as such when computing initial estimates, and they are
omitted from the criterion function when optimal estimates are computed.
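The input layout and the missing-value convention can be illustrated with a minimal sketch. The class and method names below are hypothetical and are not part of MultidimensionalScaling; restricting the mean to the strictly upper triangle is an assumption of this sketch.

```java
final class DissimilarityInputSketch {

    /** Treats NaN and negative entries as missing, following the convention described above. */
    static boolean isMissing(double value) {
        return Double.isNaN(value) || value < 0.0;
    }

    /**
     * Mean of the non-missing entries in the strictly upper triangle of one
     * subject's matrix (assumption: only the upper triangle is consulted).
     */
    static double subjectMean(double[][] matrix) {
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i < matrix.length; i++) {
            for (int j = i + 1; j < matrix.length; j++) {
                if (!isMissing(matrix[i][j])) {
                    sum += matrix[i][j];
                    count++;
                }
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    /** Copy of one subject's matrix with missing off-diagonal entries replaced by the subject mean. */
    static double[][] fillMissing(double[][] matrix) {
        double mean = subjectMean(matrix);
        int n = matrix.length;
        double[][] filled = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    filled[i][j] = isMissing(matrix[i][j]) ? mean : matrix[i][j];
                }
            }
        }
        return filled;
    }

    public static void main(String[] args) {
        // Two subjects, three objects: an nsub by nstim by nstim array.
        double[][][] dissimilarities = {
            { {0, 1, 2}, {1, 0, 3}, {2, 3, 0} },
            { {0, 4, Double.NaN}, {4, 0, 6}, {Double.NaN, 6, 0} }
        };
        for (int m = 0; m < dissimilarities.length; m++) {
            System.out.println("subject " + m + " mean = " + subjectMean(dissimilarities[m]));
        }
    }
}
```

The filled values would only serve as starting values; as stated above, missing entries are left out of the criterion function itself.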
Class MultidimensionalScaling assumes a metric scaling model.
When no transformation is specified (that is,
transformationFormula = 1), then each datum (after transforming
to dissimilarities) is a measure of distance plus a constant, \( \alpha_m \).
In this case, the constant (which is always called the "intercept") is
assumed to vary with subject and must first be added to the observed
dissimilarities in order to obtain a metric. When a transformation is
specified (that is, transformationFormula \( \ne \) 1), the
meaning of \( \alpha_m \) changes (with respect to metrics). Thus, when
transformationFormula = 1, the data are assumed to be interval,
whereas when transformationFormula \( \ne \) 1, ratio data are
assumed. A scaling factor, the "slope", is also always estimated for each
subject.
The Criterion (Stress) Function
When stressFormula = 1 or 2, the criterion or stress function
in class MultidimensionalScaling is given as
$$\phi = \sum_m\nu_m \sum_{i,j}\left(f(\delta^\ast_{ijm})-
\alpha_m-\beta_mf(\delta_{ijm})\right)^2$$
where \(\delta_{ijm}\) denotes the predicted distance between objects
i and j on subject m, \(\delta^\ast_{ijm}\) denotes
the corresponding dissimilarity (the observed distance), \(\nu_m\) is the
stress weight assigned to the m-th subject, \(f\) is one of the
transformations \(f(x)=x^2, f(x) = x,\) or \(f(x)=\ln(x)\) specified by
method setTransformationFormula, \(\alpha_m\) is the intercept
added to the transformed observations within each subject, and \(\beta_m\)
is the slope for the subject.
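As a rough illustration of this criterion, the following sketch evaluates \(\phi\) for given observed dissimilarities, predicted distances, intercepts, slopes, and stress weights. All names are hypothetical and the code is not the library's implementation; summing over the strictly upper triangle and skipping NaN or negative observations are assumptions of the sketch.

```java
import java.util.function.DoubleUnaryOperator;

final class StressCriterionSketch {

    /** Sum over i < j of (f(observed) - alpha - beta * f(predicted))^2 for one subject. */
    static double residualSumOfSquares(double[][] observed, double[][] predicted,
                                       double alpha, double beta, DoubleUnaryOperator f) {
        double sum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            for (int j = i + 1; j < observed.length; j++) {
                double obs = observed[i][j];
                if (Double.isNaN(obs) || obs < 0.0) {
                    continue; // missing observations are left out of the criterion
                }
                double r = f.applyAsDouble(obs) - alpha - beta * f.applyAsDouble(predicted[i][j]);
                sum += r * r;
            }
        }
        return sum;
    }

    /** Weighted criterion: sum over subjects m of nu[m] times the subject's residual sum of squares. */
    static double criterion(double[][][] observed, double[][][] predicted,
                            double[] alpha, double[] beta, double[] nu, DoubleUnaryOperator f) {
        double phi = 0.0;
        for (int m = 0; m < observed.length; m++) {
            phi += nu[m] * residualSumOfSquares(observed[m], predicted[m], alpha[m], beta[m], f);
        }
        return phi;
    }

    public static void main(String[] args) {
        double[][][] observed = { { {0, 2}, {2, 0} } };
        double[][][] predicted = { { {0, 1}, {1, 0} } };
        double phi = criterion(observed, predicted,
                new double[] {0.5}, new double[] {1.0}, new double[] {2.0}, x -> x);
        System.out.println("phi = " + phi);
    }
}
```

Passing `x -> x * x` or `Math::log` as `f` gives the squared-distance and log variants of the same expression.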
For stressFormula = 0, the criterion function is given as
$$\phi = \sum_m n_m \ln\left(\sum_{i,j}\left(f(\delta^\ast_{ijm})-
\alpha_m-\beta_mf(\delta_{ijm})\right)^2\right)$$
where \(n_m\) is the number of non-missing observations on the m-th
subject. Assuming fixed weights, the first derivatives of the criterion for
stressFormula = 0 are identical to the first derivatives of the
criterion when stressFormula = 1 or 2, but with weights
$$\nu_m^{-1} = \sum_{i,j} \left(f(\delta^\ast_{ijm})-
\alpha_m-\beta_mf(\delta_{ijm})\right)^2/n_m $$
Method setStressFormula can, thus, be thought of as changing
the weighting to be used in the criterion function.
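The relationship between the two forms can be sketched as follows. Writing \(S_m\) for the residual sum of squares of subject m, the derivative of \(n_m \ln(S_m)\) with respect to any parameter is \((n_m / S_m)\) times the derivative of \(S_m\), which is the weighted criterion's derivative with \(\nu_m = n_m / S_m\). The helper names are hypothetical and the inputs are assumed to be precomputed per subject.

```java
final class LogStressSketch {

    /** Criterion for the log form: sum over subjects of n[m] * ln(rss[m]). */
    static double logCriterion(double[] rss, int[] n) {
        double phi = 0.0;
        for (int m = 0; m < rss.length; m++) {
            phi += n[m] * Math.log(rss[m]);
        }
        return phi;
    }

    /** Weight implied by the log form when held fixed: nu = n / rss, so 1 / nu = rss / n. */
    static double impliedWeight(double rss, int n) {
        return n / rss;
    }

    public static void main(String[] args) {
        double[] rss = {1.5, 0.8}; // residual sums of squares, one per subject
        int[] n = {3, 3};          // non-missing observations per subject
        System.out.println("phi = " + logCriterion(rss, n));
        System.out.println("nu_0 = " + impliedWeight(rss[0], n[0]));
    }
}
```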
The transformation \(f(x)\) specified by method
setTransformationFormula is used to obtain constant
within-subject variance of the subject dissimilarities. If the variance of
the log of the observed dissimilarities (about the predicted dissimilarities)
is constant within subject, then the log transformation should be used. In
this case, the variance of a dissimilarity should be proportional to its
magnitude. Alternatively, the within-subject variance may be constant when
distances (or squared distances) are used.
The Distance Models and Subject Weights
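The following distance models for \(\delta_{ijm}\) are available in class
MultidimensionalScaling: the Euclidean model and the individual differences
model, selected through method setModel. The arrangement of the objects in
ndim-dimensional space is called the point configuration.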
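The model formulas are not spelled out in this text, so the following sketch assumes the forms commonly given for these two models: the Euclidean distance between rows i and j of the configuration matrix, and, for the individual differences model, the same distance with each squared coordinate difference multiplied by the subject's weight for that dimension. The class name and signature are hypothetical.

```java
final class DistanceModelSketch {

    /**
     * Distance between objects i and j of a configuration (nstim by ndim).
     * With weights == null this is the plain Euclidean distance; otherwise
     * weights[k] scales the squared difference in dimension k (assumed form
     * of the individual differences model).
     */
    static double distance(double[][] config, int i, int j, double[] weights) {
        double sum = 0.0;
        for (int k = 0; k < config[i].length; k++) {
            double d = config[i][k] - config[j][k];
            double w = (weights == null) ? 1.0 : weights[k];
            sum += w * d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[][] config = { {0, 0}, {3, 4} };
        System.out.println("Euclidean: " + distance(config, 0, 1, null));
        System.out.println("Weighted:  " + distance(config, 0, 1, new double[] {4, 0}));
    }
}
```

Under this assumed form, a subject whose weights all equal one reproduces the Euclidean model.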
The Stress Weights
Weights that are inversely proportional to the estimated variance of the
dissimilarities (about their predicted values) within each subject may be
preferred because such weights lead to normal distribution theory maximum
likelihood estimates (when it is assumed that the dissimilarities are
independently normally distributed with constant residual variance). When
stressFormula = 0, the estimated (conditional) variance used as
the inverse of the weight \(\nu_m\) for the m-th subject is computed
as
$$\nu_m^{-1} = \sum_{i,j} \left(f(\delta^\ast_{ijm})-
\alpha_m-\beta_mf(\delta_{ijm})\right)^2/n_m, $$
where the sum is over the observations for the subject and where \(n_m\) is
the number of observed non-missing dissimilarities for the subject. These
weights are used in the first derivatives of the criterion function.
When stressFormula = 1, the within-subject average of the
squared dissimilarities is used for the weights. The weights are computed as
$$\nu_m^{-1} = \sum_{i,j}f(\delta^\ast_{ijm})^2/n_m.$$
Finally, when stressFormula = 2, the within-subject variance of
the dissimilarities is used for the weights. These are computed as
$$\nu_m^{-1} = \sum_{i,j}
\left(f(\delta^\ast_{ijm})-
\overline{f(\delta^\ast_{\cdot \cdot m})}\right)^2/n_m,$$
where \(\overline{f(\delta^\ast_{\cdot \cdot m})}\) denotes the average of
the transformed dissimilarities for the m-th subject.
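The two data-based weightings (stressFormula = 1 and 2) can be sketched as below. The input is assumed to be a flat array holding the transformed, non-missing dissimilarities \(f(\delta^\ast_{ijm})\) of a single subject; the class and method names are hypothetical.

```java
final class StressWeightSketch {

    /** Inverse weight for stressFormula = 1: average of the squared transformed dissimilarities. */
    static double inverseWeightFormula1(double[] transformed) {
        double sum = 0.0;
        for (double v : transformed) {
            sum += v * v;
        }
        return sum / transformed.length;
    }

    /** Inverse weight for stressFormula = 2: variance of the transformed dissimilarities about their mean. */
    static double inverseWeightFormula2(double[] transformed) {
        double mean = 0.0;
        for (double v : transformed) {
            mean += v;
        }
        mean /= transformed.length;
        double sum = 0.0;
        for (double v : transformed) {
            sum += (v - mean) * (v - mean);
        }
        return sum / transformed.length;
    }

    public static void main(String[] args) {
        double[] transformed = {1.0, 2.0, 3.0};
        System.out.println("nu (formula 1) = " + 1.0 / inverseWeightFormula1(transformed));
        System.out.println("nu (formula 2) = " + 1.0 / inverseWeightFormula2(transformed));
    }
}
```

Both functions return \(\nu_m^{-1}\), so the stress weight itself is the reciprocal of the returned value.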
The Optimization Procedure
Initial estimates for the configuration matrix \(\Lambda\) are obtained
through methods of classical scaling, as discussed in Cox and Cox (2001),
chapter 2. In the case of the individual differences model, initial estimates
for the matrix \(W\) of subject weights are computed by a method described in
De Leeuw and Pruzansky (1978). After obtaining initial estimates, a modified
Gauss-Newton algorithm is used to obtain estimates for the parameters that
optimize the criterion function. The parameters are optimized sequentially,
with the subject-specific parameters updated one subject at a time.
Standardization
Both available models are over-parameterized, so the resulting parameter
estimates are not uniquely defined. For example, in the Euclidean model, the
columns of the configuration matrix \(\Lambda\) can be translated or "rotated"
(multiplied by an orthonormal matrix) without changing the resulting stress.
To eliminate the lack of uniqueness due to translation, model estimates
for the configuration are centered in both models. No attempt is made to
eliminate the rotation problem, but note that rotation invariance is usually
not a problem in the models given.
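Centering a configuration, and the fact that it leaves inter-point distances unchanged, can be illustrated with a short sketch. The names are hypothetical, and centering is taken here to mean subtracting each column's mean, which is an assumption about the exact standardization used.

```java
final class CenteringSketch {

    /** Returns a copy of the configuration (nstim by ndim) with each column mean subtracted. */
    static double[][] center(double[][] config) {
        int n = config.length;
        int ndim = config[0].length;
        double[][] centered = new double[n][ndim];
        for (int k = 0; k < ndim; k++) {
            double mean = 0.0;
            for (int i = 0; i < n; i++) {
                mean += config[i][k];
            }
            mean /= n;
            for (int i = 0; i < n; i++) {
                centered[i][k] = config[i][k] - mean;
            }
        }
        return centered;
    }

    /** Euclidean distance between rows i and j of a configuration. */
    static double distance(double[][] config, int i, int j) {
        double sum = 0.0;
        for (int k = 0; k < config[i].length; k++) {
            double d = config[i][k] - config[j][k];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[][] config = { {1, 2}, {3, 6} };
        double[][] centered = center(config);
        System.out.println("before: " + distance(config, 0, 1));
        System.out.println("after:  " + distance(centered, 0, 1));
    }
}
```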
| Modifier and Type | Class and Description |
|---|---|
| static class | MultidimensionalScaling.IllDefinedHessianException: A Hessian matrix is ill-defined. |
| static class | MultidimensionalScaling.NotEnoughPositiveEigenvaluesException: The number of positive eigenvalues of the double-centered distance matrix is too small. |

| Constructor and Description |
|---|
| MultidimensionalScaling(double[][][] dissimilarities, int ndim): Constructor for class MultidimensionalScaling. |
| MultidimensionalScaling(MultidimensionalScaling mds): Copy constructor for class MultidimensionalScaling. |

| Modifier and Type | Method and Description |
|---|---|
| void | compute(): Performs the multidimensional scaling. |
| double[][] | getConfiguration(): Returns the configuration matrix. |
| double[] | getCriterionFunctionWeights(): Returns the criterion function weight for each subject. |
| double[][][] | getDistances(): Returns the predicted distances. |
| double[] | getIntercepts(): Returns the intercept for each subject. |
| double[][][] | getResiduals(): Returns the observation residuals. |
| double[] | getSlopes(): Returns the slope for each subject. |
| double[][] | getSubjectWeights(): Returns the subject weights. |
| double | getWeightedOptimizedCriterionFunctionValue(): Returns the value of the summed optimized stress function. |
| double[] | getWeightedOptimizedCriterionValues(): Returns the value of the optimized stress function for each subject. |
| void | setDissimilarityConversion(int conversionType): Sets the option for conversion of input similarity matrices to dissimilarity matrices. |
| void | setModel(int model): Sets the model option parameter. |
| void | setPrintLevel(int printLevel): Sets the print level. |
| void | setStressFormula(int stressFormula): Sets the stress formula option. |
| void | setTransformationFormula(int transformationFormula): Defines the transformation used when computing the criterion function. |

public MultidimensionalScaling(double[][][] dissimilarities, int ndim)
Constructor for class MultidimensionalScaling.
Parameters:
dissimilarities - a double, 3-dimensional array containing the dissimilarity or similarity matrices. The array has format nsub by nstim by nstim, where nsub is the number of subjects and nstim is the number of stimuli (or objects) in each (dis)similarity matrix. Each matrix is assumed to be symmetric, and only the strictly upper triangular part is used in the computations.
ndim - an int scalar, the dimension of the point configuration
Throws:
IllegalArgumentException - if one of the input arguments is not feasible

public MultidimensionalScaling(MultidimensionalScaling mds)
Copy constructor for class MultidimensionalScaling.
Parameters:
mds - the MultidimensionalScaling object to be copied

public void compute()
    throws MultidimensionalScaling.NotEnoughPositiveEigenvaluesException,
           MultidimensionalScaling.IllDefinedHessianException
Performs the multidimensional scaling.
Throws:
MultidimensionalScaling.NotEnoughPositiveEigenvaluesException - if the number of positive eigenvalues of the average product moment matrix is smaller than the number of columns of the configuration matrix
MultidimensionalScaling.IllDefinedHessianException - if one of the Hessians occurring during the optimization of the stress function is not positive semidefinite

public void setDissimilarityConversion(int conversionType)
Sets the option for conversion of input similarity matrices to dissimilarity matrices.
Parameters:
conversionType - an int indicating which type of conversion of the input matrices has to be performed

| conversionType | Conversion |
|---|---|
| 0 | Input data contain dissimilarities and no conversion is performed. |
| 1 | Input data are converted from similarity to dissimilarity data by subtracting each similarity from the largest similarity for the subject. |
| 2 | Input data are converted to dissimilarities by taking the reciprocal of each similarity. |
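
The three conversion options can be sketched for a single subject's matrix as follows. This is only an illustration with hypothetical names: taking the largest similarity over the off-diagonal entries, leaving the diagonal at zero, and ignoring missing values are all assumptions of the sketch.

```java
final class SimilarityConversionSketch {

    /** Converts one subject's similarity matrix according to conversionType (0, 1, or 2). */
    static double[][] convert(double[][] similarities, int conversionType) {
        int n = similarities.length;
        // Largest off-diagonal similarity for this subject (needed for type 1).
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    max = Math.max(max, similarities[i][j]);
                }
            }
        }
        double[][] result = new double[n][n]; // diagonal stays 0
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) {
                    continue;
                }
                switch (conversionType) {
                    case 0: // already dissimilarities: copy unchanged
                        result[i][j] = similarities[i][j];
                        break;
                    case 1: // subtract from the largest similarity
                        result[i][j] = max - similarities[i][j];
                        break;
                    case 2: // reciprocal of each similarity
                        result[i][j] = 1.0 / similarities[i][j];
                        break;
                    default:
                        throw new IllegalArgumentException("conversionType must be 0, 1, or 2");
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] s = { {0, 2, 4}, {2, 0, 1}, {4, 1, 0} };
        System.out.println(java.util.Arrays.deepToString(convert(s, 1)));
        System.out.println(java.util.Arrays.deepToString(convert(s, 2)));
    }
}
```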

Default: conversionType = 0

public void setModel(int model)
Sets the model option parameter.
Parameters:
model - an int indicating which model to use. If model = 0, the Euclidean model is used; if model = 1, the individual differences model is used.
Default: model = 0

public void setStressFormula(int stressFormula)
Sets the stress formula option.
Parameters:
stressFormula - an int indicating which stress formula to use in the computations. The stress formulas differ in the weighting given to each subject.

| stressFormula | Weighting |
|---|---|
| 0 | Inverse of within-subject variance of observed dissimilarities about the predicted distances. |
| 1 | Inverse of within-subject average of squared dissimilarities. |
| 2 | Inverse of within-subject variance of dissimilarities about the subject mean. |

Default: stressFormula = 0

public void setTransformationFormula(int transformationFormula)
Defines the transformation used when computing the criterion function.
Parameters:
transformationFormula - an int indicating which transformation to use on the observed and predicted dissimilarities when computing the criterion function

| transformationFormula | Transformation |
|---|---|
| 0 | Squared distances. |
| 1 | Distances (that is, no transformation is performed). |
| 2 | Log of the distances. |

Default: transformationFormula = 0

public void setPrintLevel(int printLevel)
Sets the print level.
Parameters:
printLevel - an int indicating which output is to be printed

| printLevel | Output |
|---|---|
| 0 | No printing is performed. |
| 1 | Printing is performed but the output is abbreviated. |
| 2 | All printing is performed. |

Default: printLevel = 0

public double[][][] getDistances()
Returns the predicted distances.
Returns:
a double array of size nsub by nstim by nstim containing the distances as predicted by the estimated parameters in the model

public double[][] getConfiguration()
Returns the configuration matrix.
Returns:
a double array of size nstim by ndim containing the point configuration obtained by multidimensional scaling

public double[] getIntercepts()
Returns the intercept for each subject.
Returns:
a double array containing the intercepts for the subjects

public double[] getSlopes()
Returns the slope for each subject.
Returns:
a double array containing the slopes for the subjects

public double[] getCriterionFunctionWeights()
Returns the criterion function weight for each subject.
Returns:
a double array containing the stress function weights for the subjects

public double[] getWeightedOptimizedCriterionValues()
Returns the value of the optimized stress function for each subject.
Returns:
a double array containing the value of the weighted optimized criterion within each subject

public double getWeightedOptimizedCriterionFunctionValue()
Returns the value of the summed optimized stress function.
Returns:
a double scalar containing the value of the weighted optimized criterion function, summed over all subjects

public double[][][] getResiduals()
Returns the observation residuals.
Returns:
a double array of size nsub by nstim by nstim containing the observation residuals for each subject

public double[][] getSubjectWeights()
Returns the subject weights.
Returns:
a double array of size nsub by ndim containing the subject weights

Copyright © 2022 Rogue Wave Software. All rights reserved.