Class MultidimensionalScaling
- All Implemented Interfaces:
Serializable
Overview
Class MultidimensionalScaling performs multidimensional scaling
analysis. Input to MultidimensionalScaling consists of symmetric
similarity or dissimilarity matrices measuring distances between pairs of
objects.
In multidimensional scaling, optimized (dis)similarities - generally also
called proximities - are used to configure or position the objects
within an ndim-dimensional space, where ndim is
specified by the user. Optionally, in the individual differences scaling
model, the weight assigned to each dimension for each subject may be changed.
The Input Data
The input similarity or dissimilarity data are stored in a three-dimensional
array as a sequence of symmetric matrices. Each matrix uses the same group of
objects but refers to a specific subject (or individual).
Missing values can be indicated by Double.NaN or a negative matrix entry. In
either case, missing values are estimated as the mean dissimilarity for the
subject and used as such when computing initial estimates, and they are
omitted from the criterion function when optimal estimates are computed.
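The missing-value rule for initial estimates can be sketched as follows. This is a minimal illustration, not the library's code: the class and method names are hypothetical, and it assumes the subject mean is taken over the non-missing entries of the strictly upper triangle.

```java
final class MissingValueSketch {
    /** An entry counts as missing if it is NaN or negative. */
    static boolean isMissing(double v) {
        return Double.isNaN(v) || v < 0.0;
    }

    /** Mean of the non-missing entries in the strictly upper triangle of one subject's matrix. */
    static double subjectMean(double[][] d) {
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i < d.length; i++) {
            for (int j = i + 1; j < d.length; j++) {
                if (!isMissing(d[i][j])) {
                    sum += d[i][j];
                    count++;
                }
            }
        }
        // No observed entries: the mean is undefined.
        return count == 0 ? Double.NaN : sum / count;
    }

    /** Copy of d with missing upper-triangle entries replaced by the subject mean. */
    static double[][] fillMissing(double[][] d) {
        double mean = subjectMean(d);
        double[][] out = new double[d.length][];
        for (int i = 0; i < d.length; i++) {
            out[i] = d[i].clone();
            for (int j = i + 1; j < d.length; j++) {
                if (isMissing(out[i][j])) {
                    out[i][j] = mean;
                }
            }
        }
        return out;
    }
}
```

Only the upper triangle is filled here, since that is the part the computations are documented to use.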
Class MultidimensionalScaling assumes a metric scaling model.
When no transformation is specified (that is,
transformationFormula = 1), then each datum (after transforming
to dissimilarities) is a measure of distance plus a constant, \( \alpha_m \).
In this case, the constant (which is always called the "intercept") is
assumed to vary with subject and must first be added to the observed
dissimilarities in order to obtain a metric. When a transformation is
specified (that is, transformationFormula \( \ne \) 1), the
meaning of \( \alpha_m \) changes (with respect to metrics). Thus, when
transformationFormula = 1, the data are assumed to be interval data,
whereas when transformationFormula \( \ne \) 1, ratio data are
assumed. A scaling factor, the "slope", is also always estimated for each
subject.
The Criterion (Stress) Function
When stressFormula = 1 or 2, the criterion or stress function
in class MultidimensionalScaling is given as
$$\phi = \sum_m\nu_m \sum_{i,j}\left(f(\delta^\ast_{ijm})-
\alpha_m-\beta_mf(\delta_{ijm})\right)^2$$
where \(\delta_{ijm}\) denotes the predicted distance between objects
i and j on subject m, \(\delta^\ast_{ijm}\) denotes
the corresponding dissimilarity (the observed distance), \(\nu_m\) is the
stress weight assigned to the m-th subject, \(f\) is one of the
transformations \(f(x)=x^2, f(x) = x,\) or \(f(x)=\ln(x)\) specified by
method setTransformationFormula, \(\alpha_m\) is the intercept
added to the transformed observations within each subject, and \(\beta_m\)
is the slope for the subject.
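As an illustration of the formula above, the following sketch evaluates \(\phi\) for given weights, intercepts, and slopes. The class and method names are hypothetical, and the sketch assumes that the sum runs over the strictly upper triangle and skips missing (NaN or negative) observations; it does not reproduce the library's internals.

```java
final class StressSketch {
    /** f(x) for transformationFormula 0 (squared), 1 (identity), 2 (natural log). */
    static double transform(double x, int transformationFormula) {
        switch (transformationFormula) {
            case 0: return x * x;
            case 1: return x;
            case 2: return Math.log(x);
            default:
                throw new IllegalArgumentException("transformationFormula must be 0, 1, or 2");
        }
    }

    /** Residual sum of squares for one subject over the strictly upper triangle. */
    static double residualSumOfSquares(double[][] observed, double[][] predicted,
                                       double alpha, double beta, int transformationFormula) {
        double ss = 0.0;
        for (int i = 0; i < observed.length; i++) {
            for (int j = i + 1; j < observed.length; j++) {
                double obs = observed[i][j];
                if (Double.isNaN(obs) || obs < 0.0) {
                    continue; // missing observations are left out of the criterion
                }
                double r = transform(obs, transformationFormula) - alpha
                        - beta * transform(predicted[i][j], transformationFormula);
                ss += r * r;
            }
        }
        return ss;
    }

    /** phi = sum over subjects m of nu[m] times the subject's residual sum of squares. */
    static double stress(double[][][] observed, double[][][] predicted,
                         double[] alpha, double[] beta, double[] nu,
                         int transformationFormula) {
        double phi = 0.0;
        for (int m = 0; m < observed.length; m++) {
            phi += nu[m] * residualSumOfSquares(observed[m], predicted[m],
                    alpha[m], beta[m], transformationFormula);
        }
        return phi;
    }
}
```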
For stressFormula = 0, the criterion function is given as
$$\phi = \sum_m n_m \ln\left(\sum_{i,j}\left(f(\delta^\ast_{ijm})-
\alpha_m-\beta_mf(\delta_{ijm})\right)^2\right)$$
where \(n_m\) is the number of non-missing observations on the m-th
subject. Assuming fixed weights, the first derivatives of the criterion for
stressFormula = 0 are identical to the first derivatives of the
criterion when stressFormula = 1 or 2, but with weights
$$\nu_m^{-1} = \sum_{i,j} \left(f(\delta^\ast_{ijm})-
\alpha_m-\beta_mf(\delta_{ijm})\right)^2/n_m $$
Method setStressFormula can, thus, be thought of as changing
the weighting to be used in the criterion function.
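The relationship between the logarithmic criterion and the weighted criterion can be sketched as below. The inputs are assumed to be the per-subject residual sums of squares and the counts \(n_m\), already computed elsewhere; the class and method names are hypothetical.

```java
final class LogStressSketch {
    /** phi = sum over subjects m of n[m] * ln(ss[m]), the stressFormula = 0 criterion. */
    static double logCriterion(double[] ss, int[] n) {
        double phi = 0.0;
        for (int m = 0; m < ss.length; m++) {
            phi += n[m] * Math.log(ss[m]);
        }
        return phi;
    }

    /**
     * Weights nu[m] = n[m] / ss[m]. With these held fixed, the weighted
     * criterion has the same first derivatives as the log criterion,
     * because d/dtheta (n ln S) = (n / S) dS/dtheta.
     */
    static double[] impliedWeights(double[] ss, int[] n) {
        double[] nu = new double[ss.length];
        for (int m = 0; m < ss.length; m++) {
            nu[m] = n[m] / ss[m];
        }
        return nu;
    }
}
```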
The transformation \(f(x)\) specified by method
setTransformationFormula is used to obtain constant
within-subject variance of the subject dissimilarities. If the variance of
the log of the observed dissimilarities (about the predicted dissimilarities)
is constant within subject, then the log transformation should be used. In
this case, the variance of a dissimilarity should be proportional to its
magnitude. Alternatively, the within-subject variance may be constant when
distances (or squared distances) are used.
The Distance Models and Subject Weights
The following distance models for \(\delta_{ijm}\) are available in class
MultidimensionalScaling:
- Euclidean model $$ \delta^2_{ijm} = \sum_{k=1}^d(\lambda_{ik} - \lambda_{jk})^2 $$
- Individual differences model $$ \delta^2_{ijm} = \sum_{k=1}^d w_{mk}(\lambda_{ik} - \lambda_{jk})^2 $$
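In both models, d = ndim and \(\lambda_{ik}\) is the coordinate of object i
on dimension k. In the individual differences model, \(w_{mk}\) is the weight
of subject m on dimension k. The matrix \(\Lambda\) of object coordinates in
ndim-dimensional space is called the point configuration.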
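The two distance models can be written directly from the formulas above. In this sketch the class and method names are hypothetical; lambda holds one row of coordinates per object, and w holds the ndim weights of a single subject.

```java
final class DistanceModelSketch {
    /** Euclidean model: distance between objects i and j in the configuration. */
    static double euclidean(double[][] lambda, int i, int j) {
        double sum = 0.0;
        for (int k = 0; k < lambda[i].length; k++) {
            double diff = lambda[i][k] - lambda[j][k];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Individual differences model: each squared coordinate difference is scaled by w[k]. */
    static double individualDifferences(double[][] lambda, double[] w, int i, int j) {
        double sum = 0.0;
        for (int k = 0; k < lambda[i].length; k++) {
            double diff = lambda[i][k] - lambda[j][k];
            sum += w[k] * diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

With all weights equal to 1, the individual differences distance reduces to the Euclidean distance.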
The Stress Weights
Weights that are inversely proportional to the estimated variance of the
dissimilarities (about their predicted values) within each subject may be
preferred because such weights lead to normal distribution theory maximum
likelihood estimates (when it is assumed that the dissimilarities are
independently normally distributed with constant residual variance). When
stressFormula = 0, the estimated (conditional) variance used as
the inverse of the weight \(\nu_m\) for the m-th subject is computed
as
$$\nu_m^{-1} = \sum_{i,j} \left(f(\delta^\ast_{ijm})-
\alpha_m-\beta_mf(\delta_{ijm})\right)^2/n_m, $$
where the sum is over the observations for the subject and where \(n_m\) is
the number of observed non-missing dissimilarities for the subject. These
weights are used in the first derivatives of the criterion function.
When stressFormula = 1, the within-subject average of the squared
dissimilarities is used for the weights. These are computed as
$$\nu_m^{-1} = \sum_{i,j}f(\delta^\ast_{ijm})^2/n_m.$$
Finally, when stressFormula = 2, the within-subject variance of
the dissimilarities is used for the weights. These are computed as
$$\nu_m^{-1} = \sum_{i,j}
\left(f(\delta^\ast_{ijm})-
\overline{f(\delta^\ast_{\cdot \cdot m})}\right)^2/n_m,$$
where \(\overline{f(\delta^\ast_{\cdot \cdot m})}\) denotes the average of
the transformed dissimilarities for the m-th subject.
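The weights for stressFormula = 1 and 2 depend only on the transformed observed dissimilarities, so they can be sketched as below. The input array f is assumed to hold the \(n_m\) non-missing transformed dissimilarities of one subject; the class and method names are hypothetical.

```java
final class StressWeightSketch {
    /** stressFormula = 1: nu is the inverse of the average squared transformed dissimilarity. */
    static double weightFormula1(double[] f) {
        double sumSq = 0.0;
        for (double v : f) {
            sumSq += v * v;
        }
        return f.length / sumSq;
    }

    /** stressFormula = 2: nu is the inverse of the variance about the subject mean. */
    static double weightFormula2(double[] f) {
        double mean = 0.0;
        for (double v : f) {
            mean += v;
        }
        mean /= f.length;
        double ss = 0.0;
        for (double v : f) {
            ss += (v - mean) * (v - mean);
        }
        return f.length / ss;
    }
}
```

For the transformed values {1, 3}, the first method returns 0.2 and the second returns 1.0.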
The Optimization Procedure
Initial estimates for the configuration matrix \(\Lambda\) are obtained
through methods of classical scaling, as discussed in Cox and Cox (2001),
chapter 2. In the case of the individual differences model, initial estimates
for the matrix \(W\) of subject weights are computed by a method described in
De Leeuw and Pruzansky (1978). After obtaining initial estimates, a modified
Gauss-Newton algorithm is used to obtain estimates for the parameters that
optimize the criterion function. The parameters are optimized sequentially
as follows:
1. Optimize the configuration estimates \(\Lambda\).
2. If required, estimate the optimal subject weights \(w_{m k}, k=1,\ldots,\) ndim, one subject at a time.
3. Optimize the intercept parameters \(\alpha_m\) and the slope parameters \(\beta_m\), one subject at a time.
4. If convergence has not been reached, continue at step 1.
An iteration is defined to be all of the steps 1, 2 and 3. Convergence is assumed when the maximum absolute change in any parameter during an iteration is less than \(10^{-4}\) or if there is no change in the criterion function during an iteration.
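The stopping rule just described can be sketched as a small check. The names are hypothetical, and treating "no change in the criterion function" as exact floating-point equality is an assumption of this sketch rather than documented behavior.

```java
final class ConvergenceSketch {
    /** Threshold on the largest absolute parameter change, as stated in the text. */
    static final double TOLERANCE = 1e-4;

    /** Largest absolute change in any parameter between two iterations. */
    static double maxAbsChange(double[] before, double[] after) {
        double max = 0.0;
        for (int p = 0; p < before.length; p++) {
            max = Math.max(max, Math.abs(after[p] - before[p]));
        }
        return max;
    }

    /** True if the parameters barely moved or the criterion did not change. */
    static boolean converged(double[] before, double[] after,
                             double phiBefore, double phiAfter) {
        return maxAbsChange(before, after) < TOLERANCE || phiBefore == phiAfter;
    }
}
```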
A modified Gauss-Newton algorithm is used in the estimation of all parameters. This algorithm, which is discussed in detail by Merle and Späth (1974), uses iteratively reweighted least squares on a Taylor series linearization of the parameters in \(\delta_{ijm}\). During each iteration, the stress weights, which may depend upon the parameters in the model, are assumed to be fixed.
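For intuition about the intercept and slope step: with the configuration, subject weights, and stress weights held fixed, the criterion for one subject is quadratic in \(\alpha_m\) and \(\beta_m\), so its minimizer is an ordinary least squares fit of \(f(\delta^\ast_{ijm})\) on \(f(\delta_{ijm})\). The sketch below uses that closed form in place of the modified Gauss-Newton step; it is a swapped-in illustration with hypothetical names, not the library's procedure.

```java
final class InterceptSlopeSketch {
    /**
     * Returns {alpha, beta} minimizing sum over p of (y[p] - alpha - beta * x[p])^2,
     * where x holds transformed predicted distances and y holds the
     * transformed observed dissimilarities of one subject.
     */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double xMean = 0.0;
        double yMean = 0.0;
        for (int p = 0; p < n; p++) {
            xMean += x[p];
            yMean += y[p];
        }
        xMean /= n;
        yMean /= n;
        double sxx = 0.0;
        double sxy = 0.0;
        for (int p = 0; p < n; p++) {
            sxx += (x[p] - xMean) * (x[p] - xMean);
            sxy += (x[p] - xMean) * (y[p] - yMean);
        }
        double beta = sxy / sxx;             // slope
        double alpha = yMean - beta * xMean; // intercept
        return new double[] {alpha, beta};
    }
}
```

For x = {1, 2, 3} and y = {3, 5, 7}, the fit returns an intercept of 1 and a slope of 2.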
Standardization
Both available models are over-parameterized so that the resulting parameter
estimates are not uniquely defined. For example, in the Euclidean model, the
configuration matrix \(\Lambda\) can be translated or "rotated"
(multiplied by an orthonormal matrix), and the resulting stress will not be
changed. To eliminate lack of uniqueness due to translation, model estimates
for the configuration are centered in both models. No attempt at eliminating
the rotation problem is made, but note that rotation invariance is usually
not a problem in the models given.
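Centering removes the translation ambiguity without changing any distance. A minimal sketch, with hypothetical names, that subtracts each column mean from the configuration:

```java
final class CenteringSketch {
    /** Returns a copy of lambda in which every column (dimension) has mean zero. */
    static double[][] center(double[][] lambda) {
        int n = lambda.length;
        int d = lambda[0].length;
        double[][] out = new double[n][d];
        for (int k = 0; k < d; k++) {
            double mean = 0.0;
            for (int i = 0; i < n; i++) {
                mean += lambda[i][k];
            }
            mean /= n;
            for (int i = 0; i < n; i++) {
                out[i][k] = lambda[i][k] - mean;
            }
        }
        return out;
    }

    /** Euclidean distance between rows i and j, used to check invariance. */
    static double distance(double[][] lambda, int i, int j) {
        double sum = 0.0;
        for (int k = 0; k < lambda[i].length; k++) {
            double diff = lambda[i][k] - lambda[j][k];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

Because the same constant is subtracted from both objects on each dimension, the coordinate differences and therefore the distances are unchanged.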
References
- Cox, T. F., and M. A. A. Cox (2001), Multidimensional Scaling, Second Edition, Chapman & Hall/CRC, Boca Raton, Florida.
- De Leeuw, Jan and Sandra Pruzansky (1978), A new computational method to fit the weighted Euclidean distance model, Psychometrika, 43, 479-490.
- Merle, G., and H. Späth (1974), Computational Experiences with Discrete Lp-Approximation, Computing, 12, 315-321.
-
Nested Class Summary
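Nested Classes (modifier and type, class, description):
- static class MultidimensionalScaling.IllDefinedHessianException: A Hessian matrix is ill-defined.
- static class MultidimensionalScaling.NotEnoughPositiveEigenvaluesException: The number of positive eigenvalues of the double-centered distance matrix is too small.
-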
Constructor Summary
Constructors (constructor, description):
- MultidimensionalScaling(double[][][] dissimilarities, int ndim): Constructor for class MultidimensionalScaling.
- MultidimensionalScaling(MultidimensionalScaling mds): Copy constructor for class MultidimensionalScaling.
-
Method Summary
Methods (modifier and type, method, description):
- void compute(): Performs the multidimensional scaling.
- double[][] getConfiguration(): Returns the configuration matrix.
- double[] getCriterionFunctionWeights(): Returns the criterion function weight for each subject.
- double[][][] getDistances(): Returns the predicted distances.
- double[] getIntercepts(): Returns the intercept for each subject.
- double[][][] getResiduals(): Returns the observation residuals.
- double[] getSlopes(): Returns the slope for each subject.
- double[][] getSubjectWeights(): Returns the subject weights.
- double getWeightedOptimizedCriterionFunctionValue(): Returns the value of the summed optimized stress function.
- double[] getWeightedOptimizedCriterionValues(): Returns the value of the optimized stress function for each subject.
- void setDissimilarityConversion(int conversionType): Sets the option for conversion of input similarity matrices to dissimilarity matrices.
- void setModel(int model): Sets the model option parameter.
- void setPrintLevel(int printLevel): Sets the print level.
- void setStressFormula(int stressFormula): Sets the stress formula option.
- void setTransformationFormula(int transformationFormula): Defines the transformation used when computing the criterion function.
-
Constructor Details
-
MultidimensionalScaling
public MultidimensionalScaling(double[][][] dissimilarities, int ndim)
Constructor for class MultidimensionalScaling.
- Parameters:
dissimilarities - a double 3-dimensional array containing the dissimilarity or similarity matrices. The array has format nsub by nstim by nstim, where nsub is the number of subjects and nstim is the number of stimuli (or objects) in each (dis)similarity matrix. Each matrix is assumed to be symmetric, and only the strictly upper triangular part is used in the computations.
Missing values can be indicated either by the standard missing value indicator, Double.NaN, or by a negative entry in array dissimilarities.
ndim - an int scalar, the dimension of the point configuration
- Throws:
IllegalArgumentException - if one of the input arguments is not feasible
-
MultidimensionalScaling
public MultidimensionalScaling(MultidimensionalScaling mds)
Copy constructor for class MultidimensionalScaling.
- Parameters:
mds - the MultidimensionalScaling object to be copied
-
-
Method Details
-
compute
public void compute() throws MultidimensionalScaling.NotEnoughPositiveEigenvaluesException, MultidimensionalScaling.IllDefinedHessianException
Performs the multidimensional scaling.
- Throws:
MultidimensionalScaling.NotEnoughPositiveEigenvaluesException - if the number of positive eigenvalues of the average product moment matrix is smaller than the number of columns of the configuration matrix
MultidimensionalScaling.IllDefinedHessianException - if one of the Hessians occurring during the optimization of the stress function is not positive semidefinite
-
setDissimilarityConversion
public void setDissimilarityConversion(int conversionType)
Sets the option for conversion of input similarity matrices to dissimilarity matrices.
- Parameters:
conversionType - an int indicating which type of conversion of the input matrices has to be performed
Values of conversionType and the corresponding conversion:
- 0: Input data contain dissimilarities and no conversion is performed.
- 1: Input data are converted from similarity to dissimilarity data by subtracting each similarity from the largest similarity for the subject.
- 2: Input data are converted to dissimilarities by reciprocating each similarity.
Default: conversionType = 0
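The three conversion options can be sketched as follows. This illustration uses hypothetical names and makes assumptions beyond the text: only the strictly upper triangle is read, the result is mirrored into the lower triangle, the largest similarity is taken over non-missing entries, and missing entries (NaN or negative) are passed through unchanged.

```java
final class ConversionSketch {
    static boolean isMissing(double v) {
        return Double.isNaN(v) || v < 0.0;
    }

    /** Converts one subject's similarity matrix according to conversionType 0, 1, or 2. */
    static double[][] toDissimilarities(double[][] s, int conversionType) {
        if (conversionType < 0 || conversionType > 2) {
            throw new IllegalArgumentException("conversionType must be 0, 1, or 2");
        }
        int n = s.length;
        // Largest non-missing similarity for this subject (needed for type 1).
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (!isMissing(s[i][j])) {
                    max = Math.max(max, s[i][j]);
                }
            }
        }
        double[][] d = new double[n][n]; // diagonal stays at zero
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double v = s[i][j];
                double r;
                if (isMissing(v) || conversionType == 0) {
                    r = v;           // no conversion
                } else if (conversionType == 1) {
                    r = max - v;     // subtract from the largest similarity
                } else {
                    r = 1.0 / v;     // reciprocal; a zero similarity gives infinity
                }
                d[i][j] = r;
                d[j][i] = r;
            }
        }
        return d;
    }
}
```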
-
setModel
public void setModel(int model)
Sets the model option parameter.
- Parameters:
model - an int indicating which model to use. If model = 0, the Euclidean model is used; if model = 1, the individual differences model is used.
Default: model = 0
-
setStressFormula
public void setStressFormula(int stressFormula)
Sets the stress formula option.
- Parameters:
stressFormula - an int indicating which stress formula to use in the computations. The stress formulas differ in the weighting given to each subject.
Values of stressFormula and the corresponding weighting:
- 0: Inverse of within-subject variance of observed dissimilarities about the predicted distances.
- 1: Inverse of within-subject sum of squared dissimilarities.
- 2: Inverse of within-subject variance of dissimilarities about the subject mean.
Default: stressFormula = 0
-
setTransformationFormula
public void setTransformationFormula(int transformationFormula)
Defines the transformation used when computing the criterion function.
- Parameters:
transformationFormula - an int indicating which transformation to use on the observed and predicted dissimilarities when computing the criterion function
Values of transformationFormula and the corresponding transformation:
- 0: Squared distances.
- 1: Distances (that is, no transformation is performed).
- 2: Log of the distances.
Default: transformationFormula = 0
-
setPrintLevel
public void setPrintLevel(int printLevel)
Sets the print level.
- Parameters:
printLevel - an int indicating which output is to be printed
Values of printLevel and the corresponding output:
- 0: No printing is performed.
- 1: Printing is performed but the output is abbreviated.
- 2: All printing is performed.
Default: printLevel = 0
-
getDistances
public double[][][] getDistances()
Returns the predicted distances.
- Returns:
a double array of size nsub by nstim by nstim containing the distances as predicted by the estimated parameters in the model
-
getConfiguration
public double[][] getConfiguration()
Returns the configuration matrix.
- Returns:
a double array of size nstim by ndim containing the point configuration obtained by multidimensional scaling
-
getIntercepts
public double[] getIntercepts()
Returns the intercept for each subject.
- Returns:
a double array containing the intercepts for the subjects
-
getSlopes
public double[] getSlopes()
Returns the slope for each subject.
- Returns:
a double array containing the slopes for the subjects
-
getCriterionFunctionWeights
public double[] getCriterionFunctionWeights()
Returns the criterion function weight for each subject.
- Returns:
a double array containing the stress function weights for the subjects
-
getWeightedOptimizedCriterionValues
public double[] getWeightedOptimizedCriterionValues()
Returns the value of the optimized stress function for each subject.
- Returns:
a double array containing the value of the weighted optimized criterion within each subject
-
getWeightedOptimizedCriterionFunctionValue
public double getWeightedOptimizedCriterionFunctionValue()
Returns the value of the summed optimized stress function.
- Returns:
a double scalar containing the value of the weighted optimized criterion function, summed over all subjects
-
getResiduals
public double[][][] getResiduals()
Returns the observation residuals.
- Returns:
a double array of size nsub by nstim by nstim containing the observation residuals for each subject
-
getSubjectWeights
public double[][] getSubjectWeights()
Returns the subject weights.
- Returns:
a double array of size nsub by ndim containing the subject weights
-