FACTR
Extracts initial factor loading estimates in factor analysis.
Required Arguments
COV — NVAR by NVAR matrix containing the variance‑covariance or correlation matrix. (Input)
NF — Number of factors in the model. (Input)
UNIQ — Vector of length NVAR containing the unique variances. (Input/Output, if INIT = 1; output, otherwise)
If INIT = 1, UNIQ contains the initial estimates of these variances on input. On output, UNIQ contains the estimated unique variances. For IMTH = 0, the unique variances are assumed to be known and are not changed from the input values when INIT = 1.
A — NVAR by NF matrix of unrotated factor loadings. (Output)
Optional Arguments
NVAR — Number of variables. (Input)
Default: NVAR = size (COV,2).
LDCOV — Leading dimension of COV exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCOV = size (COV,1).
IMTH — Method used to obtain the estimates. (Input)
Default: IMTH = 0.
IMTH | Method
0 | Principal component (principal component model) or principal factor (common factor model). If INIT = 1 and UNIQ contains zeros, then this option results in the principal component method. Otherwise, the principal factor method is used.
1 | Unweighted least squares (common factor model).
2 | Generalized least squares (common factor model).
3 | Maximum likelihood (common factor model).
4 | Image factor analysis (common factor model).
5 | Alpha factor analysis (common factor model).
NDF — Number of degrees of freedom in COV. (Input)
NDF is not required when IMTH = 0, 1, or 4. NDF defaults to 100 if NDF = 0.
Default: NDF = 0.
INIT — Method used to obtain initial estimates of the unique variances. (Input)
Default: INIT = 0.
INIT | Method
0 | Initial estimates are taken as the constant 1 - NF/(2 * NVAR) divided by the diagonal elements of the inverse of COV.
1 | Initial estimates are input in vector UNIQ.
MAXIT — Maximum number of iterations in the iterative procedure. (Input)
Typical for methods 1 to 3 is 30, while 60 is typical for method 5. MAXIT is not referenced when IMTH = 0 or 4.
Default: MAXIT = 30.
MAXSTP — Maximum number of step halvings allowed during any one iteration. (Input)
Typical is 8. MAXSTP is not referenced when IMTH = 0, 4, or 5.
Default: MAXSTP = 8.
EPS — Convergence criterion used to terminate the iterations. (Input)
For methods 1 to 3, convergence is assumed when the relative change in the criterion is less than EPS. For method 5, convergence is assumed when the maximum change (relative to the variance) of a uniqueness is less than EPS. EPS is not referenced when IMTH = 0 or 4. EPS = 0.0001 is typical.
Default: EPS = 0.0001.
EPSE — Convergence criterion used to switch to exact second derivatives. (Input)
When the largest relative change in the unique standard deviation vector is less than EPSE, exact second derivative vectors are used. Typical is 0.1. EPSE is not referenced when IMTH = 0, 4, or 5.
Default: EPSE = 0.1.
IPRINT — Printing option. (Input)
If IPRINT = 0, then no printing is performed. If IPRINT = 1, then printing of the final results is performed. If IPRINT = 2, then printing of an iteration summary and the final results is performed.
Default: IPRINT = 0.
LDA — Leading dimension of A exactly as specified in the dimension statement of the calling program. (Input)
Default: LDA = size (A,1).
EVAL — Vector of length NVAR containing the eigenvalues of the matrix from which the factors were extracted. (Output)
If IMTH = 5, then the first NF positions of EVAL contain the ALPHA coefficients. Note that EVAL does not usually contain eigenvalues for matrix COV.
STAT — Vector of length 6 containing some output statistics. (Output)
I | STAT(I)
1 | Value of the function minimum.
2 | Tucker reliability coefficient.
3 | Chi-squared test statistic for testing that NF common factors are adequate for the data.
4 | Degrees of freedom in chi-squared. This is computed as ((NVAR - NF)² - NVAR - NF)/2.
5 | Probability of a greater chi-squared statistic.
6 | Number of iterations.
STAT is not used when IMTH = 0, 4, or 5.
DER — Vector of length NVAR containing the parameter updates when convergence was reached (or the iterations terminated). (Output)
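As a small illustration of the degrees-of-freedom formula listed for STAT(4), the following sketch evaluates it in plain Fortran. The module factr_dof and the function chi2_dof are hypothetical names introduced here for illustration. They are not part of the library.

```fortran
module factr_dof
  implicit none
contains
  ! Degrees of freedom of the chi-squared test, per the STAT(4) formula.
  ! (NVAR - NF)**2 and (NVAR + NF) always have the same parity, so the
  ! integer division by 2 is exact.
  integer function chi2_dof(nvar, nf)
    integer, intent(in) :: nvar, nf
    chi2_dof = ((nvar - nf)**2 - nvar - nf) / 2
  end function chi2_dof
end module factr_dof

program demo_dof
  use factr_dof
  implicit none
  ! NVAR = 9, NF = 3: prints 12, the value of STAT(4) in the Example
  print *, chi2_dof(9, 3)
end program demo_dof
```

A result of zero or less means there are no degrees of freedom left for the significance test, which is the situation reported by informational error code 4.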
FORTRAN 90 Interface
Generic: CALL FACTR (COV, NF, UNIQ, A [, …])
Specific: The specific interface names are S_FACTR and D_FACTR.
FORTRAN 77 Interface
Single: CALL FACTR (NVAR, COV, LDCOV, NF, IMTH, NDF, INIT, MAXIT, MAXSTP, EPS, EPSE, IPRINT, UNIQ, A, LDA, EVAL, STAT, DER)
Double: The double precision name is DFACTR.
Description
Routine FACTR computes unrotated factor loadings in exploratory factor analysis models. Two models are available in FACTR: the principal component model for factor analysis and the common factor model, the latter with extensions for alpha factor analysis and image analysis. Methods of estimation include principal components, principal factor, image analysis, alpha factor analysis, unweighted least squares, generalized least squares, and maximum likelihood.
In the factor analysis model used for factor extraction, the basic model is given as $\Sigma = \Lambda\Lambda^T + \Psi$ where Σ is the p × p population covariance matrix, Λ is the p × k matrix of factor loadings relating the factors ƒ to the observed variables x, and Ψ is the p × p matrix of covariances of the unique errors e. Here, p = NVAR and k = NF. The relationship between the factors, the unique errors, and the observed variables is given as x = Λƒ + e where, in addition, it is assumed that the expected values of e, ƒ, and x are zero. (The sample means can be subtracted from x if the expected value of x is not zero.) It is also assumed that each factor has unit variance, the factors are independent of each other, and that the factors and the unique errors are mutually independent. In the common factor model, the elements of the vector of unique errors e are also assumed to be independent of one another so that the matrix Ψ is diagonal. This is not the case in the principal component model in which the errors may be correlated.
Further differences between the various methods concern the criterion that is optimized and the amount of computer effort required to obtain estimates. Generally speaking, the least‑squares and maximum likelihood methods, which use iterative algorithms, require the most computer time with the principal factor, principal component and the image methods requiring much less time since the algorithms in these methods are not iterative. The algorithm in alpha factor analysis is also iterative, but the estimates in this method generally require somewhat less computer effort than the least‑squares and maximum likelihood estimates. In all algorithms, one eigensystem analysis is required on each iteration.
The Principal Component and Principal Factor Methods
Both the principal component and the principal factor methods compute the factor loading estimates as $\hat{\Lambda} = \Gamma\Delta^{1/2}$ where Γ and the diagonal matrix Δ are the eigenvectors and eigenvalues of a matrix. In the principal component model, the eigensystem analysis is performed on the sample covariance (correlation) matrix S while in the principal factor model the matrix (S - Ψ) is used. If the unique error variances Ψ are not known (i.e., if INIT = 0) in the principal factor model, then FACTR obtains estimates for them as discussed in Comment 3. If the principal components model is to be used, then the INIT = 1 option should be set, and the vector UNIQ should be set so that all elements are zero. If UNIQ is not set, principal factor model estimates are computed.
The basic idea in the principal component method is to find factors that maximize the variance in the original data that is explained by the factors. Because this method allows the unique errors to be correlated, some factor analysts insist that the principal component method is not a factor analytic method. Usually, however, the estimates obtained via the principal component model and other models in factor analysis will be quite similar.
Both the principal component and the principal factor methods give different results when the correlation matrix is used in place of the covariance matrix. Indeed, any rescaling of the sample covariance matrix can lead to different estimates with either of these methods. A further difficulty with the principal factor method is the problem of estimating the unique error variances. Theoretically, these must be known in advance and passed to FACTR through UNIQ. In practice, the estimates of these parameters produced via the INIT = 0 option in FACTR are often used. In either case, the resulting adjusted covariance (correlation) matrix $S - \hat{\Psi}$ may not yield the NF positive eigenvalues required for NF factors to be obtained. If this occurs, the user must either lower the number of factors to be estimated or give new unique error variance values.
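To make the eigenvector scaling concrete, the following sketch computes the loadings of the first factor only, assuming the usual scaling in which a loading column is an eigenvector multiplied by the square root of its eigenvalue. It uses power iteration, which is a stand-in chosen for brevity and is not claimed to be the eigensystem method used inside FACTR. The names pc_first_factor and first_loading are hypothetical. The input is assumed to be symmetric with a positive, well-separated largest eigenvalue and a start vector that is not orthogonal to the leading eigenvector.

```fortran
module pc_first_factor
  implicit none
contains
  ! Leading eigenpair of a symmetric matrix by power iteration, returned
  ! as one column of loadings: a = sqrt(eigenvalue) * eigenvector.
  subroutine first_loading(s, a, eval)
    real, intent(in)  :: s(:,:)
    real, intent(out) :: a(:), eval
    real :: v(size(s,1)), w(size(s,1))
    integer :: it
    v = 1.0 / sqrt(real(size(s,1)))       ! unit-length start vector
    do it = 1, 500
      w = matmul(s, v)
      eval = sqrt(dot_product(w, w))      ! norm of S*v tends to the eigenvalue
      w = w / eval                        ! renormalize to unit length
      if (maxval(abs(w - v)) < 1.0e-6) exit
      v = w
    end do
    a = sqrt(eval) * w
  end subroutine first_loading
end module pc_first_factor

program demo_pc
  use pc_first_factor
  implicit none
  real :: s(2,2), a(2), eval
  ! Correlation matrix of two variables with correlation 0.5
  s = reshape([1.0, 0.5, 0.5, 1.0], [2, 2])
  call first_loading(s, a, eval)
  print '(a, f8.4)',  'eigenvalue: ', eval
  print '(a, 2f8.4)', 'loadings:   ', a
end program demo_pc
```

For this matrix the leading eigenvalue is 1.5 and both loadings are about 0.866. Passing S with the unique variances subtracted from its diagonal gives the corresponding principal factor calculation.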
The Least-Squares and Maximum Likelihood Methods
Unlike the previous two methods, the algorithm used to compute estimates in this section is iterative (see Joreskog 1977). As with the principal factor model, the user may either initialize UNIQ or allow FACTR to compute initial estimates for the unique error variances. Unlike the principal factor method, FACTR then optimizes the criterion function with respect to both Ψ and Λ. (In the principal factor method, Ψ is assumed to be known. Given Ψ, estimates for Λ may be obtained.)
The major differences between the methods discussed in this section are in the criterion function that is optimized. Let S denote the sample covariance (correlation) matrix, and let Σ denote the covariance matrix that is to be estimated by the factor model. In the unweighted least-squares method, also called the iterated principal factor method or the minres method (see Harman 1976, page 177), the function minimized is the sum of the squared differences between S and Σ. This is written as $\phi_{ul} = 0.5\,\mathrm{trace}((S - \Sigma)^2)$.
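The unweighted least-squares criterion can be evaluated directly for any trial values of the loadings and unique variances. The sketch below does so in plain Fortran, taking Σ to be the loadings times their transpose plus a diagonal matrix of unique variances, as in the common factor model. The names uls_criterion and phi_uls and the numbers in the demonstration are illustrative assumptions. The sketch evaluates the criterion only and does not perform the minimization.

```fortran
module uls_criterion
  implicit none
contains
  ! phi_ul = 0.5 * trace((S - Sigma)**2),
  ! with Sigma = Lambda * Lambda^T + diag(psi)
  real function phi_uls(s, lambda, psi)
    real, intent(in) :: s(:,:), lambda(:,:), psi(:)
    real :: r(size(s,1), size(s,2))
    integer :: i
    r = s - matmul(lambda, transpose(lambda))
    do i = 1, size(psi)
      r(i,i) = r(i,i) - psi(i)
    end do
    ! The residual R is symmetric, so trace(R*R) equals the sum of its
    ! squared elements.
    phi_uls = 0.5 * sum(r * r)
  end function phi_uls
end module uls_criterion

program demo_uls
  use uls_criterion
  implicit none
  real :: s(2,2), lambda(2,1), psi(2)
  s = reshape([1.0, 0.5, 0.5, 1.0], [2, 2])
  lambda(:,1) = [0.8, 0.6]      ! trial loadings for one factor
  psi = [0.36, 0.64]            ! trial unique variances
  print *, phi_uls(s, lambda, psi)
end program demo_uls
```

With these trial values the diagonal of S is reproduced and the off-diagonal residual is 0.02, so the criterion is about 0.0004.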
Generalized least-squares and maximum likelihood estimates are asymptotically equivalent. Maximum likelihood estimates maximize the (normal theory) likelihood, which is equivalent to minimizing $\phi_{ml} = \mathrm{trace}(\Sigma^{-1}S) - \log(|\Sigma^{-1}S|)$, while generalized least squares minimizes the function $\phi_{gs} = \mathrm{trace}((\Sigma S^{-1} - I)^2)$.
In all three methods, a two-stage optimization procedure is used. This proceeds by first solving the likelihood equations for Λ in terms of Ψ and substituting the solution into the likelihood. This gives a criterion ɸ(Ψ, Λ(Ψ)), which is optimized with respect to Ψ. In the second stage, the estimates $\hat{\Lambda}$ are obtained from the estimates for Ψ.
The generalized least‑squares and the maximum likelihood methods allow for the computation of a statistic (STAT(3)) for testing that NF common factors are adequate to fit the model. This is a chi‑squared test that all remaining parameters associated with additional factors are zero. If the probability of a larger chi‑squared is small (see STAT(5)) so that the null hypothesis is rejected, then additional factors are needed (although these factors may not be of any practical importance). Failure to reject does not legitimize the model. The statistic STAT(3) is a likelihood ratio statistic in maximum likelihood estimation. As such, it asymptotically follows a chi‑squared distribution with degrees of freedom given in STAT(4).
The Tucker and Lewis (1973) reliability coefficient, ρ, is returned in STAT(2) when the maximum likelihood or generalized least‑squares methods are used. This coefficient is an estimate of the ratio of explained to the total variation in the data. It is computed as follows:
where ∣S∣ is the determinant of COV, p = NVAR, k = NF, ɸ is the optimized criterion, and d = NDF.
Image Analysis
The term “image analysis” is used here to denote the noniterative image method of Kaiser (1963). It is not the image factor analysis discussed by Harman (1976, page 226). The image method (as well as the alpha factor analysis method) begins with the notion that only a finite number from an infinite number of possible variables have been measured. The image factor pattern is calculated under the assumption that the ratio of the number of factors to the number of observed variables is near zero so that a very good estimate for the unique error variances (for standardized variables) is given as one minus the squared multiple correlation of the variable under consideration with all variables in the covariance matrix.
First, the matrix $D^2 = (\mathrm{diag}(S^{-1}))^{-1}$ is computed where the operator “diag” results in a matrix consisting of the diagonal elements of its argument, and S is the sample covariance (correlation) matrix. Then, the eigenvalues Λ and eigenvectors Γ of the matrix $D^{-1} S D^{-1}$ are computed. Finally, the unrotated image factor pattern matrix is computed as $A = D\Gamma[(\Lambda - I)^2\Lambda^{-1}]^{1/2}$.
Alpha Factor Analysis
The alpha factor analysis method of Kaiser and Caffrey (1965) finds factor‑loading estimates to maximize the correlation between the factors and the complete universe of variables of interest. The basic idea in this method is as follows: only a finite number of variables out of a much larger set of possible variables is observed. The population factors are linearly related to this larger set while the observed factors are linearly related to the observed variables. Let ƒ denote the factors obtainable from a finite set of observed random variables, and let ξ denote the factors obtainable from the universe of observable variables. Then, the alpha method attempts to find factor‑loading estimates so as to maximize the correlation between ƒ and ξ. In order to obtain these estimates, the iterative algorithm of Kaiser and Caffrey (1965) is used.
Comments
1. FACTR makes no attempt to solve for NF, the number of factors. In general, if NF is not known in advance, several different values of NF should be used, and the most reasonable value kept in the final solution.
2. The iterative methods are generally thought to be superior from a theoretical point of view but, in practice, often lead to solutions which differ little from the noniterative methods. For this reason, it is usually suggested that a non‑iterative method be used in the initial stages of the factor analysis, and that the iterative methods be used when issues such as the number of factors have been resolved.
3. Initial estimates for the unique variances are input when INIT = 1. If the iterative methods fail for these values, new initial estimates should be tried. These may be obtained by use of another factoring method (use the final estimates from the new method as initial estimates in the old method).
Another alternative is to let FACTR compute initial estimates of the unique error variances. When INIT = 0, the initial estimates are taken as a constant $1 - k/(2p)$, with k = NF and p = NVAR, divided by the diagonal elements of the $S^{-1}$ matrix. When the correlation matrix is factor analyzed, this is a constant times one minus the squared multiple correlation coefficient.
4. Workspace may be explicitly provided, if desired, by use of F2CTR/DF2CTR. The reference is:
CALL F2CTR (NVAR, COV, LDCOV, NF, IMTH, NDF, INIT, MAXIT, MAXSTP, EPS, EPSE, IPRINT, UNIQ, A, LDA, EVAL, STAT, DER, IS, COVI, WK, OLD, EVEC, HESS)
The additional arguments are as follows:
IS — Integer work vector of length equal to NVAR.
COVI — Real work vector of length equal to NVAR².
WK — Real work vector of length equal to NVAR.
OLD — Real work vector of length equal to NVAR.
EVEC — Real work vector of length equal to NVAR².
HESS — Real work vector of length equal to NVAR².
5. Informational errors
Type | Code | Description
3 | 1 | Too many iterations. Convergence is assumed.
3 | 2 | Too many step halvings. Convergence is assumed.
3 | 4 | There are no degrees of freedom for the significance testing.
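Comment 3 describes the INIT = 0 starting values as a constant divided by the diagonal elements of the inverse of COV. The sketch below works through that arithmetic in plain Fortran under stated assumptions: the constant is 1 - NF/(2 * NVAR) as given in the description of INIT, and the inverse is formed by Gauss-Jordan elimination without pivoting, which is reasonable only for a well-conditioned positive definite matrix. The names init_uniq_sketch, inverse_diag, and initial_uniq are hypothetical helpers, not library routines, and the sketch is not a description of how FACTR itself carries out the computation.

```fortran
module init_uniq_sketch
  implicit none
contains
  ! Diagonal of the inverse of a symmetric positive definite matrix,
  ! by Gauss-Jordan elimination on the augmented matrix [S | I].
  subroutine inverse_diag(s, d)
    real, intent(in)  :: s(:,:)
    real, intent(out) :: d(:)
    real :: w(size(s,1), 2*size(s,1)), piv, f
    integer :: n, i, j
    n = size(s,1)
    w = 0.0
    w(:, 1:n) = s
    do i = 1, n
      w(i, n+i) = 1.0
    end do
    do i = 1, n
      piv = w(i,i)                 ! positive for a positive definite S
      w(i,:) = w(i,:) / piv
      do j = 1, n
        if (j /= i) then
          f = w(j,i)
          w(j,:) = w(j,:) - f * w(i,:)
        end if
      end do
    end do
    do i = 1, n
      d(i) = w(i, n+i)             ! right half now holds the inverse
    end do
  end subroutine inverse_diag

  ! uniq(i) = (1 - NF/(2*NVAR)) / (i-th diagonal element of inverse of S)
  subroutine initial_uniq(s, nf, uniq)
    real, intent(in)    :: s(:,:)
    integer, intent(in) :: nf
    real, intent(out)   :: uniq(:)
    real :: d(size(s,1))
    call inverse_diag(s, d)
    uniq = (1.0 - real(nf) / (2.0 * size(s,1))) / d
  end subroutine initial_uniq
end module init_uniq_sketch

program demo_init
  use init_uniq_sketch
  implicit none
  real :: s(2,2), uniq(2)
  s = reshape([1.0, 0.5, 0.5, 1.0], [2, 2])
  call initial_uniq(s, 1, uniq)
  print '(2f8.4)', uniq
end program demo_init
```

For this two-variable correlation matrix with one factor, the constant is 0.75 and one minus the squared multiple correlation is also 0.75, so each starting value is approximately 0.5625.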
Example
The following data were originally analyzed by Emmett (1949). There are 211 observations on 9 variables. Following Lawley and Maxwell (1971), three factors will be obtained by the method of maximum likelihood.
USE FACTR_INT
IMPLICIT NONE
INTEGER IMTH,IPRINT, LDA, LDCOV, MAXSTP, NDF, NF, NVAR
REAL EPS, EPSE
PARAMETER (EPS=0.000001, EPSE=0.01, IMTH=3, IPRINT=1, &
LDA=9, LDCOV=9, MAXSTP=10, NDF=210, NF=3, NVAR=9)
! Output arrays (loadings, eigenvalues, statistics) and the input matrix
REAL A(LDA,NF), COV(LDCOV,NVAR), DER(NVAR), EVAL(NVAR), &
STAT(6), UNIQ(NVAR)
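! Correlation matrix of the 9 variables; it is symmetric, so the
! column-major fill order of the DATA statement does not matter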
DATA COV/ &
1.000, 0.523, 0.395, 0.471, 0.346, 0.426, 0.576, 0.434, 0.639, &
0.523, 1.000, 0.479, 0.506, 0.418, 0.462, 0.547, 0.283, 0.645, &
0.395, 0.479, 1.000, 0.355, 0.270, 0.254, 0.452, 0.219, 0.504, &
0.471, 0.506, 0.355, 1.000, 0.691, 0.791, 0.443, 0.285, 0.505, &
0.346, 0.418, 0.270, 0.691, 1.000, 0.679, 0.383, 0.149, 0.409, &
0.426, 0.462, 0.254, 0.791, 0.679, 1.000, 0.372, 0.314, 0.472, &
0.576, 0.547, 0.452, 0.443, 0.383, 0.372, 1.000, 0.385, 0.680, &
0.434, 0.283, 0.219, 0.285, 0.149, 0.314, 0.385, 1.000, 0.470, &
0.639, 0.645, 0.504, 0.505, 0.409, 0.472, 0.680, 0.470, 1.000/
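! Maximum likelihood extraction (IMTH=3) of NF=3 factors, with
! NDF=210 for the 211 observations; final results are printed (IPRINT=1)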
CALL FACTR (COV, NF, UNIQ, A, IMTH=IMTH, MAXSTP=MAXSTP, EPS=EPS,&
EPSE=EPSE, IPRINT=IPRINT, NDF=NDF, EVAL=EVAL, &
STAT=STAT, DER=DER)
END
Output
Unique Error Variances
1 2 3 4 5 6 7 8
0.4505 0.4271 0.6166 0.2123 0.3805 0.1769 0.3995 0.4615
9
0.2309
Unrotated Loadings
1 2 3
1 0.6642 -0.3209 0.0735
2 0.6888 -0.2471 -0.1933
3 0.4926 -0.3022 -0.2224
4 0.8372 0.2924 -0.0354
5 0.7050 0.3148 -0.1528
6 0.8187 0.3767 0.1045
7 0.6615 -0.3960 -0.0777
8 0.4579 -0.2955 0.4913
9 0.7657 -0.4274 -0.0117
Eigenvalues
1 2 3 4 5 6 7 8 9
0.063 0.229 0.541 0.865 0.894 0.974 1.080 1.117 1.140
STAT
1 2 3 4 5
0.0350 1.0000 7.1494 12.0000 0.8476
6
5.0000
Final Parameter Updates
1 2 3 4 5
2.02042E-07 2.95010E-07 1.80908E-07 6.38808E-08 2.00809E-07
6 7 8 9
1.48762E-07 1.73797E-08 3.95484E-07 1.42415E-07