Analyzes censored survival data using a generalized linear model.
Required Arguments
X — NOBS by NCOL matrix containing the data. (Input)
MODEL — Model option parameter. (Input) MODEL specifies the distribution of the response variable and the relationship of the linear model to a distribution parameter.
MODEL
Distribution
0
Exponential
1
Linear hazard
2
Log‑normal
3
Normal
4
Log‑logistic
5
Logistic
6
Log least extreme value
7
Least extreme value
8
Log extreme value
9
Extreme value
10
Weibull
For further discussion of the models and parameterizations used, see the “Description” section.
ILT — For interval‑censored and left‑censored observations, the column number in X that contains the upper endpoint of the failure interval. (Input) See argument ICEN. If ILT = 0, left‑censored and interval‑censored observations cannot be input.
IRT — For interval‑censored and right‑censored observations, the column number in X that contains the lower endpoint of the failure interval. (Input) For exact‑failure observations, X(i, IRT) contains the exact‑failure time. IRT must not be zero. See argument ICEN.
MAXCL — An upper bound on the sum of the number of distinct values taken by the classification variables. (Input)
NCOEF — Number of estimated coefficients in the model. (Output, if INIT = 0; Input, if INIT = 1)
COEF — NCOEF by 4 matrix containing parameter estimates and associated statistics. (Output, if INIT = 0; Input/Output, if INIT = 1; Input, if MAXIT = 0)
Col.
Statistic
1
Coefficient estimate.
2
Estimated standard deviation of the estimated coefficient.
3
Asymptotic normal score for testing that the coefficient is zero.
4
p‑value associated with the normal score in column 3.
When COEF is input, only column 1 is referenced as input data, and columns 2 to 4 need not be set. When present in the model, the initial coefficient in COEF estimates a “nuisance” parameter, and the remaining coefficients estimate parameters associated with the “linear” model, beginning with the intercept, if present. Nuisance parameters are as follows:
Model
Nuisance Parameter
1
Coefficient of the quadratic term in time, θ
2 – 9
Scale parameter, σ
10
Shape parameter, θ
ALGL — Maximized log‑likelihood. (Output)
COV — NCOEF by NCOEF matrix containing the estimated asymptotic covariance matrix of the coefficients. (Output) COV is computed as the inverse of the matrix of second partial derivatives of negative one times the log‑likelihood. When MAXIT = 0, COV is computed at the initial estimates.
XMEAN — Vector of length NCOEF containing the means of the design variables. (Output)
CASE — NOBS by 5 vector containing the case analysis. (Output)
Col.
Statistics
1
Estimated predicted value
2
Estimated influence or leverage
3
Residual estimate
4
Estimated cumulative hazard
5
For non‑censored observations, the estimated density at the observation failure time and covariate values. For censored observations, the corresponding estimated probability.
If MAXIT = 0, CASE is a NOBS by 1 vector containing the estimated probability (for censored observations) or the estimated density (for non censored observations).
GR — Vector of length NCOEF containing the last parameter updates, excluding step halvings. (Output) GR is computed as the inverse of the matrix of second partial derivatives times the vector of first partial derivatives of the log‑likelihood. When MAXIT = 0, the derivatives are computed at the initial estimates.
IADDS — Vector of length NOBS indicating which observations have and have not been included in the model. (Output, if MAXIT > 0; Input/Output, if MAXIT = 0)
Value
Status of Observation
0
Observation i has been included in the model.
1
Observation i has not been included in the model due to missing values in the X matrix.
2
Observation i has not been included in the model because of infinite estimates in extended maximum likelihood estimation. If MAXIT = 0, then the IADDS array must be initialized prior to calling SVGLM.
Optional Arguments
NOBS — Number of observations. (Input) Default: NOBS = size (X,1).
NCOL — Number of columns in X. (Input) Default: NCOL = size (X,2).
LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input) Default: LDX = size (X,1).
IFRQ — Column number in X containing the frequency of response for each observation. (Input) If IFRQ = 0, a response frequency of 1 for each observation is assumed. Default: IFRQ = 0.
IFIX — Column number in X containing a constant to be added to the linear response. (Input) Default: IFIX = 0. The estimated linear response is taken to be where wi is the observation constant, zi is the observation design vector, is the vector of estimated parameters output in the first column of COEF, and i indexes the observations. The “fixed” constant allows one to test hypotheses about parameters via the log‑likelihoods. If IFIX = 0, the fixed parameter is assumed to be 0.
ICEN — Column number in X containing the censoring code for each observation. (Input) Default: ICEN = 0. If ICEN = 0, a censoring code of 0 is assumed. Valid censoring codes are:
X(i, ICEN)
Censoring
0
Exact failure at X(i, IRT).
1
Right censored. The response is greater than X(i, IRT).
2
Left censored. The response is less than or equal to X(i, ILT).
3
Interval censored. The response is greater than X(i, IRT), but less than or equal to X(i, ILT).
INFIN — Method to be used for handling infinite estimates. (Input) Default: INFIN = 0.
INFIN
Method
0
Remove a right or left‑censored observation from the loglikelihood whenever the probability of the observation exceeds 0.995. At convergence, use linear programming to check that all removed observations actually have infinite linear response
.
Set IADDS(i) for observation i to 2 if the linear response is infinite. If not all removed observations have infinite linear response
.
1
Iterate without checking for infinite estimates.
See the “Description” section for more discussion.
MAXIT — Maximum number of iterations. (Input) MAXIT = 30 will usually be sufficient. Use MAXIT = 0 to compute the Hessian and score vector at the initial estimates. Default: MAXIT = 30.
EPS — Convergence criterion. (Input) Convergence is assumed when the maximum relative change in any coefficient estimate is less than EPS from one iteration to the next, or when the relative change in the log‑likelihood, ALGL, from one iteration to the next is less than EPS/100. If EPS is negative, EPS = 0.001 is assumed. Default: EPS = 0.001.
No intercept is in the model (unless otherwise provided for by the user).
1
An intercept is automatically included in the model..
NCLVAR — Number of classification variables. (Input) Dummy or indicator variables are generated for classification variables using the IDUMMY = 2 option of routine GRGLM (see Chapter 2, “Regression”). See Comment 3. Default: NCLVAR = 0.
INDCL — Index vector of length NCLVAR containing the column numbers of X that are classification variables. (Input, if NCLVAR is positive, not used otherwise) If NCLVAR is 0, INDCL is not referenced and can be dimensioned of length 1 in the calling program.
NEF — Number of effects in the model. (Input) In addition to effects involving classification variables, simple covariates and the product of simple covariates are also considered effects. Default: NEF = 0.
NVEF — Vector of length NEF containing the number of variables associated with each effect in the model. (Input, if NEF is positive; not used otherwise) If NEF is zero, NVEF is not used and can be dimensioned of length 1 in the calling program.
INDEF — Index vector of length NVEF(1) + NVEF(2) + … + NVEF(NEF) containing the column numbers in X associated with each effect. (Input, if NEF is positive; not used otherwise) The first NVEF(1) elements of INDEF give the column numbers in X of the variables in the first effect. The next NVEF(2) elements of INDEF give the column numbers for the second effect, etc. If NEF is zero, INDEF is not used and can be dimensioned of length one in the calling program.
Unweighted linear regression is used to obtain initial estimates.
1
The NCOEF elements in the first column of COEF contain initial estimates of the parameters on input to SVGLM (requiring that the user know NCOEF prior to calling SVGLM).
Printing is performed, but observational statistics are not printed..
2
All output statistics are printed.
NCLVAL — Vector of length NCLVAR containing the number of values taken by each classification variable. (Output, if NCLVAR is positive; not used otherwise) NCLVAL(i) is the number of distinct values for the i‑th classification variable. If NCLVAR is zero, NCLVAL is not used and can be dimensioned of length 1 in the calling program.
CLVAL — Vector of length NCLVAL(1) + NCLVAL(2) + … + NCLVAL(NCLVAR) containing the distinct values of the classification variables in ascending order. (Output, if NCLVAR is positive; not used otherwise) The first NCLVAL(1) elements contain the values for the first classification variables, the next NCLVAL(2) elements contain the values for the second classification variable, etc. If NCLVAR is zero, then CLVAL is not referenced and can be dimensioned of length 1 in the calling program.
LDCOEF — Leading dimension of COEF exactly as specified in the dimension statement in the calling program. (Input) Default: LDCOEF = size (COEF,1).
LDCOV — Leading dimension of COV exactly as specified in the dimension statement in the calling program. (Input) Default: LDCOV = size (COV,1).
LDCASE — Leading dimension of CASE exactly as specified in the dimension statement in the calling program. (Input) Default: LDCASE = size (CASE,1).
NRMISS — Number of rows of data in X that contain missing values in one or more columns ILT, IRT, IFRQ, ICOEF, ICEN, INDCL or INDEF of X. (Output)
Routine SVGLM computes maximum likelihood estimates of parameters and associated statistics in generalized linear models commonly found in survival (reliability) analysis. Although the terminology used will be from the survival area, the methods discussed have application in many areas of data analysis, including reliability analysis and event history analysis. Indeed, these methods may be used anywhere a random variable from one of the discussed distributions is parameterized via one of the models available in SVGLM. Thus, while it is not advisable to do so, standard multiple linear regression may be performed by routine SVGLM. Estimates for any of ten standard models can be computed. Exact, left‑censored, right‑censored, or interval‑censored observations are allowed. (Note that left censoring is the same as interval censoring with left endpoint equal to the left endpoint of the support of the distribution.)
Let η = xTβ be the linear parameterization, where x is a design vector obtained in SVGLM via routine GRGLM (see Chapter 2, “Regression”) from a row of X, and β is a vector of parameters associated with the linear model. Let T denote the random response variable and S(t) denote the probability that T > t. All models considered also allow a fixed parameter wi for observation i (input in column IFIX of X). Use of this parameter is discussed below. There may also be nuisance parameters θ > 0, or σ > 0 to be estimated (along with β) in the various models. Let Φ denote the cumulative normal distribution. The survival models available in SVGLM are:
Model
Name
S(t)
0
Exponential
1
Linear hazard
2
Log‑normal
3
Normal
4
Log‑logistic
5
Logistic
6
Log least extreme value
7
Least extreme value
8
Log extreme value
9
Extreme value
10
Weibull
Note that the log‑least‑extreme‑value model is a reparameterization of the Weibull model. Moreover, models 0, 1, 2, 4, 6, 8, and 10 require that T > 0, while all of the remaining models allow any value for T,‑∞ < T < ∞.
Each row in the data matrix can represent a single observation, or, through the use of column IFRQ, it can represent several observations. Classification variables and their products are easily incorporated into the models via the usual GLM type specifications through the use of variables NCLVAR and INDCL, and the model variables NEF, NVEF, and INDEF.
The constant parameter wi is input in X and may be used for a number of purposes. For example, if the parameter in an exponential model is known to depend upon the size of the area tested, volume of a radioactive mass, or population density, etc., then a multiplicative factor of the exponential parameter λ = exp(xβ) may be known apriori. This factor can be input in wi (wi is the log of the factor). An alternate use of wi is as follows: It may be that λ = exp(x1β1 + x2β2), where β2 is known. Letting wi = x2β2, estimates for β1 can be obtained via SVGLM with the known fixed values for β2. Standard methods can then be used to test hypotheses about β2 via computed log‑likelihoods.
Computational details
The computations proceed as follows:
1. The input arguments are checked for consistency and validity.
2. Estimates for the means of the explanatory variables x (as generated from the model specification via GRGLM (see Chapter 2, “Regression”) are computed. Let ƒi denote the frequency of the observation. Means are computed as
3. If INIT = 0, initial estimates of the parameters for all but the exponential models (MODEL = 0, 1) are are obtained as follows:
A. Routine KAPMR is used to compute a nonparametric estimate of the survival probability at the upper limit of each failure interval. (Because upper limits are used, intervaland left‑censored data are taken to be exact failures at the upper endpoint of the failure interval.) The Kaplan‑Meier estimate is computed under the assumption that all failure distributions are identical (i.e., all β’s but the intercept, if present, are assumed to be zero).
B. If INTCEP = 0, a simple linear regression is performed predicting
where t* is computed at the upper endpoint of each failure interval, t* = t in models 3, 5, 7, and 9, and t* = ln(t) in models 2, 4, 6, 8, and 10, and wi is the fixed constant, if present. If INTCEP is zero, α is fixed at zero, and the model
is fit instead of the model above. In this model, the coefficients β are used in place of the location estimate α above. Here, is estimated from the simple linear regression with α = 0.
C. If the intercept is in the model, then in log‑location‑scale models (models 1–8),
and the initial estimate of the intercept, if present, is taken to be .
In the Weibull model,
and the intercept, if present, is taken to be .
Initial estimates of all parameters β, other than the intercept, are taken to be zero.
If no intercept is in the model, the scale parameter is estimated as above, and the estimates from Step B are used as initial estimates for the β’s.
For exponential models (MODEL = 0, 1), the average total time on test statistic is used to obtain an estimate for the intercept. Specifically, let Tt denote the total number of failures divided by the total time on test. The initial estimate for the intercept is then ln(Tt). Initial estimates for the remaining parameters β are taken as zero, and, if MODEL = 1, the initial estimate for the linear hazard parameter θ is taken to be a small positive number. When the intercept is not in the model, the initial estimate for the parameter θ is taken as a small positive number, and initial estimates of the parameters β are computed via multiple linear regression as above.
4. A quasi‑Newton algorithm is used in the initial iterations based upon a Hessian estimate
where
is the partial derivative of the i‑th term in the log‑likelihood with respect to the parameter αj, and αj denotes one of the parameters to be estimated.
When the relative change in the log‑likelihood from one iteration to the next is 0.1 or less, exact second partial derivatives are used for the Hessian so that Newton‑Raphson iteration is used.
If the initial step size results in an increase in the log‑likelihood, the full step is used. If the log‑likelihood decreases for the initial step size, the step size is halved, and a check for an increase in the log‑likelihood performed. Step‑halving is performed (as a simple line search) until an increase in the log‑likelihood is detected, or until the step size is less that 0.0001 (where the initial step size is 1).
5. Convergence is assumed when the maximum relative change in any coefficient update from one iteration to the next is less than EPS, or when the relative change in the loglikelihood from one iteration to the next is less than EPS/100. Convergence is also assumed after MAXIT iterations, or when step halving leads to a step size of less than .0001, with no increase in the log‑likelihood.
6. If requested (INFIN = 0), then the methods of Clarkson and Jennrich (1988) are used to check for the existence of infinite estimates in
As an example of a situation in which infinite estimates can occur, suppose that observation j is right censored with tj > 15 in a normal distribution model in which we fit the mean as
where xj is the observation design vector. If design vector xj for parameter βm is such that xjm = 1 and xim = 0 for all i≠j, then the optimal estimate of βm occurs at
leading to an infinite estimate of both βm and ηj. In SVGLM, such estimates may be “computed.”
In all models fit by SVGLM, infinite estimates can only occur when the optimal estimated probability associated with the left- or right‑censored observation is 1. If INFIN = 0, left‑ or right‑censored observations that have estimated probability greater than 0.995 at some point during the iterations are excluded from the log‑likelihood, and the iterations proceed with a log‑likelihood based upon the remaining observations. This allows convergence of the algorithm when the maximum relative change in the estimated coefficients is small and also allows for a more precise determination of observations with infinite
At convergence, linear programming is used to ensure that the eliminated observations have infinite ηi. If some (or all) of the removed observations should not have been removed (because their estimated ηi’s must be finite), then the iterations are restarted with a log‑likelihood based upon the finite ηi observations. See Clarkson and Jennrich (1988) for more details.
When INFIN = 1, no observations are eliminated during the iterations. In this case, when infinite estimates occur, some (or all) of the coefficient estimates will become large, and it is likely that the Hessian will become (numerically) singular prior to convergence.
7. The case statistics are computed as follows:
Let
denote the log‑likelihood of the i‑th observation evaluated at θi, denote the vector of derivatives of with respect to all parameters,
denote the derivative of with respect to η = xTβ, H denote the Hessian, and E denote expectation.
Then, the columns of CASE are:
A. Predicted values are computed as E(T∣x) according to standard formulas. If MODEL is 4 or 8, and if σ≥ 1, then the expected values cannot be computed because they are infinite.
B. Following Cook and Weisberg (1982), we take the influence (or leverage) of the i‑th observation to be
This quantity is a one‑step approximation to the change in the estimates when the i‑th observation is deleted (ignoring the nuisance parameters).
C. The “residual” is computed as
D. The cumulative hazard is computed at the observation covariate values and, for interval observations, the upper endpoint of the failure interval. The cumulative hazard can also be used as a “residual” estimate. If the model is correct, the cumulative hazards should follow a standard exponential distribution. See Cox and Oakes (1984).
E. The density (for exact failures) or the interval probability (for censored observations) is computed for given x.
Programming Notes
Classification variables are specified by parameters NCLVAR and INDCL. Indicator variables are created for the classification variables using routine GRGLM (see Chapter 2, “Regression”) with IDUMMY = 2.
Examples
Example 1
This example is from Lawless (1982, page 287) and involves the mortality of patients suffering from lung cancer. (The first ten rows of the input data are printed in the output.) An exponential distribution is fit for model
η = μ + β1x3 + β2x4 + β3x5 + αi + γk
where αi is associated with a classification variable with 4 levels, and γk is associated with a classification variable with 2 levels. Note that because the computations are performed in single precision, there will be some small variation in the estimated coefficients across different machine environments.
As a second example, the MAXIT = 0 option is used for the model in Example 1 with the coefficients restricted such that μ = ‑1.25, β1 = ‑6, and the remaining 6 coefficients are zero. A chi‑squared statistic with 8 degrees of freedom for testing that the coefficients are specified as above (versus the alternative that they are not as specified) may be computed from the output as
where is output in COV, and g is output in GR. The resulting test statistic (6.107), based upon no iterations, is comparable to the likelihood ratio test statistic that may be computed from the log‑likelihood output in Example 2 (‑206.6835) and the log‑likelihood output in Example 1 (‑204.1392).
Neither test statistic is significant at the α = 0.05 level.