Analyzes time-to-event data via the proportional hazards model.
Required Arguments
X — NOBS by NCOL matrix containing the data. (Input) When ITIE = 1, the observations in X must be grouped by stratum and sorted from largest to smallest failure time within each stratum, with the strata separated.
IRT — Column number in X containing the response variable. (Input) For point observations, X(i, IRT) contains the time of the i‑th event. For right‑censored observations, X(i, IRT) contains the right‑censoring time. Note that because PHGLM only uses the order of the events, negative “times” are allowed.
NVEF — Vector of length NEF containing the number of variables associated with each effect in the model. (Input)
INDEF — Index vector of length NVEF(1) + … + NVEF(NEF) containing the column numbers of X associated with each effect. (Input) The first NVEF(1) elements of INDEF contain the column numbers of X for the variables in the first effect. The next NVEF(2) elements in INDEF contain the column numbers for the second effect, etc.
MAXCL — An upper bound on the sum of the number of distinct values taken on by the classification variables. (Input)
NCOEF — Number of estimated coefficients in the model. (Output)
COEF — NCOEF by 4 matrix containing the parameter estimates and associated statistics. (Output, if INIT = 0; Input, if INIT = 1 and MAXIT = 0; Input/Output, if INIT = 1 and MAXIT > 0)
Col.  Statistic
1     Coefficient estimate.
2     Estimated standard deviation of the estimated coefficient.
3     Asymptotic normal score for testing that the coefficient is zero against the two-sided alternative.
4     p-value associated with the normal score in column 3.
When COEF is input, only column 1 needs to be given. (A short sketch following the required-argument descriptions shows how columns 3 and 4 follow from columns 1 and 2.)
ALGL — The maximized log‑likelihood. (Output)
COV — NCOEF by NCOEF matrix containing the estimated asymptotic variance‑covariance matrix of the parameters. (Output) For MAXIT = 0, COV is the inverse of the Hessian of the negative of the log‑likelihood, computed at the estimates input in COEF.
XMEAN — Vector of length NCOEF containing the means of the design variables. (Output)
CASE — NOBS by 5 matrix containing the case statistics for each observation. (Output if MAXIT > 0; used as working storage otherwise)
Col.  Statistic
1     Estimated survival probability at the observation time.
2     Estimated observation influence or leverage.
3     A residual estimate.
4     Estimated cumulative baseline hazard rate.
5     Observation proportionality constant.
GR — Vector of length NCOEF containing the last parameter updates (excluding step halvings). (Output) For MAXIT = 0, GR contains the inverse of the Hessian times the gradient vector computed at the estimates input in COEF.
IGRP — Vector of length NOBS giving the stratum number used for each observation. (Output) If RATIO is not ‑1.0, additional “strata” (other than those specified by column ISTRAT of X) may be generated. IGRP also contains a record of the generated strata. See the “Description” section for more detail.
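The asymptotic normal score and p-value in columns 3 and 4 of COEF are derived from columns 1 and 2. The following fragment is an illustrative sketch only (it is not IMSL source code, and the variable names and values in it are invented); it shows the computation for a single coefficient.

   ! Sketch only: how columns 3 and 4 of COEF follow from columns 1 and 2.
   ! coef_est and coef_sd stand for COEF(i,1) and COEF(i,2); the names and
   ! values are illustrative, not part of the PHGLM interface.
   program coef_stats
      implicit none
      real(8) :: coef_est, coef_sd, zscore, pvalue

      coef_est = -0.58d0                         ! artificial value, for illustration
      coef_sd  =  0.24d0                         ! artificial value, for illustration

      zscore = coef_est / coef_sd                ! column 3: asymptotic normal score
      pvalue = erfc(abs(zscore) / sqrt(2.0d0))   ! column 4: two-sided p-value,
                                                 ! P(|Z| > |zscore|) for standard normal Z
      print '(a, f8.3, a, f8.4)', 'z = ', zscore, '   p = ', pvalue
   end program coef_stats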
Optional Arguments
NOBS — Number of observations. (Input) Default: NOBS = size (X,1).
NCOL — Number of columns in X. (Input) Default: NCOL = size (X,2).
LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input) Default: LDX = size (X,1).
IFRQ — Column number in X containing the frequency of response for each observation. (Input) If IFRQ = 0, a response frequency of 1 for each observation is assumed. Default: IFRQ = 0.
IFIX — Column number in X containing a constant to be added to the linear response. (Input) Default: IFIX = 0. The linear response is taken to be wi + ziβ, where wi is the observation constant, zi is the observation design row vector, and β is the vector of estimated parameters. The “fixed” constant allows one to test hypotheses about parameters via the log-likelihoods. If IFIX = 0, the fixed constant wi is taken to be 0.
ICEN — Column number in X containing the censoring code for each observation. (Input) Default: ICEN = 0.
If ICEN = 0, a censoring code of 0 is assumed for all observations.
X(i, ICEN)  Censoring
0           Point observation at X(i, IRT).
1           Right censored. The response is greater than X(i, IRT).
ISTRAT — Column number in X containing the stratification variable. (Input) If ISTRAT = 0, all observations are considered to be in one stratum. Otherwise, column ISTRAT in X contains a unique number for each stratum. The risk set for an observation is determined by its stratum. Default: ISTRAT = 0.
MAXIT — Maximum number of iterations. (Input) MAXIT = 30 will usually be sufficient. Use MAXIT = 0 to compute the Hessian and gradient, stored in COV and GR, at the initial estimates. When MAXIT = 0, INIT must be 1. Default: MAXIT = 30.
EPS — Convergence criterion. (Input) Convergence is assumed when the relative change in ALGL from one iteration to the next is less than EPS. If EPS is zero, EPS = 0.0001 is assumed. Default: EPS = 0.0001.
RATIO — Ratio at which a stratum is split into two strata. (Input) Default: RATIO = 1000.0. Let
rk = exp(wk + zkβ)
be the observation proportionality constant, where zk is the design row vector for the k-th observation and wk is the optional fixed constant specified by X(k, IFIX). Let rmin be the minimum value of rk in a stratum, where, for failed observations, the minimum is over all times less than or equal to the time of occurrence of the k-th observation. Let rmax be the maximum value of rk for the remaining observations in the group. Then, if rmin > RATIO × rmax, the observations in the group are divided into two groups at k. RATIO = 1000 is usually a good value. Set RATIO = −1.0 if no division into strata is to be made.
NCLVAR — Number of classification variables. (Input) Dummy variables are generated for classification variables using the IDUMMY = 2 option of IMSL routine GRGLM (see Chapter 2, “Regression”). See also Comment 3. Default: NCLVAR = 0.
INDCL — Index vector of length NCLVAR containing the column numbers of X that are the classification variables. (Input, if NCLVAR is positive, not used otherwise) If NCLVAR is 0, INDCL is not referenced and can be dimensioned of length 1 in the calling program.
NEF — Number of effects in the model. (Input) In addition to effects involving classification variables, simple covariates and the product of simple covariates are also considered effects. Default: NEF = size(NVEF,1).
INIT — Initialization option. (Input) If INIT = 1, then the NCOEF elements of column 1 of COEF contain the initial estimates on input to PHGLM. For INIT = 0, all initial estimates are taken to be 0. Default: INIT = 0.
ITIE — Option parameter containing the method to be used for handling ties. (Input) Default: ITIE = 0.
ITIE  Method
0     Breslow's approximate method.
1     Failures are assumed to occur in the same order as the observations input in X. The observations in X must be sorted from largest to smallest failure time within each stratum, and grouped by stratum. All observations are treated as if their failure/censoring times were distinct when computing the log-likelihood.
IPRINT — Printing option. (Input) Default: IPRINT = 0.
IPRINT  Action
0       No printing is performed.
1       Printing is performed, but observational statistics are not printed.
2       All output statistics are printed.
NCLVAL — Vector of length NCLVAR containing the number of values taken by each classification variable. (Output, if NCLVAR is positive, not used otherwise) NCLVAL(i) is the number of distinct values for the i‑th classification variable. If NCLVAR is zero, NCLVAL is not used and can be dimensioned of length 1 in the calling program.
CLVAL — Vector of length NCLVAL(1) + NCLVAL(2) + … + NCLVAL(NCLVAR) containing the distinct values of the classification variables. (Output, if NCLVAR is positive, not used otherwise) The first NCLVAL(1) elements of CLVAL contain the values for the first classification variable, the next NCLVAL(2) elements contain the values for the second classification variable, etc. If NCLVAR is zero, CLVAL is not referenced and can be dimensioned of length 1 in the calling program.
LDCOEF — Leading dimension of COEF exactly as specified in the dimension statement in the calling program. (Input) Default: LDCOEF = size (COEF,1).
LDCOV — Leading dimension of COV exactly as specified in the dimension statement in the calling program. (Input) Default: LDCOV = size (COV,1).
LDCASE — Leading dimension of CASE exactly as specified in the dimension statement in the calling program. (Input) Default: LDCASE = size (CASE,1).
NRMISS — Number of rows of data in X that contain missing values in one or more columns IRT, IFRQ, IFIX, ICEN, ISTRAT, INDCL, or INDEF of X. (Output)
Description
Routine PHGLM computes parameter estimates and other statistics in Proportional Hazards Generalized Linear Models. These models were first proposed by Cox (1972). Two methods for handling ties are allowed in PHGLM. Time-dependent covariates are not allowed. The user is referred to Cox and Oakes (1984), Kalbfleisch and Prentice (1980), Elandt-Johnson and Johnson (1980), Lee (1980), or Lawless (1982), among other texts, for a thorough discussion of the Cox proportional hazards model.
Let λ(t, zi) represent the hazard rate at time t for observation number i with covariables contained as elements of row vector zi. The basic assumption in the proportional hazards model (the proportionality assumption) is that the hazard rate can be written as a product of a time varying function λ0(t), which depends only on time, and a function ƒ(zi), which depends only on the covariable values. The function ƒ(zi) used in PHGLM is given as ƒ(zi) = exp(wi + βzi) where wi is a fixed constant assigned to the observation, and β is a vector of coefficients to be estimated. With this function one obtains a hazard rate λ(t, zi) = λ0(t) exp(wi + βzi). The form of λ0(t) is not important in proportional hazards models.
The constants wi may be known theoretically. For example, the hazard rate may be proportional to a known length or area, and the wi can then be determined from this known length or area. Alternatively, the wi may be used to fix a subset of the coefficients β (say, β1) at specified values. When wi is used in this way, constants wi = β1z1i are used, while the remaining coefficients in β are free to vary in the optimization algorithm. If user‑specified constants are not desired, the user should set IFIX to 0 so that wi = 0 will be used.
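For instance, to hold the coefficient of one covariate fixed at a known value, the products of that value with the covariate can be placed in a column of X named by IFIX, and the covariate itself omitted from INDEF. The fragment below is an illustrative sketch only (not IMSL source code; the column layout, names, and values are invented).

   ! Sketch only: building an IFIX column that fixes one coefficient at a
   ! known value.  The column layout and values are illustrative.
   program build_offset
      implicit none
      integer, parameter :: nobs = 4
      real(8) :: x(nobs, 3)                       ! col 1: time, col 2: covariate, col 3: offset w
      real(8), parameter :: beta1_fixed = 0.5d0   ! value at which the coefficient is held
      integer :: i

      x(:,1) = (/ 12.0d0, 9.0d0, 7.0d0, 3.0d0 /)  ! artificial failure/censoring times
      x(:,2) = (/ 1.0d0, 0.0d0, 1.0d0, 0.0d0 /)   ! artificial covariate z1

      ! w(i) = beta1_fixed * z1(i).  Column 3 would then be named via IFIX = 3,
      ! and column 2 would be left out of INDEF so that its coefficient is not
      ! re-estimated.
      do i = 1, nobs
         x(i,3) = beta1_fixed * x(i,2)
      end do
      print '(4f8.3)', x(:,3)
   end program build_offset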
With this definition of λ(t, zi), the usual partial (or marginal, see Kalbfleisch and Prentice (1980)) likelihood becomes
L = ∏(i = 1 to nd) [ exp(wi + βzi) / ∑(j ∈ R(ti)) exp(wj + βzj) ]
where R(ti) denotes the set of indices of observations that have not yet failed at time ti (the risk set), ti denotes the time of failure for the i-th observation, and nd is the total number of observations that fail. Right-censored observations (i.e., observations that are known to have survived to time ti, but for which no time of failure is known) are incorporated into the likelihood through the risk set R(ti). Such observations never appear in the numerator of the likelihood. When ITIE = 0, all observations that are censored at time ti are not included in R(ti), while all observations that fail at time ti are included in R(ti).
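The following fragment computes this likelihood (on the log scale) for a single stratum under the ITIE = 0 convention. It is an illustrative sketch only, not IMSL source code; the variable names and the small data set are invented.

   ! Sketch only: Breslow (ITIE = 0) partial log-likelihood for one stratum.
   ! Point observations have icen = 0, right-censored observations icen = 1.
   program breslow_loglik
      implicit none
      integer, parameter :: n = 5, p = 2
      real(8) :: t(n), z(n,p), w(n), beta(p), eta(n), denom, algl
      integer :: icen(n), i, j

      ! Small artificial data set, for illustration only.
      t    = (/ 9.0d0, 7.0d0, 6.0d0, 6.0d0, 2.0d0 /)
      icen = (/ 0, 1, 0, 0, 0 /)
      z    = reshape( (/ 1.0d0, 0.0d0, 1.0d0, 0.0d0, 1.0d0,  &
                         2.1d0, 0.5d0, 1.7d0, 0.3d0, 2.2d0 /), (/ n, p /) )
      w    = 0.0d0                     ! the IFIX constants; zero when IFIX = 0
      beta = (/ 0.5d0, -0.2d0 /)

      ! Linear response eta(i) = w(i) + z(i,:) . beta
      do i = 1, n
         eta(i) = w(i) + dot_product(z(i,:), beta)
      end do

      ! Sum over failures of  eta(i) - log( sum of exp(eta(j)) over the risk set ).
      ! Observations censored exactly at t(i) are excluded from R(t(i));
      ! observations failing at t(i) are included (Breslow's convention).
      algl = 0.0d0
      do i = 1, n
         if (icen(i) /= 0) cycle                 ! censored observations: no numerator term
         denom = 0.0d0
         do j = 1, n
            if (t(j) > t(i) .or. (t(j) == t(i) .and. icen(j) == 0)) denom = denom + exp(eta(j))
         end do
         algl = algl + eta(i) - log(denom)
      end do
      print '(a, f12.6)', 'log-likelihood = ', algl
   end program breslow_loglik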
If it can be assumed that the dependence of the hazard rate upon the covariate values remains the same from stratum to stratum, while the time-dependent term, λ0(t), may be different in different strata, then PHGLM allows the incorporation of strata into the likelihood as follows. Let k index the m = NSTRAT strata. Then, the likelihood is given by
L = ∏(k = 1 to m) ∏(i = 1 to ndk) [ exp(wki + βzki) / ∑(j ∈ R(tki)) exp(wkj + βzkj) ]
where ndk is the number of failures in stratum k and R(tki) is the risk set at time tki within stratum k.
In PHGLM, the log of the likelihood is maximized with respect to the coefficients β. A quasi‑Newton algorithm approximating the Hessian via the matrix of sums of squares and cross products of the first partial derivatives is used in the initial iterations (the “Q‑N” method in the output). When the change in the log‑likelihood from one iteration to the next is less than 100*EPS, Newton‑Raphson iteration is used (the “N‑R” method). If, during any iteration, the initial step does not lead to an increase in the log‑likelihood, then step halving is employed to find a step that will increase the log‑likelihood.
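The iteration control just described can be pictured with the following skeleton, shown on a toy one-parameter concave function rather than the partial likelihood. It is an illustrative sketch only (not IMSL source code), but it follows the step-halving rule and the relative-change convergence test described above.

   ! Sketch only: Newton iteration with step halving and a relative-change
   ! convergence test, applied to a toy concave function f.
   program newton_halving
      implicit none
      real(8) :: b, bnew, algl, algl_new, step, eps
      integer :: it, half

      eps  = 1.0d-4                               ! plays the role of EPS
      b    = 0.0d0                                ! initial estimate
      algl = f(b)
      do it = 1, 30                               ! plays the role of MAXIT
         step = -fp(b) / fpp(b)                   ! Newton-Raphson step
         bnew = b + step
         algl_new = f(bnew)
         do half = 1, 20                          ! halve the step until f increases
            if (algl_new > algl) exit
            step = 0.5d0 * step
            bnew = b + step
            algl_new = f(bnew)
         end do
         if (abs(algl_new - algl) < eps * abs(algl)) then   ! relative change small: done
            b = bnew
            exit
         end if
         b    = bnew
         algl = algl_new
      end do
      print '(a, f10.6)', 'maximizer = ', b
   contains
      real(8) function f(x)                       ! toy concave "log-likelihood"
         real(8), intent(in) :: x
         f = x - 0.2d0 * exp(x)
      end function f
      real(8) function fp(x)                      ! its first derivative
         real(8), intent(in) :: x
         fp = 1.0d0 - 0.2d0 * exp(x)
      end function fp
      real(8) function fpp(x)                     ! its second derivative
         real(8), intent(in) :: x
         fpp = -0.2d0 * exp(x)
      end function fpp
   end program newton_halving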
Once the maximum likelihood estimates have been computed, PHGLM computes estimates of a probability associated with each failure. Within stratum k, an estimate of the probability that the i-th observation fails at time tki given the risk set R(tki) is given by
pki = exp(wki + βzki) / ∑(j ∈ R(tki)) exp(wkj + βzkj)
evaluated at the estimated coefficients.
A diagnostic “influence” or “leverage” statistic is computed for each noncensored observation as:
where Hs is the matrix of second partial derivatives of the log‑likelihood, and
is computed as:
Influence statistics are not computed for censored observations.
A “residual” is computed for each of the input observations according to methods given in Cox and Oakes (1984, page 108). Residuals are computed as
êki = rki ∑(j: tkj ≤ tki) dkj / ∑(l ∈ R(tkj)) exp(wkl + βzkl)
where rki = exp(wki + βzki) is the observation proportionality constant, dkj is the number of tied failures in group k at time tkj, and the outer sum is over the distinct failure times tkj in group k that do not exceed tki. Assuming that the proportional hazards assumption holds, the residuals should approximate a random sample (with censoring) from the unit exponential distribution. By subtracting the expected values, centered residuals can be obtained. (The j-th expected order statistic from the unit exponential with censoring is given as
ej = ∑(l = 1 to j) 1/(h − l + 1)
where h is the sample size, and censored observations are not included in the summation.)
An estimate of the cumulative baseline hazard within group k is given as
Ĥk(t) = ∑(j: tkj ≤ t) dkj / ∑(l ∈ R(tkj)) exp(wkl + βzkl)
so that êki = rki Ĥk(tki). The observation proportionality constant is computed as
rki = exp(wki + βzki)
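The case statistics in columns 3 through 5 of CASE can be sketched for a single stratum as follows. The fragment is illustrative only (not IMSL source code); the names, coefficient values, and data are invented, and the quantities computed are the ones displayed above.

   ! Sketch only: proportionality constants, cumulative baseline hazard, and
   ! residuals for one stratum, evaluated at given coefficient estimates.
   program case_stats
      implicit none
      integer, parameter :: n = 5, p = 1
      real(8) :: t(n), z(n,p), w(n), beta(p), r(n), hazard(n), resid(n)
      real(8) :: denom, haz
      integer :: icen(n), i, j, l

      ! Small artificial data set, for illustration only.
      t      = (/ 9.0d0, 7.0d0, 6.0d0, 6.0d0, 2.0d0 /)
      icen   = (/ 0, 1, 0, 0, 0 /)
      z(:,1) = (/ 0.4d0, 1.1d0, 0.6d0, 1.3d0, 0.2d0 /)
      w      = 0.0d0
      beta   = (/ 0.7d0 /)

      ! Observation proportionality constants r(i) = exp(w(i) + z(i,:) . beta).
      do i = 1, n
         r(i) = exp(w(i) + dot_product(z(i,:), beta))
      end do

      ! Cumulative baseline hazard at each observation's time and the residuals.
      ! Looping over individual failures (instead of distinct failure times) adds
      ! 1/denominator once per tied failure, which yields the same sum as d/denominator.
      do i = 1, n
         haz = 0.0d0
         do j = 1, n
            if (icen(j) /= 0) cycle               ! only failure times contribute
            if (t(j) > t(i)) cycle                ! only failure times at or before t(i)
            denom = 0.0d0
            do l = 1, n                           ! risk set at t(j), Breslow convention
               if (t(l) > t(j) .or. (t(l) == t(j) .and. icen(l) == 0)) denom = denom + r(l)
            end do
            haz = haz + 1.0d0 / denom
         end do
         hazard(i) = haz                          ! estimated cumulative baseline hazard
         resid(i)  = r(i) * haz                   ! residual estimate
      end do
      print '(a, 5f9.4)', 'baseline hazard: ', hazard
      print '(a, 5f9.4)', 'residuals:       ', resid
   end program case_stats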
Comments
1. Workspace may be explicitly provided, if desired, by use of P2GLM/DP2GLM. The reference is:
2. Informational errors
Type  Code
3     1     Too many iterations required. Convergence assumed.
3     2     Too many step halvings. Convergence assumed.
3     3     Additional strata were formed as required because of the detection of infinite parameter estimates.
4     4     The number of distinct values of the classification variables exceeds MAXCL.
4     5     The model specified by NEF, NVEF, and INDEF yields no covariates.
4     6     After eliminating observations with missing values, no valid observations remain.
4     7     After eliminating observations with missing values, only one covariate vector remains.
4     8     The number of distinct values for each classification variable must be greater than one.
4     9     LDCOEF or LDCOV must be greater than or equal to NCOEF.
3. Dummy variables are generated for the classification variables as follows: An ascending list of all distinct values of the classification variable is obtained and stored in CLVAL. Dummy variables are then generated for all but the last of these distinct values. Each dummy variable is zero unless the classification variable equals the list value corresponding to the dummy variable, in which case the dummy variable is one. See argument IDUMMY for IDUMMY = 2 in routine GRGLM in Chapter 2, “Regression”. (A sketch following these comments illustrates the encoding.)
4. The “product” of a classification variable with a covariate yields dummy variables equal to the product of the covariate with each of the dummy variables associated with the classification variable.
5. The “product” of two classification variables yields dummy variables in the usual manner. Each dummy variable associated with the first classification variable multiplies each dummy variable associated with the second classification variable. The resulting dummy variables are such that the index of the second classification variable varies fastest.
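The encoding described in Comments 3 through 5 can be illustrated with the following fragment. It is a sketch only (not IMSL source code, and not the GRGLM implementation); the names and class codes are invented.

   ! Sketch only: IDUMMY = 2 style dummy variables for one classification
   ! variable: one indicator column for each distinct value except the last
   ! in ascending order.  Products with a covariate, or with the dummies of
   ! a second classification variable, multiply these columns element by
   ! element, with the second variable's index varying fastest (Comments 4 and 5).
   program dummy_vars
      implicit none
      integer, parameter :: nobs = 6
      real(8) :: class(nobs), clval(nobs), dummy(nobs, nobs)
      integer :: i, j, nval

      class = (/ 3.0d0, 1.0d0, 2.0d0, 3.0d0, 1.0d0, 2.0d0 /)   ! artificial class codes

      ! Build the ascending list of distinct values (the role of CLVAL).
      nval = 0
      do i = 1, nobs
         if (all(clval(1:nval) /= class(i))) then
            nval = nval + 1
            clval(nval) = class(i)
         end if
      end do
      call sort_ascending(clval, nval)

      ! One dummy column per distinct value except the last.
      dummy = 0.0d0
      do j = 1, nval - 1
         do i = 1, nobs
            if (class(i) == clval(j)) dummy(i, j) = 1.0d0
         end do
      end do

      do i = 1, nobs
         print '(10f5.1)', dummy(i, 1:nval-1)
      end do
   contains
      subroutine sort_ascending(a, m)             ! simple insertion sort of a(1:m)
         real(8), intent(inout) :: a(:)
         integer, intent(in)    :: m
         integer :: ii, jj
         real(8) :: key
         do ii = 2, m
            key = a(ii)
            jj = ii - 1
            do while (jj >= 1)
               if (a(jj) <= key) exit
               a(jj+1) = a(jj)
               jj = jj - 1
            end do
            a(jj+1) = key
         end do
      end subroutine sort_ascending
   end program dummy_vars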
Programming Notes
1. The covariate vectors zki are computed from each row of the input matrix X via routine GRGLM in Chapter 2, “Regression”. Thus, class variables are easily incorporated into the zki. The reader is referred to the document for GRGLM in the regression chapter for a more detailed discussion. Note that PHGLM calls GRGLM with the option IDUMMY = 2.
2. The average of each of the explanatory variables is subtracted from the variable prior to computing the product zkiβ. Subtraction of the mean values has no effect on the computed log‑likelihood or the estimates since the constant term occurs in both the numerator and denominator of the likelihood. Subtracting the mean values does help to avoid invalid exponentiation in the algorithm and may also speed convergence.
3. Routine PHGLM allows for two methods of handling ties. In the first method (ITIE = 1), the user is allowed to break ties in any manner desired. When this method is used, it is assumed that the user has sorted the rows in X from largest to smallest with respect to the failure/censoring times X(i, IRT) within each stratum (and across strata), with tied observations (failures or censored) broken in the manner desired. The same effect can be obtained with ITIE = 0 by adding (or subtracting) a small amount from each of the tied observations' failure/censoring times ti = X(i, IRT) so as to break the ties in the desired manner.
The second method for handling ties (ITIE = 0) uses an approximation for the tied likelihood proposed by Breslow (1974). The likelihood in Breslow’s method is as specified above, with the risk set at time ti, including all observations that fail at time ti, while all observations that are censored at time ti are not included. (Tied censored observations are assumed to be censored immediately prior to the time ti).
4. If INIT = 1, then it is assumed that the user has provided initial estimates for the model coefficients β in the first column of the matrix COEF. When initial estimates are provided by the user, care should be taken to ensure that the estimates correspond to the generated covariate vector zki. If INIT = 0, then initial estimates of zero are used for all of the coefficients. This corresponds to no effect from any of the covariate values.
5. If a linear combination of covariates is monotonically increasing or decreasing with increasing failure times, then one or more of the estimated coefficients is infinite and extended maximum likelihood estimates must be computed. Such estimates may be written as
βf + ργ
where ρ = ∞ at the supremum of the likelihood, so that βf is the finite part of the solution and γ gives the direction in which the estimates diverge. In PHGLM, it is assumed that extended maximum likelihood estimates must be computed if, within any group k, for any time t,
min{rki : tki ≤ t} > ρ max{rki : tki > t}
where ρ = RATIO is specified by the user and rki = exp(wki + βzki) is the observation proportionality constant. Thus, for example, if ρ = 10000, then PHGLM does not compute extended maximum likelihood estimates until the estimated proportionality constant rki is 10000 times larger for all observations prior to t than for all observations after t. When this occurs, PHGLM computes estimates for the finite part βf by splitting the failures in stratum k into two strata at t (see Bryson and Johnson 1981). Censored observations in stratum k are placed into a stratum based upon the associated value of rki. The results of the splitting are returned in IGRP.
The estimates βf based upon the stratified likelihood represent the finite part of the extended maximum likelihood solution. Routine PHGLM does not compute γ explicitly, but an estimate for γ may be obtained in some circumstances by setting RATIO = −1 and optimizing the log-likelihood without forming additional strata. The solution obtained will then be of the form βf + ργ for some finite value of ρ > 0. At this solution, the Newton-Raphson algorithm will not have “converged” because the Newton-Raphson step sizes returned in GR will be large, at least for some variables. Convergence will be declared, however, because the relative change in the log-likelihood during the final iterations will be small.
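The test that triggers this splitting can be pictured with the following fragment for a single candidate time. It is an illustrative sketch only (not IMSL source code); the names, times, and proportionality constants are invented, and the full rule additionally distinguishes failed and censored observations as described for the RATIO optional argument.

   ! Sketch only: RATIO-based test for splitting a stratum at a candidate time.
   program split_test
      implicit none
      integer, parameter :: n = 6
      real(8) :: t(n), r(n), ratio, tsplit, rmin, rmax
      integer :: i
      logical :: split

      ! Artificial times and proportionality constants r(i) = exp(w(i) + z(i,:) . beta).
      t      = (/ 2.0d0, 3.0d0, 5.0d0, 8.0d0, 9.0d0, 12.0d0 /)
      r      = (/ 5.0d4, 3.0d4, 2.0d4, 2.0d0, 1.5d0, 1.0d0 /)
      ratio  = 1000.0d0                 ! the RATIO argument
      tsplit = 5.0d0                    ! candidate splitting time

      rmin = huge(1.0d0)
      rmax = 0.0d0                      ! r is always positive
      do i = 1, n
         if (t(i) <= tsplit) then       ! observations prior to (at or before) tsplit
            rmin = min(rmin, r(i))
         else                           ! observations after tsplit
            rmax = max(rmax, r(i))
         end if
      end do
      split = (rmin > ratio * rmax)
      print '(a, l2)', 'split the stratum at tsplit: ', split
   end program split_test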
Examples
Example 1
The following data are taken from Lawless (1982, page 287) and involve the survival of lung cancer patients based upon their initial tumor types and treatment type. In the first example, the likelihood is maximized with no strata present in the data. This corresponds to Example 7.2.3 in Lawless (1982, page 367). The input data are printed in the output. The model is given as
λ(t) = λ0(t) exp(αi + γj + β1x1 + β2x2 + β3x3)
where αi and γj correspond to dummy variables generated from columns 6 and 7 of X, respectively, x1 corresponds to column 3 of X, x2 corresponds to column 4 of X, and x3 corresponds to column 5 of X.
Example 2
This example illustrates the use of PHGLM when there are strata present in the data. The observations from Example 1 are arbitrarily grouped into four strata (the first ten observations form stratum 1, the next ten form stratum 2, and so on). Otherwise, the problem is unchanged. The resulting coefficients are very similar to those obtained when there is no stratification variable. The model is the same as in Example 1.