propHazardsGenLin¶
Analyzes survival and reliability data using Cox’s proportional hazards model.
Synopsis¶
propHazardsGenLin (x, nVarEffects, indicesEffects, maxClass, ncoef)
Required Arguments¶
- float
x[[]]
(Input) - Array of length
nObservations
×nColumns
containing the data. When optional argumentitie
= 1, the observations inx
must be grouped by stratum and sorted from largest to smallest failure time within each stratum, with the strata separated. - int
nVarEffects[]
(Input) - Array of length
nef
containing the number of variables associated with each effect in the model. - int
indicesEffects[]
(Input) - Index array of length
nVarEffects[0]
+ … +nVarEffects[nef-1]
containing the column indices ofx
associated with each effect. The firstnVarEffects[0]
elements ofindicesEffects
contain the column indices ofx
for the variables in the first effect. The nextnVarEffects[1]
elements inindicesEffects
contain the column indices for the second effect, etc. - int
maxClass
(Input) - An upper bound on the total number of different values found among the
classification variables in
x
. For example, if the model consisted of two class variables, one with the values {1, 2, 3, 4} and a second with the values {0, 1}, then the total number of different classification values is \(4+2=6\), andmaxClass
>= 6. - int
ncoef
(Output) - Number of estimated coefficients in the model.
Return Value¶
An array of length ncoef
×4, coef
, containing the parameter
estimates and associated statistics.
Column |
Statistic |
---|---|
1 |
Coefficient estimate \(\hat{\beta}\) |
2 |
Estimated standard deviation of the estimated coefficient. |
3 |
Asymptotic normal score for testing that the coefficient is zero against the two-sided alternative. |
4 |
p-value associated with the normal score in column 3. |
Optional Arguments¶
printLevel
, int (Input)Printing option.
printLevel
Action 0 No printing is performed. 1 Printing is performed, but observational statistics are not printed. 2 All output statistics are printed. Default:
printLevel
= 0.maxIterations
, int (Input)Maximum number of iterations.
maxIterations
= 30 will usually be sufficient. UsemaxIterations
= 0 to compute the Hessian and gradient, stored incov
andgr
, at the initial estimates. WhenmaxIterations
= 0,initialEstInput
must be used.Default:
maxIterations
= 30.convergenceEps
, float (Input)Convergence criterion. Convergence is assumed when the relative change in
algl
from one iteration to the next is less thanconvergenceEps
. IfconvergenceEps
is zero,convergenceEps
= 0.0001 is assumed.Default:
convergenceEps
= 0.0001.ratio
, float (Input)Ratio at which a stratum is split into two strata. Let
\[r_k = \exp\left(z_k \hat{\beta} + w_k\right)\]be the observation proportionality constant, where \(z_k\) is the design row vector for the k-th observation and \(w_k\) is the optional fixed parameter specified by \(\text{x}_{k,xColFixedParameter}\). Let \(r_{min}\) be the minimum value \(r_k\) in a stratum, where, for failed observations, the minimum is over all times less than or equal to the time of occurrence of the k-th observation. Let \(r_{max}\) be the maximum value of \(r_k\) for the remaining observations in the group. Then, if \(r_{min}\) >
ratio
\(r_{max}\), the observations in the group are divided into two groups at k.ratio
= 1000 is usually a good value. Setratio
= -1.0 if no division into strata is to be made.Default:
ratio
= 1000.0.xResponseCol
, int (Input)Column index in
x
containing the response variable. For point observations, \(\text{x}_{i,xResponseCol}\) contains the time of the i-th event. For right-censored observations, \(\text{x}_{i,xResponseCol}\) contains the right-censoring time. Note that becausepropHazardsGenLin
only uses the order of the events, negative “times” are allowed.Default:
xResponseCol
= 0.censorCodesCol
, int (Input)- Column index in
x
containing the censoring code for each observation. Default: A censoring code of 0 is assumed for all observations.
\(\text{x}_{i,censorCodesCol}\) | |
---|---|
0 | Exact censoring time \(\text{x}_{i,xResponseCol}\). |
1 | Right censored. The exact censoring time is greater than \(\text{x}_{i,xResponseCol}\). |
stratificationCol
, int (Input)Column number in
x
containing the stratification variable. ColumnstratificationCol
inx
contains a unique number for each stratum. The risk set for an observation is determined by its stratum.Default: All observations are considered to be in one stratum.
constantCol
, int (Input)Column index in
x
containing a constant, \(w_i\), to be added to the linear response. The linear response is taken to be \(w_i+z_i \hat{\beta}\) where \(w_i\) is the observation constant, \(z_i\) is the observation design row vector, and \(\hat{\beta}\) is the vector of estimated parameters. The “fixed” constant allows one to test hypotheses about parameters via the log-likelihoods.Default: \(w_i\) is assumed to be 0 for all observations.
freqResponseCol
, int (Input)Column index in
x
containing the number of responses for each observation.Default: A response frequency of 1 for each observation is assumed.
tiesOption
, int (Input)- Method for handling ties.
tiesOption |
|
---|---|
0 | Breslow’s approximate method. |
1 | Failures are assumed to occur in the same order as
the observations input in x . The observations in
x must be sorted from largest to smallest
failure time within each stratum, and grouped by
stratum. All observations are treated as if their
failure/censoring times were distinct when computing
the log-likelihood. |
Default: tiesOption
= 0.
maximumLikelihood
(Output)- The maximized log-likelihood.
nMissing
(Output)- Number of rows of data in
X
that contain missing values in one or more columnsxResponseCol
,freqResponseCol
,ifix
,censorCodesCol
,stratificationCol
,indexClassVar
, orindicesEffects
ofx
. statistics
(Output)- An array of length
nObservations
×5 containing the case statistics for each observation.
Column |
Statistic |
---|---|
1 | Estimated survival probability at the observation time. |
2 | Estimated observation influence or leverage. |
3 | A residual estimate. |
4 | Estimated cumulative baseline hazard rate. |
5 | Observation proportionality constant. |
xMean
(Output)- An array of length
ncoef
containing the means of the design variables. varianceCovarianceMatrix
, floatcov
(Output)- An array of length
ncoef
*ncoef
containing the estimated asymptotic variance-covariance matrix of the parameters. FormaxIterations
= 0, the return value is the inverse of the Hessian of the negative of the log-likelihood, computed at the estimates input ininitialEstInput
. initialEstInput
, float (Input)An array of length
ncoef
containing the initial estimates on input topropHazardsGenLin
.Default: all initial estimates are taken to be 0.
update
(Output)- An array of length
ncoef
containing the last parameter updates (excluding step halvings). FormaxIterations
= 0,update
contains the inverse of the Hessian times the gradient vector computed at the estimates input ininitialEstInput
. dummy
(Input)Argument
dummy
is an index array of lengthnClassVar
containing the column numbers ofx
that are the classification variables. (IfnClassVar
is equal to zero,indexClassVar
is not used).nClassVar
is the number of classification variables. Dummy variables are generated for classification variables using thedummy
=leaveOutLast
of thedummy
option of regressorsForGlm function (see Regression).Default:
nClassVar
= 0.stratumNumber
(Output)- An array of length
nObservations
giving the stratum number used for each observation. Ifratio
is not -1.0, additional “strata
” (other than those specified by columnstratificationCol
ofx
) may be generated.stratumNumber
also contains a record of the generated strata. See the Description section for more detail. classVariables
,nClassValues
,classValues
(Output)nClassValues
is an array of lengthnClassVar
containing the number of values taken by each classification variable.nClassValues[
i]
is the number of distinct values for the i-th classification variable.classValues
is an array of lengthnClassValues[0]
+nClassValues[1]
+ … +nClassValues[nClassVar‑1]
containing the distinct values of the classification variables. The firstnClassValues[0]
elements ofclassValues
contain the values for the first classification variable, the nextnClassValues[1]
elements contain the values for the second classification variable, etc.
Description¶
Function propHazardsGenLin
computes parameter estimates and other
statistics in Proportional Hazards Generalized Linear Models. These models
were first proposed by Cox (1972). Two methods for handling ties are allowed
in propHazardsGenLin
. Time-dependent covariates are not allowed. The
user is referred to Cox and Oakes (1984), Kalbfleisch and Prentice (1980),
Elandt-Johnson and Johnson (1980), Lee (1980), or Lawless (1982), among
other texts, for a thorough discussion of the Cox proportional hazards
model.
Let \(\lambda(t,z_i)\) represent the hazard rate at time t for
observation number i with covariables contained as elements of row vector
\(z_i\). The basic assumption in the proportional hazards model (the
proportionality assumption) is that the hazard rate can be written as a
product of a time varying function \(\lambda_0(t)\), which depends only
on time, and a function \(f(z_i)\), which depends only on the covariable
values. The function \(f(z_i)\) used in propHazardsGenLin
is given as
\(f(z_i)= \exp(w_i+\beta z_i)\) where \(w_i\) is a fixed constant
assigned to the observation, and β is a vector of coefficients to be
estimated. With this function one obtains a hazard rate
\(\lambda(t,z_i)=\lambda_0(t) \exp(w_i+\beta z_i)\). The form of
\(\lambda_0(t)\) is not important in proportional hazards models.
The constants \(w_i\) may be known theoretically. For example, the hazard
rate may be proportional to a known length or area, and the \(w_i\) can
then be determined from this known length or area. Alternatively, the
\(w_i\) may be used to fix a subset of the coefficients β (say,
\(\beta_1\)) at specified values. When \(w_i\) is used in this way,
constants \(w_i=\beta_1 z_{1i}\) are used, while the remaining
coefficients in β are free to vary in the optimization algorithm. If
user-specified constants are not desired, the user should set ifix
to 0
so that \(w_i=0\) will be used.
With this definition of \(\lambda(t,z_i)\), the usual partial (or marginal, see Kalbfleisch and Prentice (1980)) likelihood becomes
where \(R(t_i)\) denotes the set of indices of observations that have not
yet failed at time \(t_i\) (the risk set), \(t_i\) denotes the time
of failure for the i-th observation, \(n_d\) is the total number of
observations that fail. Right-censored observations (i.e., observations that
are known to have survived to time \(t_i\), but for which no time of
failure is known) are incorporated into the likelihood through the risk set
\(R(t_i)\). Such observations never appear in the numerator of the
likelihood. When tiesOption
= 0, all observations that are censored at
time \(t_i\) are not included in \(R(t_i)\), while all observations
that fail at time \(t_i\) are included in \(R(t_i)\).
If it can be assumed that the dependence of the hazard rate upon the
covariate values remains the same from stratum to stratum, while the
time-dependent term, \(\lambda_0(t)\), may be different in different
strata, then propHazardsGenLin
allows the incorporation of strata into
the likelihood as follows. Let k index the m = stratificationCol
strata. Then, the likelihood is given by
In propHazardsGenLin
, the log of the likelihood is maximized with
respect to the coefficients β. A quasi-Newton algorithm approximating the
Hessian via the matrix of sums of squares and cross products of the first
partial derivatives is used in the initial iterations (the “Q-N” method in
the output). When the change in the log-likelihood from one iteration to the
next is less than 100*convergenceEps
, Newton-Raphson iteration is
used (the “N-R” method). If, during any iteration, the initial step does not
lead to an increase in the log-likelihood, then step halving is employed to
find a step that will increase the log-likelihood.
Once the maximum likelihood estimates have been computed,
propHazardsGenLin
computes estimates of a probability associated with
each failure. Within stratum k, an estimate of the probability that the
i-th observation fails at time \(t_i\) given the risk set
\(R(t_{ki})\) is given by
A diagnostic “influence” or “leverage” statistic is computed for each noncensored observation as:
where \(H_s\) is the matrix of second partial derivatives of the log-likelihood, and
is computed as:
Influence statistics are not computed for censored observations.
A “residual” is computed for each of the input observations according to methods given in Cox and Oakes (1984, page 108). Residuals are computed as
where \(d_{kj}\) is the number of tied failures in group k at time \(t_{kj}\). Assuming that the proportional hazards assumption holds, the residuals should approximate a random sample (with censoring) from the unit exponential distribution. By subtracting the expected values, centered residuals can be obtained. (The j-th expected order statistic from the unit exponential with censoring is given as
where h is the sample size, and censored observations are not included in the summation.)
An estimate of the cumulative baseline hazard within group k is given as
The observation proportionality constant is computed as
Programming Notes¶
The covariate vectors \(z_{ki}\) are computed from each row of the input matrix
x
via function regressorsForGlm (see Chapter 2, Regression). Thus, class variables are easily incorporated into the \(z_{ki}\). The reader is referred to the document forregressorsForGlm
in the regression chapter for a more detailed discussion.Note that
propHazardsGenLin
callsregressorsForGlm
withdummy
=leaveOutLast
of thedummy
option.The average of each of the explanatory variables is subtracted from the variable prior to computing the product \(z_{ki} \beta\). Subtraction of the mean values has no effect on the computed log-likelihood or the estimates since the constant term occurs in both the numerator and denominator of the likelihood. Subtracting the mean values does help to avoid invalid exponentiation in the algorithm and may also speed convergence.
Function
propHazardsGenLin
allows for two methods of handling ties. In the first method (tiesOption
= 1), the user is allowed to break ties in any manner desired. When this method is used, it is assumed that the user has sorted the rows inX
from largest to smallest with respect to the failure/censoring times \(\text{x}_{i,xResponseCol}\) within each stratum (and across strata), with tied observations (failures or censored) broken in the manner desired. The same effect can be obtained withtiesOption
= 0 by adding (or subtracting) a small amount from each of the tied observations failure/ censoring times \(t_i= \text{x}_{i,xResponseCol}\) so as to break the ties in the desired manner.The second method for handling ties (
tiesOption
= 0) uses an approximation for the tied likelihood proposed by Breslow (1974). The likelihood in Breslow’s method is as specified above, with the risk set at time \(t_i\) including all observations that fail at time \(t_i\), while all observations that are censored at time \(t_i\) are not included.(Tied censored observations are assumed to be censored immediately prior to the time \(t_i\)).
If
initialEstInput
option is used, then it is assumed that the user has provided initial estimates for the model coefficients β ininitialEstInput
. When initial estimates are provided by the user, care should be taken to ensure that the estimates correspond to the generated covariate vector \(z_{ki}\). IfinitialEstInput
option is not used, then initial estimates of zero are used for all of the coefficients. This corresponds to no effect from any of the covariate values.If a linear combination of covariates is monotonically increasing or decreasing with increasing failure times, then one or more of the estimated coefficients is infinite and extended maximum likelihood estimates must be computed. Such estimates may be written as \(\hat{\beta}=\hat{\beta}_f+\rho\hat{\gamma}\) where \(\rho= \infty\) at the supremum of the likelihood so that \(\hat{\beta}_f\) is the finite part of the solution. In
propHazardsGenLin
, it is assumed that extended maximum likelihood estimates must be computed if, within any group k, for any time t,\[\min_{t_{ki} < t} \exp \left(w_{ki} + z_{ki} \hat{\beta}\right) > \rho \max_{t_{ki} < t} \exp \left(w_{ki} + z_{ki} \hat{\beta}\right)\]where ρ =
ratio
is specified by the user. Thus, for example, if \(\rho=10000\), thenpropHazardsGenLin
does not compute extended maximum likelihood estimates until the estimated proportionality constant\[\exp \left(w_{ki} + z_{ki} \hat{\beta}\right)\]is 10000 times larger for all observations prior to t than for all observations after t. When this occurs,
propHazardsGenLin
computes estimates for \(\hat{\beta}_f\) by splitting the failures in stratum k into two strata at t (see Bryson and Johnson 1981). Censored observations in stratum k are placed into a stratum based upon the associated value for\[\exp \left(w_{ki} + z_{ki} \hat{\beta}\right)\]The results of the splitting are returned in
stratumNumberCol
.The estimates \(\hat{\beta}_f\) based upon the stratified likelihood represent the finite part of the extended maximum likelihood solution. Function
propHazardsGenLin
does not compute \(\hat{y}\) explicitly, but an estimate for \(\hat{y}\) may be obtained in some circumstances by settingratio
= -1 and optimizing the log‑likelihood without forming additional strata. The solution \(\hat{\beta}\) obtained will be such that \(\hat{\beta}=\hat{\beta}_f+\rho\hat{\gamma}\) for some finite value of \(\rho>0\). At this solution, the Newton-Raphson algorithm will not have “converged” because the Newton-Raphson step sizes returned ingr
will be large, at least for some variables. Convergence will be declared, however, because the relative change in the log‑likelihood during the final iterations will be small.
Example¶
The following data are taken from Lawless (1982, page 287) and involve the survival of lung cancer patients based upon their initial tumor types and treatment type. In the first example, the likelihood is maximized with no strata present in the data. This corresponds to Example 7.2.3 in Lawless (1982, page 367). The input data is printed in the output. The model is given as:
where \(\alpha_i\) and \(\gamma_j\) correspond to dummy variables
generated from column indices 5 and 6 of x
, respectively, \(x_1\)
corresponds to column index 2, \(x_2\) corresponds to column index 3, and
\(x_3\) corresponds to column index 4 of x
.
from numpy import *
from pyimsl.stat.propHazardsGenLin import propHazardsGenLin
icen = 1
iprint = 2
maxcl = 6
indef = [2, 3, 4, 5, 6]
nvef = [1, 1, 1, 1, 1]
indcl = [5, 6]
ratio = 10000.0
x = array([
[411, 0, 7, 64, 5, 1, 0],
[126, 0, 6, 63, 9, 1, 0],
[118, 0, 7, 65, 11, 1, 0],
[92, 0, 4, 69, 10, 1, 0],
[8, 0, 4, 63, 58, 1, 0],
[25, 1, 7, 48, 9, 1, 0],
[11, 0, 7, 48, 11, 1, 0],
[54, 0, 8, 63, 4, 2, 0],
[153, 0, 6, 63, 14, 2, 0],
[16, 0, 3, 53, 4, 2, 0],
[56, 0, 8, 43, 12, 2, 0],
[21, 0, 4, 55, 2, 2, 0],
[287, 0, 6, 66, 25, 2, 0],
[10, 0, 4, 67, 23, 2, 0],
[8, 0, 2, 61, 19, 3, 0],
[12, 0, 5, 63, 4, 3, 0],
[177, 0, 5, 66, 16, 4, 0],
[12, 0, 4, 68, 12, 4, 0],
[200, 0, 8, 41, 12, 4, 0],
[250, 0, 7, 53, 8, 4, 0],
[100, 0, 6, 37, 13, 4, 0],
[999, 0, 9, 54, 12, 1, 1],
[231, 1, 5, 52, 8, 1, 1],
[991, 0, 7, 50, 7, 1, 1],
[1, 0, 2, 65, 21, 1, 1],
[201, 0, 8, 52, 28, 1, 1],
[44, 0, 6, 70, 13, 1, 1],
[15, 0, 5, 40, 13, 1, 1],
[103, 1, 7, 36, 22, 2, 1],
[2, 0, 4, 44, 36, 2, 1],
[20, 0, 3, 54, 9, 2, 1],
[51, 0, 3, 59, 87, 2, 1],
[18, 0, 4, 69, 5, 3, 1],
[90, 0, 6, 50, 22, 3, 1],
[84, 0, 8, 62, 4, 3, 1],
[164, 0, 7, 68, 15, 4, 1],
[19, 0, 3, 39, 4, 4, 1],
[43, 0, 6, 49, 11, 4, 1],
[340, 0, 8, 64, 10, 4, 1],
[231, 0, 7, 67, 18, 4, 1]])
ncoef = []
coef = propHazardsGenLin(x, nvef, indef, maxcl, ncoef,
printLevel=2,
censorCodesCol=icen,
ratio=ratio, dummy=indcl)
Output¶
Initial Estimates
1 2 3 4 5 6 7
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
Method Iteration Step size Maximum scaled Log
coef. update likelihood
Q-N 0 -102.4
Q-N 1 1.0000 0.5034 -91.04
Q-N 2 1.0000 0.5782 -88.07
N-R 3 1.0000 0.1131 -87.92
N-R 4 1.0000 0.06958 -87.89
N-R 5 1.0000 0.0008149 -87.89
Log-likelihood -87.88778
Coefficient Statistics
Coefficient Standard Asymptotic Asymptotic
error z-statistic p-value
1 -0.585 0.137 -4.272 0.000
2 -0.013 0.021 -0.634 0.526
3 0.001 0.012 0.064 0.949
4 -0.367 0.485 -0.757 0.449
5 -0.008 0.507 -0.015 0.988
6 1.113 0.633 1.758 0.079
7 0.380 0.406 0.936 0.349
Asymptotic Coefficient Covariance
1 2 3 4 5
1 0.01873 0.000253 0.0003345 0.005745 0.00975
2 0.0004235 -4.12e-05 -0.001663 -0.0007954
3 0.0001397 0.0008111 -0.001831
4 0.235 0.09799
5 0.2568
6 7
1 0.004264 0.002082
2 -0.003079 -0.002898
3 0.0005995 0.001684
4 0.1184 0.03735
5 0.1253 -0.01944
6 0.4008 0.06289
7 0.1647
Case Analysis
Survival Influence Residual Cumulative Prop.
Probability hazard constant
1 0.00 0.04 2.05 6.10 0.34
2 0.30 0.11 0.74 1.21 0.61
3 0.34 0.12 0.36 1.07 0.33
4 0.43 0.16 1.53 0.84 1.83
5 0.96 0.56 0.09 0.05 2.05
6 0.74 ............ 0.13 0.31 0.42
7 0.92 0.37 0.03 0.08 0.42
8 0.59 0.26 0.14 0.53 0.27
9 0.26 0.12 1.20 1.36 0.88
10 0.85 0.15 0.97 0.17 5.76
11 0.55 0.31 0.21 0.60 0.36
12 0.74 0.21 0.96 0.31 3.12
13 0.03 0.06 3.02 3.53 0.86
14 0.94 0.09 0.17 0.06 2.71
15 0.96 0.16 1.31 0.05 28.89
16 0.89 0.23 0.59 0.12 4.82
17 0.18 0.09 2.62 1.71 1.54
18 0.89 0.19 0.33 0.12 2.68
19 0.14 0.23 0.72 1.96 0.37
20 0.05 0.09 1.66 2.95 0.56
21 0.39 0.22 1.17 0.94 1.25
22 0.00 0.00 1.73 21.10 0.08
23 0.08 ............ 2.19 2.52 0.87
24 0.00 0.00 2.46 8.89 0.28
25 0.99 0.31 0.05 0.01 4.28
26 0.11 0.17 0.34 2.23 0.15
27 0.66 0.25 0.16 0.41 0.38
28 0.87 0.22 0.15 0.14 1.02
29 0.39 ............ 0.45 0.94 0.48
30 0.98 0.25 0.06 0.02 2.53
31 0.77 0.26 1.03 0.26 3.90
32 0.63 0.35 1.80 0.46 3.88
33 0.82 0.26 1.06 0.19 5.47
34 0.47 0.26 1.65 0.75 2.21
35 0.51 0.32 0.39 0.67 0.58
36 0.22 0.18 0.49 1.53 0.32
37 0.80 0.26 1.08 0.23 4.77
38 0.70 0.16 0.26 0.36 0.73
39 0.01 0.23 0.87 4.66 0.19
40 0.08 0.20 0.81 2.52 0.32
Last Coefficient Update
1 2 3 4 5 6
-5.835e-08 1.401e-09 -8.597e-09 -2.822e-07 -4.566e-08 1.256e-07
7
1.058e-08
Covariate Means
1 2 3 4 5 6
5.65 56.58 15.65 0.35 0.28 0.12
7
0.53
Distinct Values For Each Class Variable
Variable 1: 1 2 3 4
Variable 2: 0 1
Stratum Numbers For Each Observation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Number of Missing Values 0