survivalGlm¶
Analyzes censored survival data using a generalized linear model.
Synopsis¶
survivalGlm (nClass, nContinuous, model, x)
Required Arguments¶
- int
nClass
(Input) - Number of classification variables.
- int
nContinuous
(Input) - Number of continuous variables.
- int
model
(Input) Argument
model
specifies the model used to analyze the data.model
PDF of the Response Variable 0 Exponential 1 Linear hazard 2 Log-normal 3 Normal 4 Log-logistic 5 Logistic 6 Log least extreme value 7 Least extreme value 8 Log extreme value 9 Extreme value 10 Weibull See the Description section for more information about these models.
- float
x[[]]
(Input) Array of size
nObservations
by (nClass
+nContinuous
) + m containing data for the independent variables, dependent variable, and optional parameters.The columns must be ordered such that the first
nClass
columns contain data for the class variables, the nextnContinuous
columns contain data for the continuous variables, and the next column contains the response variable. The final (and optional) m − 1 columns contain the optional parameters.
Return Value¶
An integer value indicating the number of estimated coefficients in the model.
Optional Arguments¶
xColCensoring
, inticen
, intilt
, intirt
(Input)Parameter
icen
is the column inx
containing the censoring code for each observation.x
[i] [icen
]Censoring type 0 Exact failure at x
[i] [irt
]1 Right Censored. The response is greater than x
[i] [irt
].2 Left Censored. The response is less than or equal to x
[i] [irt
].3 Interval Censored. The response is greater than x
[i] [irt
], but less than or equal tox
[i] [ilt
].Parameter
ilt
is the column number ofx
containing the upper endpoint of the failure interval for interval- and left-censored observations. If there are no right-censored or interval-censored observations,ilt
should be set to −1.Parameter
irt
is the column number ofx
containing the lower endpoint of the failure interval for interval- and right-censored observations. If there are no right-censored or interval-censored observations,irt
should be set to −1.Exact failure times are specified in column
iy
ofx
. By default,iy
is columnnClass
+nContinuous
ofx
. The default can be changed if keywordxColVariables
is specified.Note that it is allowable to set
iy
=irt
, since a row with aniy
value will never have anirt
value, and vice versa. This use is illustrated in Example 2.xColFrequencies
, int (Input)- Column number of
x
containing the frequency of response for each observation. xColFixedParameter
, int (Input)- Column number in
x
containing a fixed parameter for each observation that is added to the linear response prior to computing the model parameter. The “fixed” parameter allows one to test hypothesis about the parameters via the log-likelihoods. xColVariables
, inticlass
, inticontinuous
, intiy
(Input)This keyword allows specification of the variables to be used in the analysis, and overrides the default ordering of variables described for input argument
x
.Argument
iclass
is an index vector of lengthnClass
containing the column numbers ofx
that correspond to classification variables.Argument
icontinuous
is an index vector of lengthnContinuous
containing the column numbers ofx
that correspond to continuous variables.Argument
iy
corresponds to the column ofx
which contains the dependent variable.eps
, float (Input)Argument
eps
is the convergence criterion. Convergence is assumed when the maximum relative change in any coefficient estimate is less thaneps
from one iteration to the next or when the relative change in the log-likelihood, criterion, from one iteration to the next is less thaneps
/100.0.Default:
eps
= 0.001maxIterations
, int (Input)Maximum number of iterations. Use
maxIterations
= 0 to compute the Hessian, stored incov
, and the Newton step, stored inlastStep
, at the initial estimates (The initial estimates must be input. Use keywordinitialEstInput
).Default:
maxIterations
= 30intercept
, (Input)By default, or if
intercept
is set toTrue
, the intercept is automatically included in the model. Ifintercept
is set toFalse
, there is no intercept in the model (unless otherwise provided for by the user).Default:
intercept
infinityCheck
, int (Input)Remove a right- or left-censored observation from the log-likelihood whenever the probability of the observation exceeds 0.995. At convergence, use linear programming to check that all removed observations actually have infinite linear response
\[z_i \hat{\beta}\]obsStatus
[i] is set to 2 if the linear response is infinite (See optional argumentobsStatus
). If not all removed observations have infinite linear response, re-compute the estimates based upon the observations with finite\[z_i \hat{\beta}\]Parameter
infinityCheck
is the maximum number of observations that can be handled in the linear programming. SettinginfinityCheck
=nObservations
is always sufficient.Default: No infinity checking;
infinityCheck
= 0effects
, intnVarEffects
, intindicesEffects
(Input)Use this keyword to specify the effects in the model.
Variable
nEffects
is the number of effects (sources of variation) in the model. VariablenVarEffects
is an array of lengthnEffects
containing the number of variables associated with each effect in the model.Argument
indicesEffects
is an index array of lengthnVarEffects
[0] +nVarEffects
[1] + … +nVarEffects
[nEffects
− 1]. The firstnVarEffects
[0] elements give the column numbers ofx
for each variable in the first effect. The nextnVarEffects
[1] elements give the column numbers for each variable in the second effect. The lastnVarEffects
[nEffects
− 1] elements give the column numbers for each variable in the last effect.initialEstInput
, float (Input)- Indicates the
nCoefInput
elements ofinitialEstInput
contain initial estimates of the parameters (which requires that the user know the number of coefficients in the model prior to the call tosurvivalGlm
). See optional argumentcoefStat
for a description of the “nuisance
” parameter, which is the first element of arrayinitialEstInput
. maxClass
, int (Input)An upper bound on the sum of the number of distinct values taken on by each classification variable. Internal workspace usage can be significantly reduced with an appropriate choice of
maxClass
.Default:
maxClass
=nObservations
×nClass
classInfo
,nClassValues
,classValues
(Output)Argument
nClassValues
is an array of lengthnClass
containing the number of values taken by each classification variable; the i-th classification variable hasnClassValues
[i] distinct values. ArgumentclassValues
is an array of length\[\sum_{\mathrm{i}=0}^{\mathrm{nClass}-1} \mathrm{nClassValues}[i]\]containing the distinct values of the classification variables in ascending order. The first
nClassValues
[0] elements ofclassValues
contain the values for the first classification variables, the nextnClassValues
[1] elements contain the values for the second classification variable, etc.coefStat
(Output)An array of size
nCoefficients
× 4 containing the parameter estimates and associated statistics:Column Statistic 0 Coefficient estimate. 1 Estimated standard deviation of the estimated coefficient. 2 Asymptotic normal score for testing that the coefficient is zero. 3 The p-value associated with the normal score in Column 2. When present in the model, the first coefficient in
coefStat
is the estimate of the “nuisance” parameter, and the remaining coefficients are estimates of the parameters associated with the “linear” model, beginning with the intercept, if present. Nuisance parameters are as follows:
Model | Description |
---|---|
0 | No nuisance parameter |
1 | Coefficient of the quadratic term in time, θ |
2-9 | Scale parameter, σ |
10 | Shape parameter, θ |
criterion
(Output)- Optimized criterion. The criterion to be maximized is a constant plus the log-likelihood.
cov
(Output)- The array of size
nCoefficients
bynCoefficients
containing the estimated asymptotic covariance matrix of the coefficients. FormaxIterations
= 0, this is the Hessian computed at the initial parameter estimates. means
(Output)- The array containing the means of the design variables. The array is of
length
nCoefficients
− m ifnoIntercept
is specified, and of lengthnCoefficients
− m − 1 otherwise. Here, m is equal to 0 ifmodel
= 0, and equal to 1 otherwise. caseAnalysis
(Output)The array of size
nObservations
by 5 containing the case analysis below:Column Statistic 0 Estimated predicted value. 1 Estimated influence or leverage. 2 Estimated residual. 3 Estimated cumulative hazard. 4 Non-censored observations: Estimated density at the observation failure time and covariate values.
Censored observations: The corresponding estimated probability.
If
maxIterations
= 0,caseAnalysis
is an array of lengthnObservations
containing the estimated probability (for censored observations) or the estimated density (for non-censored observations)lastStep
(Output)- The array of length
nCoefficients
containing the last parameter updates (excluding step halvings). ParameterlastStep
is computed as the inverse of the matrix of second partial derivatives times the vector of first partial derivatives of the log-likelihood. WhenmaxIterations
= 0, the derivatives are computed at the initial estimates. obsStatus
(Output)- The array of length
nObservations
indicating which observations are included in the extended likelihood.
obsStatus[i] | Status of Observation |
---|---|
0 | Observation i is in the likelihood |
1 | Observation i cannot be in the likelihood because
it contains at least one missing value in x . |
2 | Observation i is not in the likelihood. Its estimated parameter is infinite. |
iterations
,n
,iterations
(Output)- The array of size,
n
by 5 containing information about each iteration of the analysis, wheren
is equal to the number of iterations.
Column | Statistic |
---|---|
0 | Method of iteration Q-N Step = 0 N-R Step = 1 |
1 | Iteration number |
2 | Step size |
3 | Maximum scaled coefficient update |
4 | Log-likelihood |
survivalInfo
(Output)- A structure containing information about the survival analysis. This
structure is required input for function
survivalEstimates
. nRowsMissing
(Output)- Number of rows of data that contain missing values in one or more of the
following vectors or columns of
x
:iy
,censorCodesCol
,ilt
,xResponseCol
,freqResponseCol
,ifix
,iclass
,icontinuous
, orindicesEffects
.
Comments¶
- Dummy variables are generated for the classification variables as
follows: An ascending list of all distinct values of each classification
variable is obtained and stored in
classValues
. Dummy variables are then generated for each but the last of these distinct values. Each dummy variable is zero unless the classification variable equals the list value corresponding to the dummy variable, in which case the dummy variable is one. See keywordleaveOutLast
for optional argumentdummy
in regressorsForGlm (Chapter 2, Regression). - The “product” of a classification variable with a covariate yields dummy variables equal to the product of the covariate with each of the dummy variables associated with the classification variable.
- The “product” of two classification variables yields dummy variables in the usual manner. Each dummy variable associated with the first classification variable multiplies each dummy variable associated with the second classification variable. The resulting dummy variables are such that the index of the second classification variable varies fastest.
Description¶
Function survivalGlm
computes the maximum likelihood estimates of
parameters and associated statistics in generalized linear models commonly
found in survival (reliability) analysis. Although the terminology used will
be from the survival area, the methods discussed have applications in many
areas of data analysis, including reliability analysis and event history
analysis. These methods can be used anywhere a random variable from one of
the discussed distributions is parameterized via one of the models available
in survivalGlm
. Thus, while it is not advisable to do so, standard
multiple linear regression can be performed by function survivalGlm
.
Estimates for any of 10 standard models can be computed. Exact,
left-censored, right-censored, or interval-censored observations are allowed
(note that left censoring is the same as interval censoring with the left
endpoint equal to the left endpoint of the support of the distribution).
Let \(\eta=x^T \beta\) be the linear parameterization, where x is a
design vector obtained by survivalGlm
via function regressorsForGlm
from a row of x
, and β is a vector of parameters associated with the
linear model. Let T denote the random response variable and \(S(t)\)
denote the probability that \(T>t\). All models considered also allow a
fixed parameter \(w_i\) for observation i (input in column
xColFixedParameter
of x
). Use of this parameter is discussed below.
There also may be nuisance parameters \(\theta>0\), or \(\sigma>0\)
to be estimated (along with β) in the various models. Let Φ denote the
cumulative normal distribution. The survival models available in
survivalGlm
are:
model | Name | S (t) |
---|---|---|
0 | Exponential | \(\exp\left[-t \exp\left( w_i+\eta\right) \right]\) |
1 | Linear hazard | \(\exp\left[-\left( t+\tfrac{\theta t^2}{2} \right) \exp\left( w_i+\eta\right) \right]\) |
2 | Log-normal | \(1-\phi\left( \tfrac{\ln(t)-\eta-w_i}{\sigma} \right)\) |
3 | Normal | \(1-\phi\left( \tfrac{t-\eta-w_i}{\sigma} \right)\) |
4 | Log-logistic | \(\left\{ 1+\exp\left( \tfrac{\ln(t)-\eta-w_i}{\sigma} \right) \right\}^{-1}\) |
5 | Logistic | \(\left\{ 1+\exp\left( \tfrac{t-\eta-w_i}{\sigma} \right) \right\}^{-1}\) |
6 | Log least extreme value | \(\exp\left\{-\exp\left( \tfrac{\ln(t)-\eta-w_i}{\sigma} \right) \right\}\) |
7 | Least extreme value | \(\exp\left\{-\exp\left( \tfrac{t-\eta-w_i}{\sigma} \right) \right\}\) |
8 | Log extreme value | \(1-\exp\left\{-\exp\left( \tfrac{\ln(t)-\eta-w_i}{\sigma} \right) \right\}\) |
9 | Extreme value | \(1-\exp\left\{-\exp\left( \tfrac{t-\eta-w_i}{\sigma} \right) \right\}\) |
10 | Weibull | \(\exp\left\{-\left[ \tfrac{t}{\exp\left( w_i+\eta\right)} \right]^\theta\right\}\) |
Note that the log-least-extreme-value model is a reparameterization of the Weibull model. Moreover, models 0, 1, 2, 4, 6, 8, and 10 require that T > 0, while all of the remaining models allow any value for T, −∞ < T < ∞.
Each row vector in the data matrix can represent a single observation; or, through the use of vector frequencies, each row can represent several observations. Also note that classification variables and their products are easily incorporated into the models via the usual regression-type specifications.
The constant parameter \(W_i\) is input in x
and may be used for a
number of purposes. For example, if the parameter in an exponential model is
known to depend upon the size of the area tested, volume of a radioactive
mass, or population density, etc., then a multiplicative factor of the
exponential parameter \(\lambda=\exp(x\beta)\) may be known apriori. This
factor can be input in \(W_i\) (\(W_i\) is the log of the factor).
An alternate use of \(W_i\) is as follows: It may be that
\(\lambda=\exp(x_1 \beta_1+x_2 \beta_2)\), where \(\beta_2\) is
known. Letting \(W_i= x_2 \beta_2\), estimates for \(\beta_1\) can be
obtained via survivalGlm
with the known fixed values for \(\beta_2\).
Standard methods can then be used to test hypothesis about \(\beta_1\)
via computed log-likelihoods.
Computational Details¶
The computations proceed as follows:
- The input parameters are checked for consistency and validity. Estimates of the means of the “independent” or design variables are computed. Means are computed as
If initial estimates are not provided by the user (see optional argument
initialEstInput
), the initial estimates are calculated as followsModels 2-10
Kaplan-Meier estimates of the survival probability,
\[\hat{S}(t)\]at the upper limit of each failure interval are obtained. (Because upper limits are used, interval- and left-censored data are assumed to be exact failures at the upper endpoint of the failure interval.) The Kaplan-Meier estimate is computed under the assumption that all failure distributions are identical (i.e., all β’s but the intercept, if present, are assumed to be zero).
If there is an intercept in the model, a simple linear regression is performed predicting
\[S^{-1} \left(\hat{S}(t)\right) - w_i = \alpha + \phi t'\]where \(t'\) is computed at the upper endpoint of each failure interval, \(t'=t\) in models 3, 5, 7, and 9, and \(t'= \ln(t)\) in models 2, 4, 6, 8, and 10, and \(w_i\) is the fixed constant, if present.
If there is no intercept in the model, then α is fixed at zero, and the model
\[S^{-1} \left(\hat{S}(t)\right) - \hat{\phi} t' - w_i = x^T \beta\]is fit instead. In this model, the coefficients β are used in place of the location estimate α above. Here
\[\hat{\phi}\]is estimated from the simple linear regression with \(\alpha=0\).
If the intercept is in the model, then in log-location-scale models (models 1-8),
\[\hat{\sigma} = \hat{\phi}\]and the initial estimate of the intercept is assumed to be \(\hat{\alpha}\).
In the Weibull model
\[\hat{\theta} = 1 / \hat{\phi}\]and the intercept is assumed to be \(\hat{\alpha}\). Initial estimates of all parameters β, other than the intercept, are assumed to be zero. If there is no intercept in the model, the scale parameter is estimated as above, and the estimates
\[\hat{\beta}\]from Step 2 are used as initial estimates for the β’s.
Models 0 and 1
For the exponential models (model
= 0 or 1), the “average total time on”
test statistic is used to obtain an estimate for the intercept.
Specifically, let \(T_t\) denote the total number of failures divided by
the total time on test. The initial estimates for the intercept is then
\(ln(T_t\)). Initial estimates for the remaining parameters β are
assumed to be zero, and if model
= 1, the initial estimate for the
linear hazard parameter θ is assumed to be a small positive number. When the
intercept is not in the model, the initial estimate for the parameter θ is
assumed to be a small positive number, and initial estimates of the
parameters β are computed via multiple linear regression as in Part A.
A quasi-Newton algorithm is used in the initial iterations based on a Hessian estimate
\[\hat{H}_{\kappa_j \kappa_l} = \sum_{i} l' i \alpha_j i \alpha_l\]where \(l'_{i\alpha j}\) is the partial derivative of the i-th term in the log-likelihood with respect to the parameter \(\alpha_j\), and \(\alpha_j\) denotes one of the parameters to be estimated.
When the relative change in the log-likelihood from one iteration to the next is 0.1 or less, exact second partial derivatives are used for the Hessian so the Newton-Rapheson iteration is used.
If the initial step size results in an increase in the log-likelihood, the full step is used. If the log-likelihood decreases for the initial step size, the step size is halved, and a check for an increase in the log-likelihood performed. Step-halving is performed (as a simple line search) until an increase in the log-likelihood is detected, or until the step size becomes very small (the initial step size is 1.0).
Convergence is assumed when the maximum relative change in any coefficient update from one iteration to the next is less than
eps
or when the relative change in the log-likelihood from one iteration to the next is less thaneps
/100. Convergence is also assumed aftermaxIterations
iterations or when step halving leads to a very small step size with no increase in the log-likelihood.If requested (see optional argument
infinityCheck
), then the methods of Clarkson and Jennrich (1988) are used to check for the existence of infinite estimates in\[\eta_i = x_i^T \beta\]As an example of a situation in which infinite estimates can occur, suppose that observation j is right-censored with \(t_j>15\) in a normal distribution model in which the mean is
\[\mu_j = x_j^T \beta = \eta_j\]where \(x_j\) is the observation design vector. If the design vector \(x_j\) for parameter \(\beta_m\) is such that \(x_{jm}=1\) and \(x_{im}=0\) for all \(i\neq j\), then the optimal estimate of \(\beta_m\) occurs at
\[\hat{\beta}_m = \infty\]leading to an infinite estimate of both \(\beta_m\) and \(\eta_j\). In
survivalGlm
, such estimates can be “computed”.In all models fit by
survivalGlm
, infinite estimates can only occur when the optimal estimated probability associated with the left- or right-censored observation is 1. If infinity checking is on, left- or right-censored observations that have estimated probability greater than 0.995 at some point during the iterations are excluded from the log-likelihood, and the iterations proceed with a log-likelihood based on the remaining observations. This allows convergence of the algorithm when the maximum relative change in the estimated coefficients is small and also allows for a more precise determination of observations with infinite\[\eta_i = x_i^T \beta\]At convergence, linear programming is used to ensure that the eliminated observations have infinite \(\eta_i\). If some (or all) of the removed observations should not have been removed (because their estimated \(\eta_i\)’s must be finite), then the iterations are restarted with a log-likelihood based upon the finite \(\eta_i\) observations. See Clarkson and Jennrich (1988) for more details.
When infinity checking is turned off (see optional argument
infinityCheck
), no observations are eliminated during the iterations. In this case, the infinite estimates occur, some (or all) of the coefficient estimates\[\hat{\beta}\]will become large, and it is likely that the Hessian will become (numerically) singular prior to convergence.
The case statistics are computed as follows: Let \(I_i\) (\(\theta_i\))denote the log-likelihood of the i‑th observation evaluated at \(\theta_i\), let \(I'_i\) denote the vector of derivatives of \(I_i\) with respect to all parameters, \(I'_{h,i}\) denote the derivative of \(I_i\) with respect to \(\eta=x^T \beta\), H denote the Hessian, and E denote expectation. Then the columns of
caseAnalysis
are:Predicted values are computed as \(E(T/x)\) according to standard formulas. If model is 4 or 8, and if \(s\geq 1\), then the expected values cannot be computed because they are infinite.
Following Cook and Weisberg (1982), the influence (or leverage) of the i-th observation is assumed to be
\[\left(I'_i\right)^T H^{-1} I'_i\]This quantity is a one-step approximation of the change in the estimates when the i-th observation is deleted (ignoring the nuisance parameters).
The “residual” is computed as \(I'_{h,i}\).
The cumulative hazard is computed at the observation covariate values and, for interval observations, the upper endpoint of the failure interval. The cumulative hazard also can be used as a “residual” estimate. If the model is correct, the cumulative hazards should follow a standard exponential distribution. See Cox and Oakes (1984).
Programming Notes¶
Indicator (dummy) variables are created for the classification variables
using function regressorsForGlm (Chapter 2,
Regression)) using keyword leaveOutLast
as the
argument to the dummy
optional argument.
Examples¶
Example 1¶
This example is taken from Lawless (1982, p. 287) and involves the mortality of patients suffering from lung cancer. An exponential distribution is fit for the model
where \(\alpha_i\) is associated with a classification variable with four levels, and \(\gamma_k\) is associated with a classification variable with two levels.
from numpy import *
from pyimsl.stat.survivalGlm import survivalGlm
from pyimsl.stat.writeMatrix import writeMatrix
x = array([
[1.0, 0.0, 7.0, 64.0, 5.0, 411.0, 0.0],
[1.0, 0.0, 6.0, 63.0, 9.0, 126.0, 0.0],
[1.0, 0.0, 7.0, 65.0, 11.0, 118.0, 0.0],
[1.0, 0.0, 4.0, 69.0, 10.0, 92.0, 0.0],
[1.0, 0.0, 4.0, 63.0, 58.0, 8.0, 0.0],
[1.0, 0.0, 7.0, 48.0, 9.0, 25.0, 1.0],
[1.0, 0.0, 7.0, 48.0, 11.0, 11.0, 0.0],
[2.0, 0.0, 8.0, 63.0, 4.0, 54.0, 0.0],
[2.0, 0.0, 6.0, 63.0, 14.0, 153.0, 0.0],
[2.0, 0.0, 3.0, 53.0, 4.0, 16.0, 0.0],
[2.0, 0.0, 8.0, 43.0, 12.0, 56.0, 0.0],
[2.0, 0.0, 4.0, 55.0, 2.0, 21.0, 0.0],
[2.0, 0.0, 6.0, 66.0, 25.0, 287.0, 0.0],
[2.0, 0.0, 4.0, 67.0, 23.0, 10.0, 0.0],
[3.0, 0.0, 2.0, 61.0, 19.0, 8.0, 0.0],
[3.0, 0.0, 5.0, 63.0, 4.0, 12.0, 0.0],
[4.0, 0.0, 5.0, 66.0, 16.0, 177.0, 0.0],
[4.0, 0.0, 4.0, 68.0, 12.0, 12.0, 0.0],
[4.0, 0.0, 8.0, 41.0, 12.0, 200.0, 0.0],
[4.0, 0.0, 7.0, 53.0, 8.0, 250.0, 0.0],
[4.0, 0.0, 6.0, 37.0, 13.0, 100.0, 0.0],
[1.0, 1.0, 9.0, 54.0, 12.0, 999.0, 0.0],
[1.0, 1.0, 5.0, 52.0, 8.0, 231.0, 1.0],
[1.0, 1.0, 7.0, 50.0, 7.0, 991.0, 0.0],
[1.0, 1.0, 2.0, 65.0, 21.0, 1.0, 0.0],
[1.0, 1.0, 8.0, 52.0, 28.0, 201.0, 0.0],
[1.0, 1.0, 6.0, 70.0, 13.0, 44.0, 0.0],
[1.0, 1.0, 5.0, 40.0, 13.0, 15.0, 0.0],
[2.0, 1.0, 7.0, 36.0, 22.0, 103.0, 1.0],
[2.0, 1.0, 4.0, 44.0, 36.0, 2.0, 0.0],
[2.0, 1.0, 3.0, 54.0, 9.0, 20.0, 0.0],
[2.0, 1.0, 3.0, 59.0, 87.0, 51.0, 0.0],
[3.0, 1.0, 4.0, 69.0, 5.0, 18.0, 0.0],
[3.0, 1.0, 6.0, 50.0, 22.0, 90.0, 0.0],
[3.0, 1.0, 8.0, 62.0, 4.0, 84.0, 0.0],
[4.0, 1.0, 7.0, 68.0, 15.0, 164.0, 0.0],
[4.0, 1.0, 3.0, 39.0, 4.0, 19.0, 0.0],
[4.0, 1.0, 6.0, 49.0, 11.0, 43.0, 0.0],
[4.0, 1.0, 8.0, 64.0, 10.0, 340.0, 0.0],
[4.0, 1.0, 7.0, 67.0, 18.0, 231.0, 0.0]])
icen = 6
ilt = -1
irt = 5
lp_max = 40
n_class = 2
n_continuous = 3
model = 0
fmt = "%12.4f"
coef_stat = []
estimates = []
clabels = ["", "coefficient", "s.e.", "z", "p"]
n_coef = survivalGlm(n_class,
n_continuous, model, x,
xColCensoring={'icen': icen, 'ilt': ilt, 'irt': irt},
infinityCheck=lp_max,
coefStat=coef_stat)
writeMatrix("Coefficient Statistics", coef_stat, writeFormat=fmt,
noRowLabels=True, colLabels=clabels)
Output¶
Coefficient Statistics
coefficient s.e. z p
-1.1027 1.3140 -0.8392 0.4016
-0.3626 0.4446 -0.8157 0.4149
0.1271 0.4863 0.2613 0.7939
0.8690 0.5861 1.4825 0.1385
0.2697 0.3882 0.6948 0.4873
-0.5400 0.1081 -4.9946 0.0000
-0.0090 0.0197 -0.4594 0.6460
-0.0034 0.0117 -0.2912 0.7710
Example 2¶
This example is the same as Example 1, but more optional arguments are demonstrated.
from __future__ import print_function
from numpy import *
from pyimsl.stat.survivalGlm import survivalGlm
from pyimsl.stat.writeMatrix import writeMatrix
n_class = 2
n_continuous = 3
x = array([
[1.0, 0.0, 7.0, 64.0, 5.0, 411.0, 0.0],
[1.0, 0.0, 6.0, 63.0, 9.0, 126.0, 0.0],
[1.0, 0.0, 7.0, 65.0, 11.0, 118.0, 0.0],
[1.0, 0.0, 4.0, 69.0, 10.0, 92.0, 0.0],
[1.0, 0.0, 4.0, 63.0, 58.0, 8.0, 0.0],
[1.0, 0.0, 7.0, 48.0, 9.0, 25.0, 1.0],
[1.0, 0.0, 7.0, 48.0, 11.0, 11.0, 0.0],
[2.0, 0.0, 8.0, 63.0, 4.0, 54.0, 0.0],
[2.0, 0.0, 6.0, 63.0, 14.0, 153.0, 0.0],
[2.0, 0.0, 3.0, 53.0, 4.0, 16.0, 0.0],
[2.0, 0.0, 8.0, 43.0, 12.0, 56.0, 0.0],
[2.0, 0.0, 4.0, 55.0, 2.0, 21.0, 0.0],
[2.0, 0.0, 6.0, 66.0, 25.0, 287.0, 0.0],
[2.0, 0.0, 4.0, 67.0, 23.0, 10.0, 0.0],
[3.0, 0.0, 2.0, 61.0, 19.0, 8.0, 0.0],
[3.0, 0.0, 5.0, 63.0, 4.0, 12.0, 0.0],
[4.0, 0.0, 5.0, 66.0, 16.0, 177.0, 0.0],
[4.0, 0.0, 4.0, 68.0, 12.0, 12.0, 0.0],
[4.0, 0.0, 8.0, 41.0, 12.0, 200.0, 0.0],
[4.0, 0.0, 7.0, 53.0, 8.0, 250.0, 0.0],
[4.0, 0.0, 6.0, 37.0, 13.0, 100.0, 0.0],
[1.0, 1.0, 9.0, 54.0, 12.0, 999.0, 0.0],
[1.0, 1.0, 5.0, 52.0, 8.0, 231.0, 1.0],
[1.0, 1.0, 7.0, 50.0, 7.0, 991.0, 0.0],
[1.0, 1.0, 2.0, 65.0, 21.0, 1.0, 0.0],
[1.0, 1.0, 8.0, 52.0, 28.0, 201.0, 0.0],
[1.0, 1.0, 6.0, 70.0, 13.0, 44.0, 0.0],
[1.0, 1.0, 5.0, 40.0, 13.0, 15.0, 0.0],
[2.0, 1.0, 7.0, 36.0, 22.0, 103.0, 1.0],
[2.0, 1.0, 4.0, 44.0, 36.0, 2.0, 0.0],
[2.0, 1.0, 3.0, 54.0, 9.0, 20.0, 0.0],
[2.0, 1.0, 3.0, 59.0, 87.0, 51.0, 0.0],
[3.0, 1.0, 4.0, 69.0, 5.0, 18.0, 0.0],
[3.0, 1.0, 6.0, 50.0, 22.0, 90.0, 0.0],
[3.0, 1.0, 8.0, 62.0, 4.0, 84.0, 0.0],
[4.0, 1.0, 7.0, 68.0, 15.0, 164.0, 0.0],
[4.0, 1.0, 3.0, 39.0, 4.0, 19.0, 0.0],
[4.0, 1.0, 6.0, 49.0, 11.0, 43.0, 0.0],
[4.0, 1.0, 8.0, 64.0, 10.0, 340.0, 0.0],
[4.0, 1.0, 7.0, 67.0, 18.0, 231.0, 0.0]])
model = 0
icen = 6
ilt = -1
irt = 5
lp_max = 40
fmt = "%12.4f"
fmt2 = "%4d%4d%6.4f%8.4f%8.1f"
coef_stat = []
iter = {}
class_info = {}
clabels = ["", "coefficient", "s.e.", "z", "p"]
clabels2 = ["", "Method", "Iteration",
"Step Size", "Coef Update", "Log-Likelihood"]
nrmiss = []
criterion = []
obs = []
casex = []
n_coef = survivalGlm(n_class,
n_continuous, model, x,
xColCensoring={'icen': icen, 'ilt': ilt, 'irt': irt},
infinityCheck=lp_max,
coefStat=coef_stat,
iterations=iter,
caseAnalysis=casex,
classInfo=class_info,
obsStatus=obs,
criterion=criterion,
nRowsMissing=nrmiss)
ncv = class_info['nClassValues']
cv = class_info['classValues']
iterations = iter['iterations']
writeMatrix("Coefficient Statistics", coef_stat, writeFormat=fmt,
noRowLabels=True, colLabels=clabels)
writeMatrix("Iteration Information", iterations, writeFormat=fmt2,
noRowLabels=True, colLabels=clabels2)
print("\nLog-Likelihood = %12.5f" % criterion[0])
writeMatrix("Case Analysis", casex[0:8], writeFormat=fmt)
writeMatrix("Distinct Values for Classification Variable 1",
cv[0:ncv[0]], noColLabels=True)
writeMatrix("Distinct Values for Classification Variable 2",
cv[ncv[0]:], noColLabels=True)
writeMatrix("Observation Status", obs, writeFormat="%2i")
print("\nNumber of Missing Values = %2d\n" % nrmiss[0])
Output¶
Log-Likelihood = -204.13914
Number of Missing Values = 0
Coefficient Statistics
coefficient s.e. z p
-1.1027 1.3140 -0.8392 0.4016
-0.3626 0.4446 -0.8157 0.4149
0.1271 0.4863 0.2613 0.7939
0.8690 0.5861 1.4825 0.1385
0.2697 0.3882 0.6948 0.4873
-0.5400 0.1081 -4.9946 0.0000
-0.0090 0.0197 -0.4594 0.6460
-0.0034 0.0117 -0.2912 0.7710
Iteration Information
Method Iteration Step Size Coef Update Log-Likelihood
0 0 ...... ........ -224.0
0 1 1.0000 0.9839 -213.4
1 2 1.0000 3.6032 -207.3
1 3 1.0000 10.1235 -204.3
1 4 1.0000 0.1430 -204.1
1 5 1.0000 0.0117 -204.1
1 6 1.0000 0.0001 -204.1
Case Analysis
1 2 3 4 5
1 262.6932 0.0450 -0.5646 1.5646 0.0008
2 153.7803 0.0042 0.1806 0.8194 0.0029
3 270.5395 0.0482 0.5638 0.4362 0.0024
4 55.3175 0.0844 -0.6631 1.6631 0.0034
5 61.6853 0.3765 0.8703 0.1297 0.0142
6 230.4467 0.0025 -0.1085 0.1085 0.8972
7 232.0188 0.1960 0.9526 0.0474 0.0041
8 272.8439 0.1677 0.8021 0.1979 0.0030
Distinct Values for Classification Variable 1
1 2 3 4
Distinct Values for Classification Variable 2
0 1
Observation Status
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Example 3¶
In this example, the same data and model as Example 1 are used, but maxIterations
is set to zero
iterations with model coefficients restricted such that \(\mu=-1.25\),
\(\beta_6=-0.6\), and the remaining six coefficients are equal to zero. A
chi-squared statistic, with 8 degrees of freedom for testing the coefficients
is specified as above (versus the alternative that it is not as specified),
can be computed, based on the output, as
where
is output in cov
. The resulting test statistic, \(\chi^2=6.107\),
based upon no iterations is comparable to likelihood ratio test that can be
computed from the log-likelihood output in this example (−206.6835) and the
log-likelihood output in Example 2
(−204.1392).
Neither statistic is significant at the \(\alpha=0.05\) level.
from __future__ import print_function
from numpy import *
from pyimsl.stat.survivalGlm import survivalGlm
from pyimsl.stat.writeMatrix import writeMatrix
n_class = 2
n_continuous = 3
x = array([
[1.0, 0.0, 7.0, 64.0, 5.0, 411.0, 0.0],
[1.0, 0.0, 6.0, 63.0, 9.0, 126.0, 0.0],
[1.0, 0.0, 7.0, 65.0, 11.0, 118.0, 0.0],
[1.0, 0.0, 4.0, 69.0, 10.0, 92.0, 0.0],
[1.0, 0.0, 4.0, 63.0, 58.0, 8.0, 0.0],
[1.0, 0.0, 7.0, 48.0, 9.0, 25.0, 1.0],
[1.0, 0.0, 7.0, 48.0, 11.0, 11.0, 0.0],
[2.0, 0.0, 8.0, 63.0, 4.0, 54.0, 0.0],
[2.0, 0.0, 6.0, 63.0, 14.0, 153.0, 0.0],
[2.0, 0.0, 3.0, 53.0, 4.0, 16.0, 0.0],
[2.0, 0.0, 8.0, 43.0, 12.0, 56.0, 0.0],
[2.0, 0.0, 4.0, 55.0, 2.0, 21.0, 0.0],
[2.0, 0.0, 6.0, 66.0, 25.0, 287.0, 0.0],
[2.0, 0.0, 4.0, 67.0, 23.0, 10.0, 0.0],
[3.0, 0.0, 2.0, 61.0, 19.0, 8.0, 0.0],
[3.0, 0.0, 5.0, 63.0, 4.0, 12.0, 0.0],
[4.0, 0.0, 5.0, 66.0, 16.0, 177.0, 0.0],
[4.0, 0.0, 4.0, 68.0, 12.0, 12.0, 0.0],
[4.0, 0.0, 8.0, 41.0, 12.0, 200.0, 0.0],
[4.0, 0.0, 7.0, 53.0, 8.0, 250.0, 0.0],
[4.0, 0.0, 6.0, 37.0, 13.0, 100.0, 0.0],
[1.0, 1.0, 9.0, 54.0, 12.0, 999.0, 0.0],
[1.0, 1.0, 5.0, 52.0, 8.0, 231.0, 1.0],
[1.0, 1.0, 7.0, 50.0, 7.0, 991.0, 0.0],
[1.0, 1.0, 2.0, 65.0, 21.0, 1.0, 0.0],
[1.0, 1.0, 8.0, 52.0, 28.0, 201.0, 0.0],
[1.0, 1.0, 6.0, 70.0, 13.0, 44.0, 0.0],
[1.0, 1.0, 5.0, 40.0, 13.0, 15.0, 0.0],
[2.0, 1.0, 7.0, 36.0, 22.0, 103.0, 1.0],
[2.0, 1.0, 4.0, 44.0, 36.0, 2.0, 0.0],
[2.0, 1.0, 3.0, 54.0, 9.0, 20.0, 0.0],
[2.0, 1.0, 3.0, 59.0, 87.0, 51.0, 0.0],
[3.0, 1.0, 4.0, 69.0, 5.0, 18.0, 0.0],
[3.0, 1.0, 6.0, 50.0, 22.0, 90.0, 0.0],
[3.0, 1.0, 8.0, 62.0, 4.0, 84.0, 0.0],
[4.0, 1.0, 7.0, 68.0, 15.0, 164.0, 0.0],
[4.0, 1.0, 3.0, 39.0, 4.0, 19.0, 0.0],
[4.0, 1.0, 6.0, 49.0, 11.0, 43.0, 0.0],
[4.0, 1.0, 8.0, 64.0, 10.0, 340.0, 0.0],
[4.0, 1.0, 7.0, 67.0, 18.0, 231.0, 0.0]])
model = 0
icen = 6
ilt = -1
irt = 5
lp_max = 40
fmt = "%12.4f"
fmt2 = "%4d%4d%6.4f%8.4f%8.1f"
coef_stat = []
iter = {}
class_info = {}
clabels = ["", "coefficient", "s.e.", "z", "p"]
clabels2 = ["", "Method", "Iteration",
"Step Size", "Coef Update", "Log-Likelihood"]
nrmiss = []
criterion = []
obs = []
casex = []
n_coef = survivalGlm(n_class,
n_continuous, model, x,
xColCensoring={'icen': icen, 'ilt': ilt, 'irt': irt},
infinityCheck=lp_max,
coefStat=coef_stat,
iterations=iter,
caseAnalysis=casex,
classInfo=class_info,
obsStatus=obs,
criterion=criterion,
nRowsMissing=nrmiss)
ncv = class_info['nClassValues']
cv = class_info['classValues']
iterations = iter['iterations']
writeMatrix("Coefficient Statistics", coef_stat, writeFormat=fmt,
noRowLabels=True, colLabels=clabels)
writeMatrix("Iteration Information", iterations, writeFormat=fmt2,
noRowLabels=True, colLabels=clabels2)
print("\nLog-Likelihood = %12.5f" % criterion[0])
writeMatrix("Case Analysis", casex[0:8], writeFormat=fmt)
writeMatrix("Distinct Values for Classification Variable 1",
cv[0:ncv[0]], noColLabels=True)
writeMatrix("Distinct Values for Classification Variable 2",
cv[ncv[0]:], noColLabels=True)
writeMatrix("Observation Status", obs, writeFormat="%2i")
print("\nNumber of Missing Values = %2d\n" % nrmiss[0])
Output¶
Log-Likelihood = -204.13914
Number of Missing Values = 0
Coefficient Statistics
coefficient s.e. z p
-1.1027 1.3140 -0.8392 0.4016
-0.3626 0.4446 -0.8157 0.4149
0.1271 0.4863 0.2613 0.7939
0.8690 0.5861 1.4825 0.1385
0.2697 0.3882 0.6948 0.4873
-0.5400 0.1081 -4.9946 0.0000
-0.0090 0.0197 -0.4594 0.6460
-0.0034 0.0117 -0.2912 0.7710
Iteration Information
Method Iteration Step Size Coef Update Log-Likelihood
0 0 ...... ........ -224.0
0 1 1.0000 0.9839 -213.4
1 2 1.0000 3.6032 -207.3
1 3 1.0000 10.1235 -204.3
1 4 1.0000 0.1430 -204.1
1 5 1.0000 0.0117 -204.1
1 6 1.0000 0.0001 -204.1
Case Analysis
1 2 3 4 5
1 262.6932 0.0450 -0.5646 1.5646 0.0008
2 153.7803 0.0042 0.1806 0.8194 0.0029
3 270.5395 0.0482 0.5638 0.4362 0.0024
4 55.3175 0.0844 -0.6631 1.6631 0.0034
5 61.6853 0.3765 0.8703 0.1297 0.0142
6 230.4467 0.0025 -0.1085 0.1085 0.8972
7 232.0188 0.1960 0.9526 0.0474 0.0041
8 272.8439 0.1677 0.8021 0.1979 0.0030
Distinct Values for Classification Variable 1
1 2 3 4
Distinct Values for Classification Variable 2
0 1
Observation Status
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Warning Errors¶
IMSLS_CONVERGENCE_ASSUMED_1 |
Too many step halvings. Convergence is assumed. |
IMSLS_CONVERGENCE_ASSUMED_2 |
Too many step iterations. Convergence is assumed. |
IMSLS_NO_PREDICTED_1 |
“estimates[0] ” > 1.0. The
expected value for the log
logistic distribution
(“model ” = 4) does not
exist. Predicted values will not
be calculated. |
IMSLS_NO_PREDICTED_2 |
“estimates[0] ” > 1.0. The
expected value for the log extreme
value distribution(“model ”
= 8) does not exist. Predicted
values will not be calculated. |
IMSLS_NEG_EIGENVALUE |
The Hessian has at least one negative eigenvalue. An upper bound on the absolute value of the minimum eigenvalue is # corresponding to variable index #. |
IMSLS_INVALID_FAILURE_TIME_4 |
“x [#][“ilt ”= #]” =
# and
“x[#][“xResponseCol ”= #]”
= #. The censoring interval has
length 0.0. The censoring code for
this observation is being set to
0.0. |
Fatal Errors¶
IMSLS_MAX_CLASS_TOO_SMALL |
The number of distinct values of the
classification variables exceeds
“maxClass ” = #. |
IMSLS_TOO_FEW_COEF |
initialEstInput is specified, and
“nCoefInput ” = #. The model specified
requires # coefficients. |
IMSLS_TOO_FEW_VALID_OBS |
“nObservations ” = # and
“nRowsMissing ” = #.
“nObservations ”−”nRowsMissing ”
must be greater than or equal to 2 in order to
estimate the coefficients. |
IMSLS_SVGLM_1 |
For the exponential model (“model ” =
0) with “nEffects ” = # and no
intercept, “nCoef ” has been determined
to equal 0. With no coefficients in the model,
processing cannot continue. |
IMSLS_INCREASE_LP_MAX |
Too many observations are to be deleted from the model. Either use a different model or increase the workspace. |
IMSLS_INVALID_DATA_8 |
“nClassValues [#]” = #. The number of
distinct values for each classification
variable must be greater than one. |