public class ProportionalHazards extends Object implements Serializable, Cloneable
Class ProportionalHazards
computes parameter estimates and
other statistics in Proportional Hazards Generalized Linear Models. These
models were first proposed by Cox (1972). Two methods for handling ties are
allowed. Time-dependent covariates are not allowed. The user is referred to
Cox and Oakes (1984), Kalbfleisch and Prentice (1980), Elandt-Johnson and
Johnson (1980), Lee (1980), or Lawless (1982), among other texts, for a
thorough discussion of the Cox proportional hazards model.
Let \(\lambda(t,z_i)\) represent the hazard rate at
time t for observation number i with covariables contained as
elements of row vector \(z_i\). The basic assumption in the
proportional hazards model (the proportionality assumption) is that the hazard
rate can be written as a product of a time varying function
\(\lambda_0(t)\), which depends only on time, and a
function \(f(z_i)\), which depends only on the covariable
values. The function \(f(z_i)\) used in
ProportionalHazards
is given as \(f(z_i) = \exp(w_i + \beta z_i)\)
where \(w_i\) is a fixed constant assigned to the observation,
and b is a vector of coefficients to be estimated. With this function
one obtains a hazard rate
\(\lambda(t, z_i) = \lambda_0(t) \exp(w_i + \beta z_i)\).
The form of \(\lambda_0(t)\) is not important in proportional
hazards models.
The constants \(w_i\) may be known theoretically. For
example, the hazard rate may be proportional to a known length or area, and
the \(w_i\) can then be determined from this known length or
area. Alternatively, the \(w_i\) may be used to fix a subset
of the coefficients \(\beta\) (say, \(\beta_1\))
at specified values. When \(w_i\) is used in this way,
constants \(w_i=\beta_1z_{1i}\) are used, while the
remaining coefficients in \(\beta\) are free to vary in the
optimization algorithm. Constants are defined as 0.0 by default. If
user-specified constants are desired, use the setConstantColumn
method to specify which column contains the constant.
With this definition of \(\lambda(t,z_i)\), the usual partial (or marginal, see Kalbfleisch and Prentice (1980)) likelihood becomes
$$L=\prod_{i=1}^{n_d}\frac{\exp(w_i+\beta z_i)} {\sum_{j\in R(t_i)}^{}\exp(w_j+\beta z_j)}$$
where \(R(t_i)\) denotes the set of indices of
observations that have not yet failed at time \(t_i\) (the
risk set), \(t_i\) denotes the time of failure for the i-th
observation, \(n_d\) is the total number of observations
that fail. Right-censored observations (i.e., observations that are known to
have survived to time \(t_i\), but for which no time of
failure is known) are incorporated into the likelihood through the risk set
\(R(t_i)\). Such observations never appear in the numerator
of the likelihood. When setTieOptions
is set to
BRESLOWS_APPROXIMATE
(the default), all observations that are
censored at
time \(t_i\) are not included in \(R(t_i)\),
while all observations that fail at time \(t_i\) are
included in \(R(t_i)\).
If it can be assumed that the dependence of the hazard rate upon the
covariate values remains the same from stratum to stratum, while the
time-dependent term, \(\lambda_0(t)\), may be different in
different strata, then ProportionalHazards
allows the
incorporation of strata into the likelihood as follows. Let k index
the m strata (set with setStratumColumn
). Then, the
likelihood is given by
$$L_S=\prod_{k=1}^{m}\left [ \prod_{i=1}^{n_k}\frac{\exp(w_{ki}+\beta z_{ki})}{\sum_{j\in R(t_{ki})}^{}\exp(w_{kj}+\beta z_{kj})} \right ]$$
In ProportionalHazards
, the log of the likelihood is
maximized with respect to the coefficients \(\beta\). A
quasi-Newton algorithm approximating the Hessian via the matrix of sums of
squares and cross products of the first partial derivatives is used in the
initial iterations. When the change in the log-likelihood from one iteration
to the next is less than 100 times the convergence tolerance,
Newton-Raphson iteration is used. If, during any iteration, the
initial step does not lead to an increase in the log-likelihood, then step
halving is employed to find a step that will increase the log-likelihood.
Once the maximum likelihood estimates have been computed, the algorithm computes estimates of a probability associated with each failure. Within stratum k, an estimate of the probability that the i-th observation fails at time \(t_i\) given the risk set \(R(t_{ki})$$ is given by
$$p_{ki}=\frac{\exp(w_{ki}+\beta z_{ki})}{\sum_{j\in R(t_{ki})}^{}\exp(w_{kj}+\beta z_{kj})}$$
A diagnostic "influence" or "leverage" statistic is computed for each noncensored observation as:
$$l_{ki}=-{g}'_{ki}H^{-1}_s{g}'_{ki}$$
where \(H_s\) is the matrix of second partial derivatives of the log-likelihood, and$${g}'_{ki}\) is computed as:
$${g}'_{ki}=z_{ki}-\frac{z_{ki}\exp(w_{ki}+\beta z_{ki})}{\sum_{j\in R(t_{ki})}^{}\exp(w_{kj}+\beta z_{kj})}$$
Influence statistics are not computed for censored observations.
A "residual" is computed for each of the input observations according to methods given in Cox and Oakes (1984, page 108). Residuals are computed as
$$r_{ki}=\exp(w_{ki}+\hat{\beta} z_{ki})\sum_{j \in R(t_{ki})}^{}\frac{d_{kj}}{\sum_{l \in R(t_{kj})}^{}\exp(w_{kl}+\hat{\beta} z_{kl})}$$
where \(d_{kj}\) is the number of tied failures in group k at time \(t_{kj}\). Assuming that the proportional hazards assumption holds, the residuals should approximate a random sample (with censoring) from the unit exponential distribution. By subtracting the expected values, centered residuals can be obtained. (The j-th expected order statistic from the unit exponential with censoring is given as
$$e_j=\sum_{l \le j}^{}\frac{1}{h-l+1}$$
where h is the sample size, and censored observations are not included in the summation.)
An estimate of the cumulative baseline hazard within group k is given as
$$\hat{H}_{k0}(t_{ik})=\sum_{t_{kj} \le t_{ki}}^{}\frac{d_{kj}}{\sum_{l \in R(t_{kj})}^{}\exp(w_{kl}+\hat{\beta} z_{kl})}$$
The observation proportionality constant is computed as
$$\exp(w_{ki}+\hat{\beta}z_{ki})$$
Note that the user can use the JDK JAVA Logging API to generate intermediate output for the solver. Accumulated levels of detail correspond to JAVA's FINE, FINER, and FINEST logging levels with FINE yielding the smallest amount of information and FINEST yielding the most. The levels of output yield the following:
Level | Output |
FINE | Logging is enabled, but observational statistics are not printed. |
FINER | All output statistics are printed. |
FINEST | Tracks progress through internal methods. |
Modifier and Type | Class and Description |
---|---|
static class |
ProportionalHazards.ClassificationVariableLimitException
The Classification Variable limit set by the user through
setUpperBound has been exceeded. |
Modifier and Type | Field and Description |
---|---|
static int |
BRESLOWS_APPROXIMATE
Breslows approximate method of handling ties.
|
static int |
SORTED_AS_PER_OBSERVATIONS
Failures are assumed to occur in the same order as the
observations input in
x . |
Constructor and Description |
---|
ProportionalHazards(double[][] x,
int[] nVarEffects,
int[] indEffects)
Constructor for
ProportionalHazards . |
Modifier and Type | Method and Description |
---|---|
double[][] |
getCaseStatistics()
Returns the case statistics for each observation.
|
int |
getCensorColumn()
Returns the column index of
x containing the
optional censoring code for each observation. |
int[] |
getClassValueCounts()
Returns the number of values taken by each classification variable.
|
double[] |
getClassValues()
Returns the class values taken by each classification variable.
|
int |
getConstantColumn()
Returns the column index of
x containing the constant to be added
to the linear response. |
double |
getConvergenceTol()
Returns the convergence tolerance used.
|
int |
getFrequencyColumn()
Returns the column index of
x containing the frequency of
response for each observation. |
double[] |
getGradient()
Returns the inverse of the Hessian times the gradient vector,
computed at the initial estimates.
|
double[][] |
getHessian()
Returns the inverse of the Hessian of the negative of the log-likelihood,
computed at the initial estimates.
|
boolean |
getHessianOption()
Returns the
boolean used to indicate whether or not to compute the Hessian
and gradient at the initial estimates. |
double[] |
getInitialEstimates()
Gets the initial parameter estimates.
|
double[] |
getLastUpdates()
Gets the last parameter updates.
|
Logger |
getLogger()
Returns the logger object and enables logging.
|
int |
getMaxClass()
Returns the upper bound used on the sum of the number of distinct values found among
the classification variables in
x . |
double |
getMaximumLikelihood()
Returns the maximized log-likelihood.
|
int |
getMaxIterations()
Return the maximum number of iterations allowed.
|
double[] |
getMeans()
Returns the means of the design variables.
|
int |
getNumberOfCoefficients()
Returns the number of estimated coefficients in the model.
|
int |
getNumberRowsMissing()
Returns the number of rows of data in
x that contain
missing values in one or more specific columns of x . |
double[][] |
getParameterStatistics()
Returns the parameter estimates and associated statistics.
|
int |
getResponseColumn()
Returns the column index of
x containing the
response time for each observation. |
int |
getStratumColumn()
Returns the column index of
x containing the stratum number
for each observation. |
int[] |
getStratumNumbers()
Returns the stratum number used for each observation.
|
double |
getStratumRatio()
Returns the ratio at which a stratum is split into two strata.
|
int |
getTiesOption()
Returns the method used for handling ties.
|
double[][] |
getVarianceCovarianceMatrix()
Returns the estimated asymptotic variance-covariance matrix of the parameters.
|
void |
setCensorColumn(int censorIndex)
Sets the column index of
x containing the
optional censoring code for each observation. |
void |
setClassVarColumns(int[] classVarIndices)
Sets the column indices of
x that are the classification variables. |
void |
setConstantColumn(int fixedIndex)
Sets the column index of
x containing the constant \(w_i\) to be added
to the linear response. |
void |
setConvergenceTol(double convergenceTol)
Set the convergence tolerance.
|
void |
setFrequencyColumn(int frequencyIndex)
Sets the column index of
x containing the frequency of
response for each observation. |
void |
setHessianOption(boolean wantHessian)
Set the option to have the Hessian and gradient be computed at the initial estimates.
|
void |
setInitialEstimates(double[] initialCoef)
Sets the initial parameter estimates.
|
void |
setMaxClass(int maxClass)
Sets an upper bound on the sum of the number of distinct values found among
the classification variables in
x . |
void |
setMaxIterations(int maxIterations)
Set the maximum number of iterations allowed.
|
void |
setResponseColumn(int responseIndex)
Sets the column index of
x containing the
response variable. |
void |
setStratumColumn(int stratumIndex)
Sets the column index of
x containing the stratification
variable. |
void |
setStratumRatio(double stratumRatio)
Set the ratio at which a stratum is split into two strata.
|
void |
setTiesOption(int iTie)
Sets the method for handling ties.
|
public static final int BRESLOWS_APPROXIMATE
setTiesOption
.public static final int SORTED_AS_PER_OBSERVATIONS
x
. The observations in x
must be sorted from largest to smallest failure time within each stratum,
and grouped by stratum. All observations are treated as if their
failure/censoring times were distinct when computing the log-likelihood.
See setTiesOption
.public ProportionalHazards(double[][] x, int[] nVarEffects, int[] indEffects)
ProportionalHazards
.x
- a double
matrix containing the data,
including optional data.nVarEffects
- an int
array containing the number of variables
associated with each effect in the model.indEffects
- an int
array containing the column numbers of
x
associated with each effect. The first
nVarEffects[0]
elements of indEffects
contain the column numbers of x
for the variables
in the first effect. The next nVarEffects[1]
elements of indEffects
contain the column numbers of x
for the variables
in the second effect, etc.public Logger getLogger()
java.util.logging.Logger
object, if present, or
null
.public void setResponseColumn(int responseIndex)
x
containing the
response variable.
For point observations,
x[i][responseIndex]
contains the time of the i-th event. For right-censored
observations, x[i][responseIndex]
contains the right-censoring time. Note that because
ProportionalHazards
only uses the order of the
events, negative "times" are allowed.
By default responseIndex = 0
.
responseIndex
- an int
specifying the column index of x
containing the response variable.public void setCensorColumn(int censorIndex)
x
containing the
optional censoring code for each observation.
If x[i][censorIndex]
equals 0, the failure time
x[i][responseIndex]
is treated as an exact time of failure.
Otherwise, it is treated as right-censored time.
By default, it is assumed that there is no censor code column in
x
and all observations are assumed to be exact failure
times.
censorIndex
- an int
specifying the column index of x
containing the optional censoring code for each observation.public void setFrequencyColumn(int frequencyIndex)
x
containing the frequency of
response for each observation.
By default it is assumed that there is no
frequency response column recorded in x
.
Each observation in the data array is assumed to be for a
single failure; that is, the frequency of response for each observation is 1.
frequencyIndex
- an int
specifying the column index of x
containing the frequency of
response for each observation.public void setConstantColumn(int fixedIndex)
x
containing the constant \(w_i\) to be added
to the linear response.
The linear response is taken to be \(w_i+z_i\hat{\beta}\) where \(w_i\) is the observation constant, \(z_i\) is the observation design row vector, and \(\hat{\beta}\) is the vector of estimated parameters. The "fixed" constant allows one to test hypotheses about parameters via the log-likelihoods. If this method is not called, it is assumed that \(w_i = 0\) for all observations.
fixedIndex
- an int
specifying the column index of x
containing the constant to be added to the linear response.public void setStratumColumn(int stratumIndex)
x
containing the stratification
variable.
Column stratumIndex
of
x
contains a unique value for each stratum in the data.
The risk set for an observation is determined by its stratum.
By default it is assumed that all obvservations
are from one statum.
stratumIndex
- an int
specifying the column index of x
containing the
stratification variable.public void setMaxClass(int maxClass)
x
.
For example,
if the model consisted of of two class variables, one with the
values {1, 2, 3, 4} and a second with the values {0, 1}, then
the total number of different classification values is
4 + 2 = 6, and maxClass
\(\gt= 6\).
The default value is the number of observations
in x
.
maxClass
- an int
representing an upper bound
on the sum of the number of distinct values found among
the classification variables in x
.public void setMaxIterations(int maxIterations)
maxIterations
- an int
specifying the maximum number of
iterations allowed.
The default value is 30.public void setHessianOption(boolean wantHessian)
wantHessian
- a boolean
specifying whether or not the
Hessian and gradient are to be computed at the initial
estimates. If this option is set to true
the
user must set the initial estimates via the setInitialEstimates
method.
By default the Hessian and gradient are not computed at the initial estimates.public void setConvergenceTol(double convergenceTol)
Convergence is assumed when the relative change in
the maximum likelihood from one iteration to the next is
less than convergenceTol
. If convergenceTol
is zero, convergenceTol
= 0.0001 is assumed.
The default value is 0.0001.
convergenceTol
- a double
specifying the convergence
tolerance.public void setStratumRatio(double stratumRatio)
Let $$r_k=\exp(z_k\hat{\beta}+w_k)$$ be
the observation proportionality constant, where
\(z_k\)
is the design row vector for the k-th observation
and \(w_k\) is the optional fixed
parameter specified by \(x_{k,\mathrm{fixedIndex}}\).
Let \(r_{min}\) be the minimum value
\(r_k\) in a
stratum, where, for failed observations, the minimum is
over all times less than or equal to the time of occurrence
of the k-th observation. Let \(r_{max}\)
be the maximum value of \(r_k\) for the
remaining observations in the group. Then, if \(r_{min}\) \(\gt\)
stratumRatio
\( * r_{max}\),
the observations in the group are divided into two groups
at k. The default value of
stratumRatio
= 1000 is usually good.
Set stratumRatio
to any negative value
if no division into strata is to be made.
stratumRatio
- a double
specifying the ratio at which
a stratum is split into two strata.public void setInitialEstimates(double[] initialCoef)
Care should be taken to ensure that the supplied estimates for the model coefficients \(\beta\) correspond to the generated covariate vector \(z_{ki}\).
initialCoef
- a double
array containing the initial
parameter estimates.
By default the initial parameter estimates are all 0.0.public void setClassVarColumns(int[] classVarIndices)
x
that are the classification variables.classVarIndices
- an int
array containing the column numbers of
x
that are the classification variables.
By default it is assumed there are no classification variables.public void setTiesOption(int iTie)
iTie
- an int
specifying the method to be used in
handling ties. It can be either BRESLOWS_APPROXIMATE
or SORTED_AS_PER_OBSERVATIONS
.
iTie |
Method |
BRESLOWS_APPROXIMATE |
Breslow's approximate method. This is the default. |
SORTED_AS_PER_OBSERVATIONS |
Failures are assumed to occur in the same order as the
observations input in x . The observations in x
must be sorted from largest to smallest failure time within each stratum,
and grouped by stratum. All observations are treated as if their
failure/censoring times were distinct when computing the log-likelihood. |
By default, iTie
is BRESLOWS_APPROXIMATE
.
public int[] getClassValueCounts() throws ProportionalHazards.ClassificationVariableLimitException
int
array containing the number of values taken
by each classification variable.ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[] getClassValues() throws ProportionalHazards.ClassificationVariableLimitException
getClassValueCounts()
. Then the first nclval[0]
elements contain the values for the first
classification variable, the next nclval[1] elements
contain the values for the second classification variable, etc.double
array containing the values taken
by each classification variable.ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public int[] getStratumNumbers() throws ProportionalHazards.ClassificationVariableLimitException
stratumRatio
is not -1.0,
additional "strata" (other than those specified by column groupIndex
of x
set via the setStratumColumn
method)
may be generated. The array also contains a
record of the generated strata. See the ProportionalHazards
class description
for more detail.int
array containing the stratum number used
for each observation.ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public int getNumberOfCoefficients() throws ProportionalHazards.ClassificationVariableLimitException
int
scalar representing the number of estimated
coefficients in the model.ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[][] getParameterStatistics() throws ProportionalHazards.ClassificationVariableLimitException
There is one row for each coefficient, and the columns of the returned matrix contain the following:
Column | Statistic |
0 |
The coefficient estimate, \(\hat{\beta}\) |
1 |
Estimated standard deviation of the estimated coefficient |
2 |
Asymptotic normal score for testing that the coefficient is zero against the two-sided alternative |
3 |
p-value associated with the normal score in column 2 |
double
matrix containing the parameter
estimates and associated statistics.ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[][] getCaseStatistics() throws ProportionalHazards.ClassificationVariableLimitException
There is one row for each observation, and the columns of the returned matrix contain the following:
Column | Statistic |
0 |
Estimated survival probability at the observation time. |
1 |
Estimated observation influence or leverage. |
2 |
A residual estimate. |
3 |
Estimated cumulative baseline hazard rate. |
4 |
Observation proportionality constant. |
double
matrix containing the case statistics.ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[] getMeans() throws ProportionalHazards.ClassificationVariableLimitException
double
array containing the means of the design variables.ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[][] getVarianceCovarianceMatrix() throws ProportionalHazards.ClassificationVariableLimitException
double
matrix containing the estimated asymptotic
variance-covariance matrix of the parameters.ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[][] getHessian() throws ProportionalHazards.ClassificationVariableLimitException
Note that the setHessianOption
method must be invoked with
wantHessian
set to true
and the
setInitialEstimates
method
must be invoked prior to invoking this method.
IllegalStateException
exception.double
matrix containing the inverse of the Hessian
of the negative of the log-likelihood, computed at the initial estimates.ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[] getGradient() throws ProportionalHazards.ClassificationVariableLimitException
Note that the setHessianOption
method must be invoked with
wantHessian
set to true
and the
setInitialEstimates
method
must be invoked prior to invoking this method. Otherwise, the method throws
an IllegalStateException
exception.
double
array containing the inverse of the Hessian
times the gradient vector, computed at the initial estimates.ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double getMaximumLikelihood() throws ProportionalHazards.ClassificationVariableLimitException
The log-likelihood is fully described in the ProportionalHazards
class description.
double
representing the maximized log-likelihoodProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[] getInitialEstimates() throws ProportionalHazards.ClassificationVariableLimitException
double
array containing the initial
parameter estimates.ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[] getLastUpdates() throws ProportionalHazards.ClassificationVariableLimitException
double
array containing the last
parameter updates (excluding step halvings).ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public int getNumberRowsMissing() throws ProportionalHazards.ClassificationVariableLimitException
x
that contain
missing values in one or more specific columns of x
.int
scalar representing the number of rows of
data in x
that contain missing values in one or
more specific columns of x
.ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public int getResponseColumn()
x
containing the
response time for each observation.int
specifying the column index of x
containing the response time for each observation.public int getCensorColumn()
x
containing the
optional censoring code for each observation.int
specifying the column index of x
containing the optional censoring code for each observation.public int getFrequencyColumn()
x
containing the frequency of
response for each observation.int
specifying the column index of x
containing the frequency of
response for each observation.public int getConstantColumn()
x
containing the constant to be added
to the linear response.int
specifying the column index of x
containing the constant to be added
to the linear response.public int getStratumColumn()
x
containing the stratum number
for each observation.int
specifying the column index of x
containing the
stratum number for each observation.public double getConvergenceTol()
double
specifying the convergence
tolerance used.public int getMaxClass()
x
.int
representing the upper bound used
on the sum of the number of distinct values found among
the classification variables in x
.public int getMaxIterations()
int
specifying the maximum number of
iterations allowed.public double getStratumRatio()
double
specifying the ratio at which
a stratum is split into two strata.public int getTiesOption()
int
specifying the method to be used in
handling ties as indicated by the value in the following table:
Value |
Method |
BRESLOWS_APPROXIMATE |
Breslow's approximate method. This is the default. |
SORTED_AS_PER_OBSERVATIONS |
Failures are assumed to occur in the same order as the
observations input in x . The observations in x
must be sorted from largest to smallest failure time within each stratum,
and grouped by stratum. All observations are treated as if their
failure/censoring times were distinct when computing the log-likelihood. |
public boolean getHessianOption()
boolean
used to indicate whether or not to compute the Hessian
and gradient at the initial estimates.boolean
specifying whether or not the
Hessian and gradient are to be computed at the initial
estimates. A return value equal to
true
indicates that the
Hessian and gradient are to be computed.Copyright © 2020 Rogue Wave Software. All rights reserved.