|
JMSLTM Numerical Library 6.1 | |||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Object com.imsl.stat.ProportionalHazards
public class ProportionalHazards
Analyzes survival and reliability data using Cox's proportional hazards model.
Class ProportionalHazards
computes parameter estimates and
other statistics in Proportional Hazards Generalized Linear Models. These
models were first proposed by Cox (1972). Two methods for handling ties are
allowed. Time-dependent covariates are not allowed. The user is referred to
Cox and Oakes (1984), Kalbfleisch and Prentice (1980), Elandt-Johnson and
Johnson (1980), Lee (1980), or Lawless (1982), among other texts, for a
thorough discussion of the Cox proportional hazards model.
Let represent the hazard rate at
time t for observation number i with covariables contained as
elements of row vector . The basic assumption in the
proportional hazards model (the proportionality assumption) is that the hazard
rate can be written as a product of a time varying function
, which depends only on time, and a
function , which depends only on the covariable
values. The function used in
ProportionalHazards
is given as
where is a fixed constant assigned to the observation,
and b is a vector of coefficients to be estimated. With this function
one obtains a hazard rate
.
The form of is not important in proportional
hazards models.
The constants may be known theoretically. For
example, the hazard rate may be proportional to a known length or area, and
the can then be determined from this known length or
area. Alternatively, the may be used to fix a subset
of the coefficients (say, )
at specified values. When is used in this way,
constants are used, while the
remaining coefficients in are free to vary in the
optimization algorithm. Constants are defined as 0.0 by default. If
user-specified constants are desired, use the setConstantColumn
method to specify which column contains the constant.
With this definition of , the usual partial (or marginal, see Kalbfleisch and Prentice (1980)) likelihood becomes
where denotes the set of indices of
observations that have not yet failed at time (the
risk set), denotes the time of failure for the i-th
observation, is the total number of observations
that fail. Right-censored observations (i.e., observations that are known to
have survived to time , but for which no time of
failure is known) are incorporated into the likelihood through the risk set
. Such observations never appear in the numerator
of the likelihood. When setTieOptions
is set to
BRESLOWS_APPROXIMATE
(the default), all observations that are
censored at
time are not included in ,
while all observations that fail at time are
included in .
If it can be assumed that the dependence of the hazard rate upon the
covariate values remains the same from stratum to stratum, while the
time-dependent term, , may be different in
different strata, then ProportionalHazards
allows the
incorporation of strata into the likelihood as follows. Let k index
the m strata (set with setStratumColumn
). Then, the
likelihood is given by
In ProportionalHazards
, the log of the likelihood is
maximized with respect to the coefficients . A
quasi-Newton algorithm approximating the Hessian via the matrix of sums of
squares and cross products of the first partial derivatives is used in the
initial iterations. When the change in the log-likelihood from one iteration
to the next is less than 100 times the convergence tolerance,
Newton-Raphson iteration is used. If, during any iteration, the
initial step does not lead to an increase in the log-likelihood, then step
halving is employed to find a step that will increase the log-likelihood.
Once the maximum likelihood estimates have been computed, the algorithm computes estimates of a probability associated with each failure. Within stratum k, an estimate of the probability that the i-th observation fails at time given the risk set is given by
A diagnostic "influence" or "leverage" statistic is computed for each noncensored observation as:
where is the matrix of second partial derivatives of the log-likelihood, and is computed as:Influence statistics are not computed for censored observations.
A "residual" is computed for each of the input observations according to methods given in Cox and Oakes (1984, page 108). Residuals are computed as
where is the number of tied failures in group k at time . Assuming that the proportional hazards assumption holds, the residuals should approximate a random sample (with censoring) from the unit exponential distribution. By subtracting the expected values, centered residuals can be obtained. (The j-th expected order statistic from the unit exponential with censoring is given as
where h is the sample size, and censored observations are not included in the summation.)
An estimate of the cumulative baseline hazard within group k is given as
The observation proportionality constant is computed as
Note that the user can use the JDK JAVA Logging API to generate intermediate output for the solver. Accumulated levels of detail correspond to JAVA's FINE, FINER, and FINEST logging levels with FINE yielding the smallest amount of information and FINEST yielding the most. The levels of output yield the following:
Level | Output |
FINE | Logging is enabled, but observational statistics are not printed. |
FINER | All output statistics are printed. |
FINEST | Tracks progress through internal methods. |
Nested Class Summary | |
---|---|
static class |
ProportionalHazards.ClassificationVariableLimitException
The Classification Variable limit set by the user through
setUpperBound has been exceeded. |
Field Summary | |
---|---|
static int |
BRESLOWS_APPROXIMATE
Breslows approximate method of handling ties. |
static int |
SORTED_AS_PER_OBSERVATIONS
Failures are assumed to occur in the same order as the observations input in x . |
Constructor Summary | |
---|---|
ProportionalHazards(double[][] x,
int[] nVarEffects,
int[] indEffects)
Constructor for ProportionalHazards . |
Method Summary | |
---|---|
double[][] |
getCaseStatistics()
Returns the case statistics for each observation. |
int |
getCensorColumn()
Returns the column index of x containing the
optional censoring code for each observation. |
int[] |
getClassValueCounts()
Returns the number of values taken by each classification variable. |
double[] |
getClassValues()
Returns the class values taken by each classification variable. |
int |
getConstantColumn()
Returns the column index of x containing the constant to be added
to the linear response. |
double |
getConvergenceTol()
Returns the convergence tolerance used. |
int |
getFrequencyColumn()
Returns the column index of x containing the frequency of
response for each observation. |
double[] |
getGradient()
Returns the inverse of the Hessian times the gradient vector, computed at the initial estimates. |
double[][] |
getHessian()
Returns the inverse of the Hessian of the negative of the log-likelihood, computed at the initial estimates. |
boolean |
getHessianOption()
Returns the boolean used to indicate whether or not to compute the Hessian
and gradient at the initial estimates. |
double[] |
getInitialEstimates()
Gets the initial parameter estimates. |
double[] |
getLastUpdates()
Gets the last parameter updates. |
Logger |
getLogger()
Returns the logger object and enables logging. |
int |
getMaxClass()
Returns the upper bound used on the sum of the number of distinct values found among the classification variables in x . |
double |
getMaximumLikelihood()
Returns the maximized log-likelihood. |
int |
getMaxIterations()
Return the maximum number of iterations allowed. |
double[] |
getMeans()
Returns the means of the design variables. |
int |
getNumberOfCoefficients()
Returns the number of estimated coefficients in the model. |
int |
getNumberRowsMissing()
Returns the number of rows of data in x that contain
missing values in one or more specific columns of x . |
double[][] |
getParameterStatistics()
Returns the parameter estimates and associated statistics. |
int |
getResponseColumn()
Returns the column index of x containing the
response time for each observation. |
int |
getStratumColumn()
Returns the column index of x containing the stratum number
for each observation. |
int[] |
getStratumNumbers()
Returns the stratum number used for each observation. |
double |
getStratumRatio()
Returns the ratio at which a stratum is split into two strata. |
int |
getTiesOption()
Returns the method used for handling ties. |
double[][] |
getVarianceCovarianceMatrix()
Returns the estimated asymptotic variance-covariance matrix of the parameters. |
void |
setCensorColumn(int censorIndex)
Sets the column index of x containing the
optional censoring code for each observation. |
void |
setClassVarColumns(int[] classVarIndices)
Sets the column indices of x that are the classification variables. |
void |
setConstantColumn(int fixedIndex)
Sets the column index of x containing the constant to be added
to the linear response. |
void |
setConvergenceTol(double convergenceTol)
Set the convergence tolerance. |
void |
setFrequencyColumn(int frequencyIndex)
Sets the column index of x containing the frequency of
response for each observation. |
void |
setHessianOption(boolean wantHessian)
Set the option to have the Hessian and gradient be computed at the initial estimates. |
void |
setInitialEstimates(double[] initialCoef)
Sets the initial parameter estimates. |
void |
setMaxClass(int maxClass)
Sets an upper bound on the sum of the number of distinct values found among the classification variables in x . |
void |
setMaxIterations(int maxIterations)
Set the maximum number of iterations allowed. |
void |
setResponseColumn(int responseIndex)
Sets the column index of x containing the
response variable. |
void |
setStratumColumn(int stratumIndex)
Sets the column index of x containing the stratification
variable. |
void |
setStratumRatio(double stratumRatio)
Set the ratio at which a stratum is split into two strata. |
void |
setTiesOption(int iTie)
Sets the method for handling ties. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int BRESLOWS_APPROXIMATE
setTiesOption
.
public static final int SORTED_AS_PER_OBSERVATIONS
x
. The observations in x
must be sorted from largest to smallest failure time within each stratum,
and grouped by stratum. All observations are treated as if their
failure/censoring times were distinct when computing the log-likelihood.
See setTiesOption
.
Constructor Detail |
---|
public ProportionalHazards(double[][] x, int[] nVarEffects, int[] indEffects)
ProportionalHazards
.
x
- a double
matrix containing the data,
including optional data.nVarEffects
- an int
array containing the number of variables
associated with each effect in the model.indEffects
- an int
array containing the column numbers of
x
associated with each effect. The first
nVarEffects[0]
elements of indEffects
contain the column numbers of x
for the variables
in the first effect. The next nVarEffects[1]
elements of indEffects
contain the column numbers of x
for the variables
in the second effect, etc.Method Detail |
---|
public double[][] getCaseStatistics() throws ProportionalHazards.ClassificationVariableLimitException
There is one row for each observation, and the columns of the returned matrix contain the following:
Column | Statistic |
0 |
Estimated survival probability at the observation time. |
1 |
Estimated observation influence or leverage. |
2 |
A residual estimate. |
3 |
Estimated cumulative baseline hazard rate. |
4 |
Observation proportionality constant. |
double
matrix containing the case statistics.
ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public int getCensorColumn()
x
containing the
optional censoring code for each observation.
int
specifying the column index of x
containing the optional censoring code for each observation.public int[] getClassValueCounts() throws ProportionalHazards.ClassificationVariableLimitException
int
array containing the number of values taken
by each classification variable.
ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[] getClassValues() throws ProportionalHazards.ClassificationVariableLimitException
getClassValueCounts()
. Then the first nclval[0]
elements contain the values for the first
classification variable, the next nclval[1] elements
contain the values for the second classification variable, etc.
double
array containing the values taken
by each classification variable.
ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public int getConstantColumn()
x
containing the constant to be added
to the linear response.
int
specifying the column index of x
containing the constant to be added
to the linear response.public double getConvergenceTol()
double
specifying the convergence
tolerance used.public int getFrequencyColumn()
x
containing the frequency of
response for each observation.
int
specifying the column index of x
containing the frequency of
response for each observation.public double[] getGradient() throws ProportionalHazards.ClassificationVariableLimitException
Note that the setHessianOption
method must be invoked with
wantHessian
set to true
and the
setInitialEstimates
method
must be invoked prior to invoking this method. Otherwise, the method throws
an IllegalStateException
exception.
double
array containing the inverse of the Hessian
times the gradient vector, computed at the initial estimates.
ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[][] getHessian() throws ProportionalHazards.ClassificationVariableLimitException
Note that the setHessianOption
method must be invoked with
wantHessian
set to true
and the
setInitialEstimates
method
must be invoked prior to invoking this method.
IllegalStateException
exception.
double
matrix containing the inverse of the Hessian
of the negative of the log-likelihood, computed at the initial estimates.
ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public boolean getHessianOption()
boolean
used to indicate whether or not to compute the Hessian
and gradient at the initial estimates.
boolean
specifying whether or not the
Hessian and gradient are to be computed at the initial
estimates. A return value equal to
true
indicates that the
Hessian and gradient are to be computed.public double[] getInitialEstimates() throws ProportionalHazards.ClassificationVariableLimitException
double
array containing the initial
parameter estimates.
ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[] getLastUpdates() throws ProportionalHazards.ClassificationVariableLimitException
double
array containing the last
parameter updates (excluding step halvings).
ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public Logger getLogger()
java.util.logging.Logger
object, if present, or
null
.public int getMaxClass()
x
.
int
representing the upper bound used
on the sum of the number of distinct values found among
the classification variables in x
.public double getMaximumLikelihood() throws ProportionalHazards.ClassificationVariableLimitException
The log-likelihood is fully described in the ProportionalHazards
class description.
double
representing the maximized log-likelihood
ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public int getMaxIterations()
int
specifying the maximum number of
iterations allowed.public double[] getMeans() throws ProportionalHazards.ClassificationVariableLimitException
double
array containing the means of the design variables.
ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public int getNumberOfCoefficients() throws ProportionalHazards.ClassificationVariableLimitException
int
scalar representing the number of estimated
coefficients in the model.
ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public int getNumberRowsMissing() throws ProportionalHazards.ClassificationVariableLimitException
x
that contain
missing values in one or more specific columns of x
.
int
scalar representing the number of rows of
data in x
that contain missing values in one or
more specific columns of x
.
ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[][] getParameterStatistics() throws ProportionalHazards.ClassificationVariableLimitException
There is one row for each coefficient, and the columns of the returned matrix contain the following:
Column | Statistic |
0 |
The coefficient estimate, |
1 |
Estimated standard deviation of the estimated coefficient |
2 |
Asymptotic normal score for testing that the coefficient is zero against the two-sided alternative |
3 |
p-value associated with the normal score in column 2 |
double
matrix containing the parameter
estimates and associated statistics.
ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public int getResponseColumn()
x
containing the
response time for each observation.
int
specifying the column index of x
containing the response time for each observation.public int getStratumColumn()
x
containing the stratum number
for each observation.
int
specifying the column index of x
containing the
stratum number for each observation.public int[] getStratumNumbers() throws ProportionalHazards.ClassificationVariableLimitException
stratumRatio
is not -1.0,
additional "strata" (other than those specified by column groupIndex
of x
set via the setStratumColumn
method)
may be generated. The array also contains a
record of the generated strata. See the ProportionalHazards
class description
for more detail.
int
array containing the stratum number used
for each observation.
ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double getStratumRatio()
double
specifying the ratio at which
a stratum is split into two strata.public int getTiesOption()
int
specifying the method to be used in
handling ties as indicated by the value in the following table:
Value |
Method |
BRESLOWS_APPROXIMATE |
Breslow's approximate method. This is the default. |
SORTED_AS_PER_OBSERVATIONS |
Failures are assumed to occur in the same order as the
observations input in x . The observations in x
must be sorted from largest to smallest failure time within each stratum,
and grouped by stratum. All observations are treated as if their
failure/censoring times were distinct when computing the log-likelihood. |
public double[][] getVarianceCovarianceMatrix() throws ProportionalHazards.ClassificationVariableLimitException
double
matrix containing the estimated asymptotic
variance-covariance matrix of the parameters.
ProportionalHazards.ClassificationVariableLimitException
- is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public void setCensorColumn(int censorIndex)
x
containing the
optional censoring code for each observation.
If x[i][censorIndex]
equals 0, the failure time
x[i][responseIndex]
is treated as an exact time of failure.
Otherwise, it is treated as right-censored time.
By default, it is assumed that there is no censor code column in
x
and all observations are assumed to be exact failure
times.
censorIndex
- an int
specifying the column index of x
containing the optional censoring code for each observation.public void setClassVarColumns(int[] classVarIndices)
x
that are the classification variables.
classVarIndices
- an int
array containing the column numbers of
x
that are the classification variables.
By default it is assumed there are no classification variables.public void setConstantColumn(int fixedIndex)
x
containing the constant to be added
to the linear response.
The linear response is taken to be where is the observation constant, is the observation design row vector, and is the vector of estimated parameters. The "fixed" constant allows one to test hypotheses about parameters via the log-likelihoods. If this method is not called, it is assumed that for all observations.
fixedIndex
- an int
specifying the column index of x
containing the constant to be added to the linear response.public void setConvergenceTol(double convergenceTol)
Convergence is assumed when the relative change in
the maximum likelihood from one iteration to the next is
less than convergenceTol
. If convergenceTol
is zero, convergenceTol
= 0.0001 is assumed.
The default value is 0.0001.
convergenceTol
- a double
specifying the convergence
tolerance.public void setFrequencyColumn(int frequencyIndex)
x
containing the frequency of
response for each observation.
By default it is assumed that there is no
frequency response column recorded in x
.
Each observation in the data array is assumed to be for a
single failure; that is, the frequency of response for each observation is 1.
frequencyIndex
- an int
specifying the column index of x
containing the frequency of
response for each observation.public void setHessianOption(boolean wantHessian)
wantHessian
- a boolean
specifying whether or not the
Hessian and gradient are to be computed at the initial
estimates. If this option is set to true
the
user must set the initial estimates via the setInitialEstimates
method.
By default the Hessian and gradient are not computed at the initial estimates.public void setInitialEstimates(double[] initialCoef)
Care should be taken to ensure that the supplied estimates for the model coefficients correspond to the generated covariate vector .
initialCoef
- a double
array containing the initial
parameter estimates.
By default the initial parameter estimates are all 0.0.public void setMaxClass(int maxClass)
x
.
For example,
if the model consisted of of two class variables, one with the
values {1, 2, 3, 4} and a second with the values {0, 1}, then
the total number of different classification values is
4 + 2 = 6, and maxClass
.
The default value is the number of observations
in x
.
maxClass
- an int
representing an upper bound
on the sum of the number of distinct values found among
the classification variables in x
.public void setMaxIterations(int maxIterations)
maxIterations
- an int
specifying the maximum number of
iterations allowed.
The default value is 30.public void setResponseColumn(int responseIndex)
x
containing the
response variable.
For point observations,
x[i][responseIndex]
contains the time of the i-th event. For right-censored
observations, x[i][responseIndex]
contains the right-censoring time. Note that because
ProportionalHazards
only uses the order of the
events, negative "times" are allowed.
By default responseIndex = 0
.
responseIndex
- an int
specifying the column index of x
containing the response variable.public void setStratumColumn(int stratumIndex)
x
containing the stratification
variable.
Column stratumIndex
of
x
contains a unique value for each stratum in the data.
The risk set for an observation is determined by its stratum.
By default it is assumed that all obvservations
are from one statum.
stratumIndex
- an int
specifying the column index of x
containing the
stratification variable.public void setStratumRatio(double stratumRatio)
Let
be the observation proportionality constant, where is the design row vector for the k-th observation and is the optional fixed parameter specified by . Let be the minimum value in a stratum, where, for failed observations, the minimum is over all times less than or equal to the time of occurrence of the k-th observation. Let be the maximum value of for the remaining observations in the group. Then, ifstratumRatio
,
the observations in the group are divided into two groups
at k. The default value of
stratumRatio
= 1000 is usually good.
Set stratumRatio
to any negative value
if no division into strata is to be made.
stratumRatio
- a double
specifying the ratio at which
a stratum is split into two strata.public void setTiesOption(int iTie)
iTie
- an int
specifying the method to be used in
handling ties. It can be either BRESLOWS_APPROXIMATE
or SORTED_AS_PER_OBSERVATIONS
.
iTie |
Method |
BRESLOWS_APPROXIMATE |
Breslow's approximate method. This is the default. |
SORTED_AS_PER_OBSERVATIONS |
Failures are assumed to occur in the same order as the
observations input in x . The observations in x
must be sorted from largest to smallest failure time within each stratum,
and grouped by stratum. All observations are treated as if their
failure/censoring times were distinct when computing the log-likelihood. |
By default, iTie
is BRESLOWS_APPROXIMATE
.
|
JMSLTM Numerical Library 6.1 | |||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |