public class ProportionalHazards extends Object implements Serializable, Cloneable
Class ProportionalHazards
computes parameter estimates and
other statistics in Proportional Hazards Generalized Linear Models. These
models were first proposed by Cox (1972). Two methods for handling ties are
allowed. Timedependent covariates are not allowed. The user is referred to
Cox and Oakes (1984), Kalbfleisch and Prentice (1980), ElandtJohnson and
Johnson (1980), Lee (1980), or Lawless (1982), among other texts, for a
thorough discussion of the Cox proportional hazards model.
Let represent the hazard rate at
time t for observation number i with covariables contained as
elements of row vector . The basic assumption in the
proportional hazards model (the proportionality assumption) is that the hazard
rate can be written as a product of a time varying function
, which depends only on time, and a
function , which depends only on the covariable
values. The function used in
ProportionalHazards
is given as
where is a fixed constant assigned to the observation,
and b is a vector of coefficients to be estimated. With this function
one obtains a hazard rate
.
The form of is not important in proportional
hazards models.
The constants may be known theoretically. For
example, the hazard rate may be proportional to a known length or area, and
the can then be determined from this known length or
area. Alternatively, the may be used to fix a subset
of the coefficients (say, )
at specified values. When is used in this way,
constants are used, while the
remaining coefficients in are free to vary in the
optimization algorithm. Constants are defined as 0.0 by default. If
userspecified constants are desired, use the setConstantColumn
method to specify which column contains the constant.
With this definition of , the usual partial (or marginal, see Kalbfleisch and Prentice (1980)) likelihood becomes
where denotes the set of indices of
observations that have not yet failed at time (the
risk set), denotes the time of failure for the ith
observation, is the total number of observations
that fail. Rightcensored observations (i.e., observations that are known to
have survived to time , but for which no time of
failure is known) are incorporated into the likelihood through the risk set
. Such observations never appear in the numerator
of the likelihood. When setTieOptions
is set to
BRESLOWS_APPROXIMATE
(the default), all observations that are
censored at
time are not included in ,
while all observations that fail at time are
included in .
If it can be assumed that the dependence of the hazard rate upon the
covariate values remains the same from stratum to stratum, while the
timedependent term, , may be different in
different strata, then ProportionalHazards
allows the
incorporation of strata into the likelihood as follows. Let k index
the m strata (set with setStratumColumn
). Then, the
likelihood is given by
In ProportionalHazards
, the log of the likelihood is
maximized with respect to the coefficients . A
quasiNewton algorithm approximating the Hessian via the matrix of sums of
squares and cross products of the first partial derivatives is used in the
initial iterations. When the change in the loglikelihood from one iteration
to the next is less than 100 times the convergence tolerance,
NewtonRaphson iteration is used. If, during any iteration, the
initial step does not lead to an increase in the loglikelihood, then step
halving is employed to find a step that will increase the loglikelihood.
Once the maximum likelihood estimates have been computed, the algorithm computes estimates of a probability associated with each failure. Within stratum k, an estimate of the probability that the ith observation fails at time given the risk set is given by
A diagnostic "influence" or "leverage" statistic is computed for each noncensored observation as:
where is the matrix of second partial derivatives of the loglikelihood, and is computed as:Influence statistics are not computed for censored observations.
A "residual" is computed for each of the input observations according to methods given in Cox and Oakes (1984, page 108). Residuals are computed as
where is the number of tied failures in group k at time . Assuming that the proportional hazards assumption holds, the residuals should approximate a random sample (with censoring) from the unit exponential distribution. By subtracting the expected values, centered residuals can be obtained. (The jth expected order statistic from the unit exponential with censoring is given as
where h is the sample size, and censored observations are not included in the summation.)
An estimate of the cumulative baseline hazard within group k is given as
The observation proportionality constant is computed as
Note that the user can use the JDK JAVA Logging API to generate intermediate output for the solver. Accumulated levels of detail correspond to JAVA's FINE, FINER, and FINEST logging levels with FINE yielding the smallest amount of information and FINEST yielding the most. The levels of output yield the following:
Level  Output 
FINE  Logging is enabled, but observational statistics are not printed. 
FINER  All output statistics are printed. 
FINEST  Tracks progress through internal methods. 
Modifier and Type  Class and Description 

static class 
ProportionalHazards.ClassificationVariableLimitException
The Classification Variable limit set by the user through
setUpperBound has been exceeded. 
Modifier and Type  Field and Description 

static int 
BRESLOWS_APPROXIMATE
Breslows approximate method of handling ties.

static int 
SORTED_AS_PER_OBSERVATIONS
Failures are assumed to occur in the same order as the
observations input in
x . 
Constructor and Description 

ProportionalHazards(double[][] x,
int[] nVarEffects,
int[] indEffects)
Constructor for
ProportionalHazards . 
Modifier and Type  Method and Description 

double[][] 
getCaseStatistics()
Returns the case statistics for each observation.

int 
getCensorColumn()
Returns the column index of
x containing the
optional censoring code for each observation. 
int[] 
getClassValueCounts()
Returns the number of values taken by each classification variable.

double[] 
getClassValues()
Returns the class values taken by each classification variable.

int 
getConstantColumn()
Returns the column index of
x containing the constant to be added
to the linear response. 
double 
getConvergenceTol()
Returns the convergence tolerance used.

int 
getFrequencyColumn()
Returns the column index of
x containing the frequency of
response for each observation. 
double[] 
getGradient()
Returns the inverse of the Hessian times the gradient vector,
computed at the initial estimates.

double[][] 
getHessian()
Returns the inverse of the Hessian of the negative of the loglikelihood,
computed at the initial estimates.

boolean 
getHessianOption()
Returns the
boolean used to indicate whether or not to compute the Hessian
and gradient at the initial estimates. 
double[] 
getInitialEstimates()
Gets the initial parameter estimates.

double[] 
getLastUpdates()
Gets the last parameter updates.

Logger 
getLogger()
Returns the logger object and enables logging.

int 
getMaxClass()
Returns the upper bound used on the sum of the number of distinct values found among
the classification variables in
x . 
double 
getMaximumLikelihood()
Returns the maximized loglikelihood.

int 
getMaxIterations()
Return the maximum number of iterations allowed.

double[] 
getMeans()
Returns the means of the design variables.

int 
getNumberOfCoefficients()
Returns the number of estimated coefficients in the model.

int 
getNumberRowsMissing()
Returns the number of rows of data in
x that contain
missing values in one or more specific columns of x . 
double[][] 
getParameterStatistics()
Returns the parameter estimates and associated statistics.

int 
getResponseColumn()
Returns the column index of
x containing the
response time for each observation. 
int 
getStratumColumn()
Returns the column index of
x containing the stratum number
for each observation. 
int[] 
getStratumNumbers()
Returns the stratum number used for each observation.

double 
getStratumRatio()
Returns the ratio at which a stratum is split into two strata.

int 
getTiesOption()
Returns the method used for handling ties.

double[][] 
getVarianceCovarianceMatrix()
Returns the estimated asymptotic variancecovariance matrix of the parameters.

void 
setCensorColumn(int censorIndex)
Sets the column index of
x containing the
optional censoring code for each observation. 
void 
setClassVarColumns(int[] classVarIndices)
Sets the column indices of
x that are the classification variables. 
void 
setConstantColumn(int fixedIndex)
Sets the column index of
x containing the constant to be added
to the linear response. 
void 
setConvergenceTol(double convergenceTol)
Set the convergence tolerance.

void 
setFrequencyColumn(int frequencyIndex)
Sets the column index of
x containing the frequency of
response for each observation. 
void 
setHessianOption(boolean wantHessian)
Set the option to have the Hessian and gradient be computed at the initial estimates.

void 
setInitialEstimates(double[] initialCoef)
Sets the initial parameter estimates.

void 
setMaxClass(int maxClass)
Sets an upper bound on the sum of the number of distinct values found among
the classification variables in
x . 
void 
setMaxIterations(int maxIterations)
Set the maximum number of iterations allowed.

void 
setResponseColumn(int responseIndex)
Sets the column index of
x containing the
response variable. 
void 
setStratumColumn(int stratumIndex)
Sets the column index of
x containing the stratification
variable. 
void 
setStratumRatio(double stratumRatio)
Set the ratio at which a stratum is split into two strata.

void 
setTiesOption(int iTie)
Sets the method for handling ties.

public static final int BRESLOWS_APPROXIMATE
setTiesOption
.public static final int SORTED_AS_PER_OBSERVATIONS
x
. The observations in x
must be sorted from largest to smallest failure time within each stratum,
and grouped by stratum. All observations are treated as if their
failure/censoring times were distinct when computing the loglikelihood.
See setTiesOption
.public ProportionalHazards(double[][] x, int[] nVarEffects, int[] indEffects)
ProportionalHazards
.x
 a double
matrix containing the data,
including optional data.nVarEffects
 an int
array containing the number of variables
associated with each effect in the model.indEffects
 an int
array containing the column numbers of
x
associated with each effect. The first
nVarEffects[0]
elements of indEffects
contain the column numbers of x
for the variables
in the first effect. The next nVarEffects[1]
elements of indEffects
contain the column numbers of x
for the variables
in the second effect, etc.public double[][] getCaseStatistics() throws ProportionalHazards.ClassificationVariableLimitException
There is one row for each observation, and the columns of the returned matrix contain the following:
Column  Statistic 
0 
Estimated survival probability at the observation time. 
1 
Estimated observation influence or leverage. 
2 
A residual estimate. 
3 
Estimated cumulative baseline hazard rate. 
4 
Observation proportionality constant. 
double
matrix containing the case statistics.ProportionalHazards.ClassificationVariableLimitException
 is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public int getCensorColumn()
x
containing the
optional censoring code for each observation.int
specifying the column index of x
containing the optional censoring code for each observation.public int[] getClassValueCounts() throws ProportionalHazards.ClassificationVariableLimitException
int
array containing the number of values taken
by each classification variable.ProportionalHazards.ClassificationVariableLimitException
 is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[] getClassValues() throws ProportionalHazards.ClassificationVariableLimitException
getClassValueCounts()
. Then the first nclval[0]
elements contain the values for the first
classification variable, the next nclval[1] elements
contain the values for the second classification variable, etc.double
array containing the values taken
by each classification variable.ProportionalHazards.ClassificationVariableLimitException
 is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public int getConstantColumn()
x
containing the constant to be added
to the linear response.int
specifying the column index of x
containing the constant to be added
to the linear response.public double getConvergenceTol()
double
specifying the convergence
tolerance used.public int getFrequencyColumn()
x
containing the frequency of
response for each observation.int
specifying the column index of x
containing the frequency of
response for each observation.public double[] getGradient() throws ProportionalHazards.ClassificationVariableLimitException
Note that the setHessianOption
method must be invoked with
wantHessian
set to true
and the
setInitialEstimates
method
must be invoked prior to invoking this method. Otherwise, the method throws
an IllegalStateException
exception.
double
array containing the inverse of the Hessian
times the gradient vector, computed at the initial estimates.ProportionalHazards.ClassificationVariableLimitException
 is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[][] getHessian() throws ProportionalHazards.ClassificationVariableLimitException
Note that the setHessianOption
method must be invoked with
wantHessian
set to true
and the
setInitialEstimates
method
must be invoked prior to invoking this method.
IllegalStateException
exception.double
matrix containing the inverse of the Hessian
of the negative of the loglikelihood, computed at the initial estimates.ProportionalHazards.ClassificationVariableLimitException
 is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public boolean getHessianOption()
boolean
used to indicate whether or not to compute the Hessian
and gradient at the initial estimates.boolean
specifying whether or not the
Hessian and gradient are to be computed at the initial
estimates. A return value equal to
true
indicates that the
Hessian and gradient are to be computed.public double[] getInitialEstimates() throws ProportionalHazards.ClassificationVariableLimitException
double
array containing the initial
parameter estimates.ProportionalHazards.ClassificationVariableLimitException
 is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[] getLastUpdates() throws ProportionalHazards.ClassificationVariableLimitException
double
array containing the last
parameter updates (excluding step halvings).ProportionalHazards.ClassificationVariableLimitException
 is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public Logger getLogger()
java.util.logging.Logger
object, if present, or
null
.public int getMaxClass()
x
.int
representing the upper bound used
on the sum of the number of distinct values found among
the classification variables in x
.public double getMaximumLikelihood() throws ProportionalHazards.ClassificationVariableLimitException
The loglikelihood is fully described in the ProportionalHazards
class description.
double
representing the maximized loglikelihoodProportionalHazards.ClassificationVariableLimitException
 is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public int getMaxIterations()
int
specifying the maximum number of
iterations allowed.public double[] getMeans() throws ProportionalHazards.ClassificationVariableLimitException
double
array containing the means of the design variables.ProportionalHazards.ClassificationVariableLimitException
 is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public int getNumberOfCoefficients() throws ProportionalHazards.ClassificationVariableLimitException
int
scalar representing the number of estimated
coefficients in the model.ProportionalHazards.ClassificationVariableLimitException
 is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public int getNumberRowsMissing() throws ProportionalHazards.ClassificationVariableLimitException
x
that contain
missing values in one or more specific columns of x
.int
scalar representing the number of rows of
data in x
that contain missing values in one or
more specific columns of x
.ProportionalHazards.ClassificationVariableLimitException
 is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double[][] getParameterStatistics() throws ProportionalHazards.ClassificationVariableLimitException
There is one row for each coefficient, and the columns of the returned matrix contain the following:
Column  Statistic 
0 
The coefficient estimate, 
1 
Estimated standard deviation of the estimated coefficient 
2 
Asymptotic normal score for testing that the coefficient is zero against the twosided alternative 
3 
pvalue associated with the normal score in column 2 
double
matrix containing the parameter
estimates and associated statistics.ProportionalHazards.ClassificationVariableLimitException
 is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public int getResponseColumn()
x
containing the
response time for each observation.int
specifying the column index of x
containing the response time for each observation.public int getStratumColumn()
x
containing the stratum number
for each observation.int
specifying the column index of x
containing the
stratum number for each observation.public int[] getStratumNumbers() throws ProportionalHazards.ClassificationVariableLimitException
stratumRatio
is not 1.0,
additional "strata" (other than those specified by column groupIndex
of x
set via the setStratumColumn
method)
may be generated. The array also contains a
record of the generated strata. See the ProportionalHazards
class description
for more detail.int
array containing the stratum number used
for each observation.ProportionalHazards.ClassificationVariableLimitException
 is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public double getStratumRatio()
double
specifying the ratio at which
a stratum is split into two strata.public int getTiesOption()
int
specifying the method to be used in
handling ties as indicated by the value in the following table:
Value 
Method 
BRESLOWS_APPROXIMATE 
Breslow's approximate method. This is the default. 
SORTED_AS_PER_OBSERVATIONS 
Failures are assumed to occur in the same order as the
observations input in x . The observations in x
must be sorted from largest to smallest failure time within each stratum,
and grouped by stratum. All observations are treated as if their
failure/censoring times were distinct when computing the loglikelihood. 
public double[][] getVarianceCovarianceMatrix() throws ProportionalHazards.ClassificationVariableLimitException
double
matrix containing the estimated asymptotic
variancecovariance matrix of the parameters.ProportionalHazards.ClassificationVariableLimitException
 is thrown if the
classification variable limit set by the user through
setUpperBound
has been exceeded.public void setCensorColumn(int censorIndex)
x
containing the
optional censoring code for each observation.
If x[i][censorIndex]
equals 0, the failure time
x[i][responseIndex]
is treated as an exact time of failure.
Otherwise, it is treated as rightcensored time.
By default, it is assumed that there is no censor code column in
x
and all observations are assumed to be exact failure
times.
censorIndex
 an int
specifying the column index of x
containing the optional censoring code for each observation.public void setClassVarColumns(int[] classVarIndices)
x
that are the classification variables.classVarIndices
 an int
array containing the column numbers of
x
that are the classification variables.
By default it is assumed there are no classification variables.public void setConstantColumn(int fixedIndex)
x
containing the constant to be added
to the linear response.
The linear response is taken to be where is the observation constant, is the observation design row vector, and is the vector of estimated parameters. The "fixed" constant allows one to test hypotheses about parameters via the loglikelihoods. If this method is not called, it is assumed that for all observations.
fixedIndex
 an int
specifying the column index of x
containing the constant to be added to the linear response.public void setConvergenceTol(double convergenceTol)
Convergence is assumed when the relative change in
the maximum likelihood from one iteration to the next is
less than convergenceTol
. If convergenceTol
is zero, convergenceTol
= 0.0001 is assumed.
The default value is 0.0001.
convergenceTol
 a double
specifying the convergence
tolerance.public void setFrequencyColumn(int frequencyIndex)
x
containing the frequency of
response for each observation.
By default it is assumed that there is no
frequency response column recorded in x
.
Each observation in the data array is assumed to be for a
single failure; that is, the frequency of response for each observation is 1.
frequencyIndex
 an int
specifying the column index of x
containing the frequency of
response for each observation.public void setHessianOption(boolean wantHessian)
wantHessian
 a boolean
specifying whether or not the
Hessian and gradient are to be computed at the initial
estimates. If this option is set to true
the
user must set the initial estimates via the setInitialEstimates
method.
By default the Hessian and gradient are not computed at the initial estimates.public void setInitialEstimates(double[] initialCoef)
Care should be taken to ensure that the supplied estimates for the model coefficients correspond to the generated covariate vector .
initialCoef
 a double
array containing the initial
parameter estimates.
By default the initial parameter estimates are all 0.0.public void setMaxClass(int maxClass)
x
.
For example,
if the model consisted of of two class variables, one with the
values {1, 2, 3, 4} and a second with the values {0, 1}, then
the total number of different classification values is
4 + 2 = 6, and maxClass
.
The default value is the number of observations
in x
.
maxClass
 an int
representing an upper bound
on the sum of the number of distinct values found among
the classification variables in x
.public void setMaxIterations(int maxIterations)
maxIterations
 an int
specifying the maximum number of
iterations allowed.
The default value is 30.public void setResponseColumn(int responseIndex)
x
containing the
response variable.
For point observations,
x[i][responseIndex]
contains the time of the ith event. For rightcensored
observations, x[i][responseIndex]
contains the rightcensoring time. Note that because
ProportionalHazards
only uses the order of the
events, negative "times" are allowed.
By default responseIndex = 0
.
responseIndex
 an int
specifying the column index of x
containing the response variable.public void setStratumColumn(int stratumIndex)
x
containing the stratification
variable.
Column stratumIndex
of
x
contains a unique value for each stratum in the data.
The risk set for an observation is determined by its stratum.
By default it is assumed that all obvservations
are from one statum.
stratumIndex
 an int
specifying the column index of x
containing the
stratification variable.public void setStratumRatio(double stratumRatio)
Let
be the observation proportionality constant, where is the design row vector for the kth observation and is the optional fixed parameter specified by . Let be the minimum value in a stratum, where, for failed observations, the minimum is over all times less than or equal to the time of occurrence of the kth observation. Let be the maximum value of for the remaining observations in the group. Then, ifstratumRatio
,
the observations in the group are divided into two groups
at k. The default value of
stratumRatio
= 1000 is usually good.
Set stratumRatio
to any negative value
if no division into strata is to be made.
stratumRatio
 a double
specifying the ratio at which
a stratum is split into two strata.public void setTiesOption(int iTie)
iTie
 an int
specifying the method to be used in
handling ties. It can be either BRESLOWS_APPROXIMATE
or SORTED_AS_PER_OBSERVATIONS
.
iTie 
Method 
BRESLOWS_APPROXIMATE 
Breslow's approximate method. This is the default. 
SORTED_AS_PER_OBSERVATIONS 
Failures are assumed to occur in the same order as the
observations input in x . The observations in x
must be sorted from largest to smallest failure time within each stratum,
and grouped by stratum. All observations are treated as if their
failure/censoring times were distinct when computing the loglikelihood. 
By default, iTie
is BRESLOWS_APPROXIMATE
.
Copyright © 19702015 Rogue Wave Software
Built October 13 2015.