nonparamHazardRate

Performs nonparametric hazard rate estimation using kernel functions and quasi-likelihoods.

Synopsis

nonparamHazardRate (t, nHazard, hazardMin, hazardIncrement)

Required Arguments

float t[] (Input)
An array of nObservations containing the failure times. If optional argument censorCodes is used, the values of t may be treated as exact failure times, as right-censored times, or a combination of exact and right censored times. By default, all times in t are assumed to be exact failure times.
int nHazard (Input)
Number of grid points at which to compute the hazard. The function computes the hazard rates over the range given by: hazardMin + j * hazardIncrement, for j = 0, …, nHazard - 1.
float hazardMin (Input)
First grid value.
float hazardIncrement (Input)
Increment between grid values.

Return Value

An array of length nHazard containing the estimated hazard rates.

Optional Arguments

printLevel, int (Input)

Printing option.

printLevel Action
0 No printing is performed.
1 The grid estimates and the optimized estimates are printed for each value of k.

Default: printLevel = 0.

censorCodes, int[] (Input)

censorCodes is an array of length nObservations containing the censoring codes for each time in t. If censorCodes[i]=0 the failure time t[i] is treated as an exact time of failure. Otherwise it is treated as a right-censored time; that is, the exact time of failure is greater than t[i].

Default: All failure times are treated as exact times of failure with no censoring.

weight, int (Input)

Weight option. If weight = 1, then \(\mathrm{weight}=\ln( 1+1/(\mathrm{nObservations}-i))\) is used for the i‑th smallest observation. Otherwise, \(\mathrm{weight}=1/(\mathrm{nObservations}-i)\) is used.

Default: weight = 0.

sortOption, int (Input)

Sorting option. If sortOption = 1, then the event times are not automatically sorted by the function. Otherwise, sorting is performed with exact failure times following tied right-censored times.

Default: sortOption = 0.

kGrid, int nK, int kMin, int kIncrement (Input)

Finds the optimal value of k over the range given by: kmin + (j - 1) × kIncrement, for j = 1, …, nK. Where nK is the number of values of k to be considered. kMin is the minimum value for parameter k. kIncrement is the increment between successive values of parameter k. Parameter k is the number of nearest neighbors to be used in computing the k-th nearest neighbor distance.

Default: kMin is the smallest possible value of k, kIncrement =2, and nK will be at most 10 points.

betaGrid, int nBetaGrid, float betaStart, float betaIncrement (Input)

For nBetaGrid > 0, a user-defined grid is used. This grid is defined as betaStart + (j ‑ 1)*betaIncrement, for j = 1, …, nBetaGrid. betaStart is the first value to be used in the user-defined grid and betaIncrement is the increment between successive grid values of beta.

Default: The values in the initial beta search are given as follows:

Let \(\beta^*\) = - 8, - 4, - 2, - 1, - 0.5,0.5,1, and 2, and

\[\beta = e ^{-\beta^*}\]

For each value of β, criterion is computed at the optimizing β. The maximizing β is used to initiate the iterations. If the initial \(\beta^*\) is determined from the search to be less than -6, then it is presumed that β is infinite, and an analytic estimate of α based upon infinite β is used. Infinite β corresponds to a flat hazard rate.

nMissing (Output)
Number of missing (NaN, not a number) failure times in t.
alpha (Output)
Optimal estimate for the parameter α.
beta (Output)
Optimal estimate for the parameter β.
criterion (Output)
Optimum value of the criterion function.
k (Output)
Optimal estimate for the parameter k.
sortedEventTimes (Output)
An array of length nObservations containing the times of occurrence of the events, sorted from smallest to largest.
sortedCensorCodes (Output)
An array of length nObservations containing the sorted censor codes. Censor codes are sorted corresponding to the events sortedEventTimes[i], with censored observations preceding tied failures.

Description

Function nonparamHazardRate is an implementation of the methods discussed by Tanner and Wong (1984) for estimating the hazard rate in survival or reliability data with right censoring. It uses the biweight kernel,

\[\begin{split}K(x) = \begin{cases} \tfrac{15}{16}\left(1 - x^2\right)^2 & \text{for } |x| < 1 \\ 0 & \text{elsewhere} \end{cases}\end{split}\]

and a modified likelihood to obtain data-based estimates of the smoothing parameters α, β, and k needed in the estimation of the hazard rate. For kernel \(K(x)\), define the “smoothed” kernel \(K_s(x-x_{(j)})\) as follows:

\[K_s(x) = \left(x - x_{(j)}\right) = \frac{1}{\alpha d_{jk}} K\left(\frac{x - x_{(j)}}{\beta d_{jk}}\right)\]

where \(d_{jk}\) is the distance to the k-th nearest failure from \(x_{(j)}\), and \(x_{(j)}\) is the j-th ordered observation (from smallest to largest). For given α and β, the hazard at point x is then

\[h(x) = \sum_{i=1}^N\left\{\left(1-\delta_i\right)w_iK_s \left(x - x_{(i)}\right)\right\}\]

where N = nObservations, \(\delta_i\) is the i-th observation’s censor code (1 = censored, 0 = failed), and \(w_i\) is the i-th ordered observation’s weight, which may be chosen as either \(1/(N-i+1)\), or \(\ln(1+1/(N-i+1))\). Let

\[H(x) = \int_0^x h(s) ds\]

The likelihood is given by

\[L = \prod\nolimits_{i=1}^{N} \left\{h\left(x_i\right)^{\left(1-\delta_i\right)} \exp \left(-H\left(x_{(i)}\right)\right)\right\}\]

where ∏ denotes product. Since the likelihood leads to degenerate estimates, Tanner and Wong (1984) suggest the use of a modified likelihood. The modification consists of deleting observation \(x_i\) in the calculation of \(h(x_i)\) and \(H(x_i)\) when the likelihood term for \(x_i\) is computed using the usual optimization techniques. α and β for given k can then be estimated.

Estimates for α and β are computed as follows: for given β, a closed form solution is available for α. The problem is thus reduced to the estimation of β.

A grid search for β is first performed. Experience indicates that if the initial estimate of β from this grid search is greater than, say, \(e^6\),then the modified likelihood is degenerate because the hazard rate does not change with time. In this situation, β should be taken to be infinite, and an estimate of α corresponding to infinite β should be directly computed. When the estimate of β from the grid search is less than \(e^6\), a secant algorithm is used to optimize the modified likelihood. The secant algorithm iteration stops when the change in β from one iteration to the next is less than \(10^{-5}\). Alternatively, the iterations may cease when the value of β becomes greater than \(e^6\), at which point an infinite β with a degenerate likelihood is assumed.

To find the optimum value of the likelihood with respect to k, a user-specified grid of k-values is used. For each grid value, the modified likelihood is optimized with respect to α and β. That grid point, which leads to the smallest likelihood, is taken to be the optimal k.

Programming Notes

  1. If sorting of the data is performed by nonparamHazardRate, then the sorted array will be such that all censored observations at a given time precede all failures at that time. To specify an arbitrary pattern of censored/failed observations at a given time point, the sortOption = 1 option must be used. In this case, it is assumed that the times have already been sorted from smallest to largest.
  2. The smallest value of k must be greater than the largest number of tied failures since \(d_{jk}\) must be positive for all j. (Censored observations are not counted.) Similarly, the largest value of k must be less than the total number of failures. If the grid specified for k includes values outside the allowable range, then a warning error is issued; but k is still optimized over the allowable grid values.
  3. The secant algorithm iterates on the transformed parameter \(\beta^*= exp(-\beta)\). This assures a positive β, and it also seems to lead to a more desirable grid search. All results returned to the user are in the original parameterization, however.
  4. Since local minimums have been observed in the modified likelihood, it is recommended that more than one grid of initial values for α and β be used.
  5. Function nonparamHazardRate assumes that the hazard grid points are new data points.

Example

The following example is taken from Tanner and Wong (1984). The data are from Stablein, Carter, and Novak (1981) and involve the survival times of individuals with nonresectable gastric carcinoma. Only individuals treated with both radiation and chemotherapy are used. For each value of k from 18 to 22 with increment of 2, the default grid search for β is performed. Using the optimal value of β in the grid, the optimal parameter estimates of α and β are computed for each value of k. The final solution is the parameter estimates for the value of k which optimizes the modified likelihood (criterion). Because the printLevel = 1 is in effect, nonparamHazardRate prints all of the results in the output.

from __future__ import print_function
from numpy import *
from pyimsl.stat.nonparamHazardRate import nonparamHazardRate
from pyimsl.stat.writeMatrix import writeMatrix

iprint = 1
kmin = 18
increment_k = 2
n_k = 3
isort = 1
isorted_censor = []
event_times = []
haz = []
n_hazard = 100
hazard_min = 0.0
hazard_inc = 10.0
nmiss = []

t = array([17.0, 42.0, 44.0, 48.0, 60.0, 72.0, 74.0, 95.0,
           103.0, 108.0, 122.0, 144.0, 167.0, 170.0, 183.0,
           185.0, 193.0, 195.0, 197.0, 208.0, 234.0, 235.0,
           254.0, 307.0, 315.0, 401.0, 445.0, 464.0, 484.0,
           528.0, 542.0, 567.0, 577.0, 580.0, 795.0, 855.0,
           882.0, 892.0, 1031.0, 1033.0, 1306.0, 1335.0, 1366.0,
           1452.0, 1472.0])
censor_codes = array([0, 0, 0, 0, 0, 0, 0, 0, 0,
                      0, 0, 0, 0, 0, 0, 0, 0, 0,
                      0, 0, 0, 0, 0, 0, 0, 0, 0,
                      0, 0, 0, 0, 0, 0, 0, 0, 0,
                      1, 1, 1, 1, 1, 1, 1, 1, 1])

k_grid = {'nK': n_k, 'kMin': kmin, 'kIncrement': increment_k}
haz = nonparamHazardRate(t, n_hazard, hazard_min, hazard_inc,
                         kGrid=k_grid,
                         printLevel=iprint,
                         nMissing=nmiss,
                         sortOption=isort,
                         censorCodes=censor_codes,
                         sortedEventTimes=event_times,
                         sortedCensorCodes=isorted_censor)

print("\nnmiss = %d\n" % nmiss[0])
writeMatrix("Sorted Event Times", event_times,
            writeFormat="%7.1f")
writeMatrix("Sorted Censors", isorted_censor, writeFormat="%2i")
writeMatrix("Hazard Rates", haz)

Output

                 *** Grid search for k =    18 ***
         alpha                    beta                   vml
         4.57832                 2980.96              -266.805
         4.54312                 54.5982               -266.62
         4.33646                 20.0855              -265.541
         4.01934                 12.1825              -264.001
         3.54274                 7.38906               -262.54
         2.99058                 4.48169              -262.512
         2.35153                 2.71828              -262.634
         1.58417                 1.64872              -262.158
        0.966331                       1              -262.868

                 *** Optimal parameter estimates ***
         alpha                    beta                   vml
         1.69514                 1.76926              -262.118

                 *** Grid search for k =    20 ***
         alpha                    beta                   vml
         4.05393                 2980.96              -266.526
         4.03284                 54.5982              -266.401
         3.90505                 20.0855              -265.648
         3.68781                 12.1825              -264.402
         3.30434                 7.38906              -262.666
         2.82272                 4.48169               -262.08
         2.25276                 2.71828              -262.445
         1.55578                 1.64872              -261.772
        0.955587                       1              -262.618

                 *** Optimal parameter estimates ***
         alpha                    beta                   vml
         1.54053                 1.63155              -261.771

                 *** Grid search for k =    22 ***
         alpha                    beta                   vml
         3.65641                 2980.96              -267.595
         3.64159                 54.5982              -267.499
         3.55056                 20.0855              -266.904
         3.38875                 12.1825              -265.859
         3.07147                 7.38906              -264.066
         2.64504                 4.48169              -263.039
          2.1374                 2.71828              -263.335
          1.5126                 1.64872               -262.64
        0.936368                       1              -262.683

                 *** Optimal parameter estimates ***
         alpha                    beta                   vml
         1.34218                 1.45002              -262.561

             *** The final solution     (k =    20) ***
         alpha                    beta                   vml
         1.54053                 1.63155              -261.771
 
                          Sorted Event Times
      1        2        3        4        5        6        7        8
   17.0     42.0     44.0     48.0     60.0     72.0     74.0     95.0
 
      9       10       11       12       13       14       15       16
  103.0    108.0    122.0    144.0    167.0    170.0    183.0    185.0
 
     17       18       19       20       21       22       23       24
  193.0    195.0    197.0    208.0    234.0    235.0    254.0    307.0
 
     25       26       27       28       29       30       31       32
  315.0    401.0    445.0    464.0    484.0    528.0    542.0    567.0
 
     33       34       35       36       37       38       39       40
  577.0    580.0    795.0    855.0    882.0    892.0   1031.0   1033.0
 
     41       42       43       44       45
 1306.0   1335.0   1366.0   1452.0   1472.0
 
                                Sorted Censors
 1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 
21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   1   1   1
 
41  42  43  44  45
 1   1   1   1   1
 
                                Hazard Rates
          1            2        
nmiss = 0

    3            4            5            6
   0.000962     0.001111     0.001276     0.001451     0.001634     0.001819
 
          7            8            9           10           11           12
   0.002004     0.002185     0.002359     0.002523     0.002675     0.002813
 
         13           14           15           16           17           18
   0.002935     0.003040     0.003126     0.003193     0.003240     0.003266
 
         19           20           21           22           23           24
   0.003273     0.003260     0.003229     0.003179     0.003114     0.003034
 
         25           26           27           28           29           30
   0.002941     0.002838     0.002727     0.002612     0.002495     0.002381
 
         31           32           33           34           35           36
   0.002273     0.002175     0.002084     0.001998     0.001917     0.001841
 
         37           38           39           40           41           42
   0.001771     0.001709     0.001655     0.001608     0.001569     0.001537
 
         43           44           45           46           47           48
   0.001510     0.001484     0.001459     0.001435     0.001411     0.001388
 
         49           50           51           52           53           54
   0.001365     0.001343     0.001323     0.001304     0.001285     0.001266
 
         55           56           57           58           59           60
   0.001247     0.001228     0.001208     0.001188     0.001167     0.001146
 
         61           62           63           64           65           66
   0.001125     0.001103     0.001081     0.001060     0.001040     0.001020
 
         67           68           69           70           71           72
   0.000999     0.000979     0.000958     0.000936     0.000913     0.000891
 
         73           74           75           76           77           78
   0.000868     0.000845     0.000821     0.000798     0.000775     0.000752
 
         79           80           81           82           83           84
   0.000730     0.000708     0.000685     0.000662     0.000640     0.000617
 
         85           86           87           88           89           90
   0.000595     0.000573     0.000552     0.000530     0.000510     0.000490
 
         91           92           93           94           95           96
   0.000471     0.000452     0.000434     0.000416     0.000399     0.000383
 
         97           98           99          100
   0.000366     0.000351     0.000336     0.000321

Fatal Errors

IMSLS_ALL_OBSERVATIONS_MISSING All observations are missing (NaN, not a number) values.