HAZRD
Performs nonparametric hazard rate estimation using kernel functions and quasi‑likelihoods.
Required Arguments
X — NOBS by m matrix containing the raw data, where m = 1 if ICEN = 0, and m = 2 otherwise. (Input)
IRT — Column number in X of the event times. (Input)
KMIN — Minimum number for parameter k. (Input)
Parameter k is the number of nearest neighbors to be used in computing the k‑th nearest neighbor distance.
INK — Increment between successive values of parameter k. (Input)
NK — Number of values of k to be considered. (Input)
HAZRD finds the optimal value of k over the grid given by: KMIN + (j ‑ 1) * INK, for j = 1, …, NK.
ST — Vector of length
NOBS containing the times of occurrence of the events, sorted from smallest to largest. (Output)
Vector
ST is obtained from the matrix
X and should be used as input to routine
HAZST.
JCEN — Vector of length
NOBS containing the sorted censor codes. (Output)
Censor codes are sorted corresponding to the events
ST(
i), with censored observations preceding tied failures. Vector
JCEN is obtained from the censor codes in
X, if present, and is used as input to routine
HAZST.
ALPHA — Optimal estimate for the parameter α. (Output)
BTA — Optimal estimate for the parameter β. (Output)
K — Optimal estimate for the parameter k. (Output)
VML — Optimum value of the criterion function. (Output)
H — Vector of length
NOBS * 5 containing constants needed to compute the
k‑th nearest failure distances, and the observation weights. (Output)
H is used as input to routine
HAZST.
Optional Arguments
NOBS — Number of observations. (Input)
Default: NOBS = size (X,1).
LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).
ICEN — Censoring option. (Input)
If ICEN = 0, then all of the data is treated as exact data with no censoring. For ICEN > 0, column ICEN of X contains the censoring codes. A censoring code of 0 means an exact event (failure). A censoring code of 1 means that the observation was right censored at the event time.
Default: ICEN = 0.
IWTO — Weight option. (Input)
If IWTO = 1, then weight ln(1 + 1/(NOBS ‑ i + 1)) is used for the i‑th smallest observation. Otherwise, weight 1/(NOBS ‑ i + 1) is used.
Default: IWTO = 0.
NGRID — Grid option. (Input)
If NGRID = 0, a default grid is used to locate an initial starting value for parameter BTA. For NGRID > 0, a user‑defined grid is used. This grid is defined as BSTART + (j ‑ 1) * GINC, for j = 1, …, NGRID, where BSTART, GINC, and NGRID are input.
Default: NGRID = 0.
BSTART — First value to be used in the user‑defined grid. (Input)
Not used if NGRID = 0.
GINC — For a user‑defined grid, the increment between successive grid values of BTA. (Input)
Not used if NGRID = 0.
IPRINT — Printing option. (Input)
If IPRINT = 1, the grid estimates and the optimized estimates are printed for each value of k. Otherwise, no printing is performed.
Default: IPRINT = 0.
ISORT — Sorting option. (Input)
If ISORT = 1, then the event times are not automatically sorted by HAZRD. Otherwise, sorting is performed with exact failure times following tied right‑censored times.
Default: ISORT = 0.
NMISS — Number of missing (NaN, not a number) values in X. (Output)
FORTRAN 90 Interface
Generic: CALL HAZRD (X, IRT, KMIN, INK, NK, ST, JCEN, ALPHA, BTA, K, VML, H [, …])
Specific: The specific interface names are S_HAZRD and D_HAZRD.
FORTRAN 77 Interface
Single: CALL HAZRD (NOBS, X, LDX, IRT, ICEN, IWTO, NGRID, BSTART, GINC, KMIN, INK, NK, IPRINT, ISORT, ST, JCEN, ALPHA, BTA, K, VML, H, NMISS)
Double: The double precision name is DHAZRD.
Description
Routine HAZRD is an implementation of the methods discussed by Tanner and Wong (1984) for estimating the hazard rate in survival or reliability data with right censoring. It uses the biweight kernel,
and a modified likelihood to obtain data‑based estimates of the smoothing parameters α, β, and k needed in the estimation of the hazard rate. For kernel K(x), define the “smoothed” kernel Ks(x ‑ x(j)) as follows:
where djk is the distance to the k‑th nearest failure from x(j), and x(j) is the j‑th ordered observation (from smallest to largest). For given α and β, the hazard at point x is then
where
N =
NOBS,
δi is the
i‑th observation’s censor code (1 = censored, 0 = failed), and
wi is the
i‑th ordered observation’s weight, which may be chosen as either 1/(
N ‑ i + 1), or ln(1 + 1/(
N ‑ i + 1)). After the smoothing parameters have been obtained, the hazard may be estimated via
HAZST.
Let
The likelihood is given by
where Π denotes product. Since the likelihood leads to degenerate estimates, Tanner and Wong (1984) suggest the use of a modified likelihood. The modification consists of deleting observation xi in the calculation of h(xi) and H(xi) when the likelihood term for xi is computed using the usual optimization techniques. α and β for given k can then be estimated.
Estimates for α and β are computed as follows: for given β, a closed form solution is available for α. The problem is thus reduced to the estimation of β. A grid search for β is first performed. Experience indicates that if the initial estimate of β from this grid search is greater than, say, e6, then the modified likelihood is degenerate because the hazard rate does not change with time. In this situation, β should be taken to be infinite, and an estimate of α corresponding to infinite β should be directly computed. When the estimate of β from the grid search is less than e6, a secant algorithm is used to optimize the modified likelihood. The secant algorithm iteration stops when the change in β from one iteration to the next is less than 10−5. Alternatively, the iterations may cease when the value of β becomes greater than e6, at which point an infinite β with a degenerate likelihood is assumed.
To find the optimum value of the likelihood with respect to k, a user‑specified grid of k‑values is used. For each grid value, the modified likelihood is optimized with respect to α and β. That grid point, which leads to the smallest likelihood, is taken to be the optimal k.
Comments
1. Informational Errors
Type | Code | Description |
---|
4 | 18 | All observations are missing (NaN, not a number) values. |
2. In the optimization routines, the parameterization is changed to β* and α*, where β* = ‑ ln(β) and α* = ‑ln(α). The default grid uses –8, –4, –3, –2.5, –2, –1.5, –1, –0.5, and 0.5 for β*. This corresponds to a grid in β of 2981, 54.6, 20.08, 12.18, 7.39, 4.48, 2.72, 1.64, and .61. The grid β that maximizes the modified “likelihood” is used as the starting point for the iterations.
3. If the initial estimate of β as determined from the grid or as given by the user is greater than 400 (actually e6), then infinite β is assumed, and an analytic estimate of α based upon infinite β is used. In the optimization, if it is determined that β must be greater than 1000, then an infinite β is assumed. Infinite β corresponds to a “flat” hazard rate.
Programming Notes
1. The routine
HAZST may be used to estimate the hazard on a grid of points once the optimal values for
α,
β and
k have been found. The user should also consider using the “easy‑to‑use” version of
HAZRD, routine
HAZEZ.
2. If sorting of the data is performed by HAZRD, then the sorted array will be such that all censored observations at a given time precede all failures at that time. To specify an arbitrary pattern of censored/failed observations at a given time point, the ISORT = 1 option must be used. In this case, it is assumed that the times have already been sorted from smallest to largest.
3. The smallest value of k must be greater than the largest number of tied failures since djk must be positive for all j. (Censored observations are not counted.) Similarly, the largest value of k must be less than the total number of failures. If the grid specified for k includes values outside the allowable range, then a warning error is issued; but k is still optimized over the allowable grid values.
4. The secant algorithm iterates on the transformed parameter β* = exp(‑ β). This assures a positive β, and it also seems to lead to a more desirable grid search. All results returned to the user are in the original parameterization, however.
5. Since local minimums have been observed in the modified likelihood, it is recommended that more than one grid of initial values for α and β be used.
Example
The following example is taken from Tanner and Wong (1984). The data are from Stablein, Carter, and Novak (1981) and involve the survival times of individuals with nonresectable gastric carcinoma. Individuals treated with radiation and chemotherapy are used. For each value of k from 18 to 22 with increment of 2, the default grid search for β is performed. Using the optimal value of β in the grid, the optimal parameter estimates of α and β are computed for each value of k. The final solution is the parameter estimates for the value of k which optimizes the modified likelihood (VML). Because the IPRINT = 1 option is in effect, HAZRD prints all of the results in the output.
USE HAZRD_INT
USE UMACH_INT
USE WRRRN_INT
USE WRIRN_INT
IMPLICIT NONE
INTEGER ICEN, INK, IPRINT, IRT, ISORT, KMIN, LDX, &
NK, NOBS
PARAMETER (ICEN=2, INK=2, IPRINT=1, IRT=1, ISORT=1, &
KMIN=18, LDX=45, NK=3, NOBS=45)
!
INTEGER JCEN(NOBS), K, NMISS, NOUT
REAL ALPHA, BTA, H(5*NOBS), ST(NOBS), VML, X(NOBS,2)
!
DATA X/17, 42, 44, 48, 60, 72, 74, 95, 103, 108, 122, 144, 167, &
170, 183, 185, 193, 195, 197, 208, 234, 235, 254, 307, 315, &
401, 445, 464, 484, 528, 542, 567, 577, 580, 795, 855, 882, &
892, 1031, 1033, 1306, 1335, 1366, 1452, 1472, 36*0, 9*1/
!
CALL HAZRD (X, IRT, KMIN, INK, NK, ST, JCEN, ALPHA, BTA, &
K, VML, H, ICEN=ICEN, IPRINT=IPRINT, ISORT=ISORT, &
NMISS=NMISS)
!
CALL UMACH (2, NOUT)
WRITE (NOUT,99999) NMISS
99999 FORMAT (/' NMISS = ', I4/)
CALL WRRRN ('ST', ST, 1, NOBS, 1)
CALL WRIRN ('JCEN', JCEN, 1, NOBS, 1)
CALL WRRRN ('H', H, NOBS, 5, NOBS)
END
Output
*** GRID SEARCH FOR K = 18 ***
ALPHA BETA VML
4.578322 2980.958008 -266.804504
4.543117 54.598148 -266.619690
4.336464 20.085537 -265.541168
4.019334 12.182494 -264.001404
3.542742 7.389056 -262.540100
2.990577 4.481689 -262.511810
2.351537 2.718282 -262.633911
1.584173 1.648721 -262.158264
0.966332 1.000000 -262.868408
*** OPTIMAL PARAMETER ESTIMATES ***
ALPHA BETA VML
1.695147 1.769263 -262.118530
*** GRID SEARCH FOR K = 20 ***
ALPHA BETA VML
4.053934 2980.958008 -266.525970
4.032835 54.598148 -266.401428
3.905046 20.085537 -265.648315
3.687815 12.182494 -264.401672
3.304344 7.389056 -262.665924
2.822716 4.481689 -262.080078
2.252759 2.718282 -262.445251
1.555777 1.648721 -261.772278
0.955586 1.000000 -262.617645
*** OPTIMAL PARAMETER ESTIMATES ***
ALPHA BETA VML
1.540533 1.631551 -261.771484
*** GRID SEARCH FOR K = 22 ***
ALPHA BETA VML
3.656405 2980.958008 -267.595337
3.641593 54.598148 -267.498596
3.550560 20.085537 -266.903870
3.388752 12.182494 -265.859131
3.071474 7.389056 -264.066040
2.645036 4.481689 -263.038696
2.137399 2.718282 -263.334717
1.512606 1.648721 -262.639740
0.936368 1.000000 -262.682739
*** OPTIMAL PARAMETER ESTIMATES ***
ALPHA BETA VML
1.342176 1.450016 -262.561188
*** THE FINAL SOLUTION (K = 20) ***
ALPHA BETA VML
1.540533 1.631551 -261.771484
NMISS = 0
ST
1 2 3 4 5 6 7 8
17.0 42.0 44.0 48.0 60.0 72.0 74.0 95.0
9 10 11 12 13 14 15 16
103.0 108.0 122.0 144.0 167.0 170.0 183.0 185.0
17 18 19 20 21 22 23 24
193.0 195.0 197.0 208.0 234.0 235.0 254.0 307.0
25 26 27 28 29 30 31 32
315.0 401.0 445.0 464.0 484.0 528.0 542.0 567.0
33 34 35 36 37 38 39 40
577.0 580.0 795.0 855.0 882.0 892.0 1031.0 1033.0
41 42 43 44 45
1306.0 1335.0 1366.0 1452.0 1472.0
JCEN
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
41 42 43 44 45
1 1 1 1 1
H
1 2 3 4 5
1 217.0 218.0 1.0 21.0 1.0
2 192.0 193.0 1.0 21.0 0.5
3 190.0 191.0 1.0 21.0 0.3
4 186.0 187.0 1.0 21.0 0.2
5 174.0 175.0 1.0 21.0 0.2
6 162.0 163.0 1.0 21.0 0.2
7 160.0 161.0 1.0 21.0 0.1
8 139.0 140.0 1.0 21.0 0.1
9 131.0 132.0 1.0 21.0 0.1
10 126.0 127.0 1.0 21.0 0.1
11 112.0 113.0 1.0 21.0 0.1
12 102.0 110.0 2.0 22.0 0.1
13 123.0 125.0 3.0 23.0 0.1
14 126.0 128.0 3.0 23.0 0.1
15 132.0 135.0 5.0 25.0 0.1
16 130.0 137.0 5.0 25.0 0.1
17 133.0 145.0 5.0 25.0 0.1
18 135.0 147.0 5.0 25.0 0.1
19 137.0 149.0 5.0 25.0 0.1
20 148.0 160.0 5.0 25.0 0.1
21 167.0 174.0 6.0 26.0 0.0
22 166.0 175.0 6.0 26.0 0.0
23 182.0 191.0 6.0 26.0 0.0
24 204.0 212.0 9.0 29.0 0.0
25 212.0 213.0 9.0 29.0 0.0
26 231.0 234.0 14.0 34.0 0.0
27 275.0 278.0 14.0 34.0 0.0
28 294.0 297.0 14.0 34.0 0.0
29 311.0 314.0 15.0 35.0 0.0
30 343.0 345.0 16.0 36.0 0.0
31 357.0 359.0 16.0 38.0 0.0
32 382.0 384.0 16.0 38.0 0.0
33 392.0 394.0 16.0 38.0 0.0
34 395.0 397.0 16.0 38.0 0.0
35 610.0 612.0 16.0 43.0 0.0
36 670.0 672.0 16.0 45.0 0.0
37 689.0 697.0 17.0 45.0 0.0
38 699.0 707.0 17.0 45.0 0.0
39 838.0 846.0 17.0 45.0 0.0
40 840.0 848.0 17.0 45.0 0.0
41 1113.0 1121.0 17.0 45.0 0.0
42 1142.0 1150.0 17.0 45.0 0.0
43 1173.0 1181.0 17.0 45.0 0.0
44 1259.0 1267.0 17.0 45.0 0.0
45 1279.0 1287.0 17.0 45.0 0.0