ESTIMATE_MISSING
Estimates missing values in a time series.
Required Arguments
ITIME_POINTS — Vector of length NOBSW containing the reference time points t1, …, tNOBSW at which the time series was observed. (Input)
The reference time points must be of data type integer and in strictly increasing order. Reference time points for missing values must lie in the open interval (t1, tNOBSW) It is assumed that the time series, after estimation of missing values, contains values at equidistant time points where the distance between two consecutive time points is one.
W — Vector of length NOBSW containing the time series values. (Input)
The values must be ordered in accordance with the values in vector ITIME_POINTS. It is assumed that the time series, after estimation of missing values, contains values at equidistant time points where the distance between two consecutive time points is one. If the non‑missing time series values are observed at time points t1, …, tNOBSW, then missing values between ti and ti+1, i = 1, …, NOBSW ‑ 1, exist if ti+1 ‑ ti > 1. The size of the gap between ti and ti+1 is then ti+1 ‑ ti ‑ 1. The total length of the time series with non‑missing and estimated missing values is tNOBSW ‑ ti + 1, or ITIME_POINTS(NOBSW) ‑ ITIME_POINTS(1) + 1.
For example, a time series with six observations with the fourth observation missing would appear as follows:
ITIME_POINTS |
W |
t1 = 1 |
.0019 |
t 2 = 2 |
.0018 |
t 3 = 3 |
.0019 |
|
|
t 4 = 5 |
.0021 |
t 5 = 6 |
.0022 |
t 6 = 7 |
.002 |
In this example, NOBSW = 6 and the length of the time series, Z, including missing values is tNOBSW ‑ ti + 1 = 7.
Z — Allocatable array which on return will contain the time series values together with estimates for the missing values. (Output)
Optional Arguments
NOBSW — Number of observations in time series W. (Input)
Default: NOBSW = size(W).
IMETH — Method used for missing value estimation. (Input)
Default: IMETH = 3.
IMETH |
Method |
0 |
Median. |
1 |
Cubic Spline Interpolation. |
2 |
AR(1) model. |
3 |
AR(p) model. |
If IMETH = 2 and the first gap begins at t2 or t3, the corresponding time series values are estimated using method Median. If IMETH = 3 is chosen and the first gap starts at t2, then the values of this gap are also estimated by method Median. If the length of the series before a gap, denoted len, is greater than 1 and less than 2⋅MAXLAG, then MAXLAG is reduced to len/2 for the estimate of the missing values within this gap.
MAXLAG — Maximum number of autoregressive parameters, p, in the AR(p) model when IMETH = 3 is chosen. (Input)
See AUTO_UNI_AR for further information.
Default: MAXLAG = 10.
MAXBC — Maximum length of backcasting in the least‑squares algorithm when IMETH = 2 is chosen. (Input)
MAXBC must be greater than or equal to zero. See NSLSE for further information.
Default: MAXBC = 0.
TOLBC — Tolerance level used to determine convergence of the backcast algorithm used when IMETH = 2 is chosen. (Input)
Backcasting terminates when the absolute value of a backcast is less than TOLBC. Typically, TOLBC is set to a fraction of wstdev where wstdev is an estimate of the standard deviation of the time series. See NSLSE for further information.
Default: TOLBC = 0.01 * wstdev.
TOLSS — Tolerance level used to determine convergence of the nonlinear least‑squares algorithm when IMETH = 2 is chosen. (Input)
TOLSS represents the minimum relative decrease in the sum of squares between two iterations required to determine convergence. Hence, TOLSS must be greater than zero and less than one. See NSLSE for further information.
The default value is: max{10−10, EPS2∕3} for single precision and
max{10−20, EPS2∕3} for double precision,
where EPS = AMACH(4) for single precision and EPS = DMACH(4) for double precision. See the documentation for routine AMACH in the Reference Material.
RELERR— Stopping criterion for use in the nonlinear equation solver when
IMETH = 2 is chosen. (Input)
See NSLSE for further information.
Default: RELERR= 100.0 * AMACH(4) for single precision,
RELERR= 100.0 * DMACH(4) for double precision.
See the documentation for routine AMACH in the Reference Material.
MAXIT — Maximum number of iterations allowed in the nonlinear equation solver when
IMETH = 2 is chosen. (Input)
See NSLSE for further information.
Default: MAXIT= 200.
WMEAN — Estimate of the mean of time series W. (Input)
Default: WMEAN = 0.0.
NOBSZ — Number of elements in the time series Z, with estimated missing values,
NOBSZ = ITIME_POINTS(NOBSW) – ITIME_POINTS(1) + 1. (Output)
ITIME_POINTS_FULL — Vector of length NOBSZ containing the reference time points of the time series Z. (Output)
MISS_INDEX — Vector of length NOBSZ‑NOBSW containing the indices for the missing values in vector ITIME_POINTS_FULL. (Output)
FORTRAN 90 Interface
Generic: CALL ESTIMATE_MISSING (ITIME_POINTS, W, Z [, …])
Specific: The specific interface names are S_ESTIMATE_MISSING and D_ESTIMATE_MISSING.
Description
Traditional time series analysis, as described by Box, Jenkins and Reinsel (1994), requires the observations be made at equidistant time points with no missing values. When observations are missing, the problem of determining suitable estimates occurs. Routine ESTIMATE_MISSING offers four methods for estimating missing values from an equidistant time series.
The Median method, IMETH = 0, estimates the missing observations in a gap by the median of the last four time series values before and the first four values after the gap. If enough values are not available before or after the gap then the number is reduced accordingly. This method is very fast and simple, but its use is limited to stationary ergodic series without outliers and level shifts.
The Cubic Spline Interpolation method, IMETH = 1, uses a cubic spline interpolation method to estimate missing values. Here the interpolation is again done over the last four time series values before and the first four values after the gap. The missing values are estimated by the resulting interpolant. This method gives smooth transitions across missing values.
The AR(1) method, IMETH = 2, assumes that the time series before the gap can be well described by an AR(1) process. If the last observation prior to the gap is made at time point tm then it uses the time series values at t1, t1 + 1, …, tm to compute the one‑step‑ahead forecast at origin tm. This value is taken as an estimate for the missing value at time point tm + 1. If the value at tm + 2 is also missing then the values at time points t1, t1 + 1, …, tm + 1 are used to recompute the AR(1) model, estimate the value at tm + 2 and so on. The coefficient φ1 in the AR(1) model is computed internally by the method of least squares from routine NSLSE.
The AR(p) method, IMETH = 3 ,uses an AR(p) model to estimate missing values by a one‑step‑ahead forecast First, routine AUTO_UNI_AR , applied to the time series prior to the missing values, is used to determine the optimum p from the set {0, 1, …, MAXLAG} of possible values and to compute the parameters φ1, …, φp of the resulting AR(p) model. The parameters are estimated by the method of moments. Denoting the mean of the series by μ the one‑step‑ahead forecast at origin , can be computed by the formula
This value is used as an estimate for the missing value. The procedure, starting with AUTO_UNI_AR, is then repeated for every further missing value in the gap.
All four estimation methods treat gaps of missing values in increasing time order.
Example
Consider the AR(1) process
Wt =ɸ1 Wt-1 + at, t =1, 2, 3, …
We assume that is a Gaussian white noise process, . Then, and (see Anderson, p. 174).
The time series in this example was artificially generated from an AR(1) process characterized by and . This process is stationary with . An initial value, was used. The sequence was generated by a random number generator.
From the original series, we remove the observations at time points t = 130, t = 140, t = 141, t = 160, t = 175, and t = 176. Then, ESTIMATE_MISSING is used to compute estimates for the missing values by all 4 estimation methods available. The estimated values are compared with the actual values.
use estimate_missing_int
!
implicit none
integer :: i, j, k, nout
integer, dimension(200) :: times_1, times_2
integer :: n_obs, n_miss
integer :: ntimes, miss_ind
integer, dimension(:), allocatable :: times, missing_index
real, dimension(200) :: x_1, x_2
real, dimension(:), allocatable :: result
real, dimension(200) :: y = (/ 1.30540,-1.37166,1.47905, &
-0.91059,1.36191,-2.16966,3.11254, -1.99536,2.29740, &
-1.82474,-0.25445,0.33519,-0.25480,-0.50574,-0.21429,&
-0.45932,-0.63813,0.25646,-0.46243,-0.44104,0.42733, &
0.61102,-0.82417,1.48537,-1.57733,-0.09846,0.46311, &
0.49156,-1.66090,2.02808,-1.45768,1.36115,-0.65973, &
1.13332,-0.86285,1.23848,-0.57301,-0.28210,0.20195, &
0.06981,0.28454,0.19745, -0.16490,-1.05019,0.78652, &
-0.40447,0.71514,-0.90003,1.83604, -2.51205,1.00526, &
-1.01683,1.70691,-1.86564,1.84912,-1.33120, 2.35105, &
-0.45579,-0.57773,-0.55226,0.88371,0.23138,0.59984, &
0.31971,0.59849,0.41873,-0.46955,0.53003,-1.17203, &
1.52937,-0.48017,-0.93830,1.00651,-1.41493,-0.42188,&
-0.67010,0.58079, -0.96193,0.22763,-0.92214,1.35697, &
-1.47008,2.47841,-1.50522, 0.41650,-0.21669,-0.90297,&
0.00274,-1.04863,0.66192,-0.39143, 0.40779,-0.68174,&
-0.04700,-0.84469,0.30735,-0.68412,0.25888, -1.08642,&
0.52928,0.72168,-0.18199,-0.09499,0.67610,0.14636, &
0.46846,-0.13989,0.50856,-0.22268,0.92756,0.73069, &
0.78998,-1.01650,1.25637,-2.36179,1.99616,-1.54326, &
1.38220,0.19674,-0.85241,0.40463,0.39523,-0.60721, &
0.25041,-1.24967,0.26727,1.40042,-0.66963,1.26049, &
-0.92074,0.05909,-0.61926,1.41550, 0.25537,-0.13240, &
-0.07543,0.10413,1.42445,-1.37379,0.44382, -1.57210, &
2.04702,-2.22450,1.27698,0.01073,-0.88459,0.88194, &
-0.25019,0.70224,-0.41855,0.93850,0.36007,-0.46043, &
0.18645,0.06337,0.29414,-0.20054,0.83078,-1.62530, &
2.64925,-1.25355,1.59094,-1.00684,1.03196,-1.58045, &
2.04295,-2.38264,1.65095,-0.33273,-1.29092,0.14020, &
-0.11434,0.04392,0.05293,-0.42277,0.59143,-0.03347, &
-0.58457,0.87030,0.19985,-0.73500,0.73640, 0.29531, &
0.22325,-0.60035,1.42253,-1.11278,1.30468,-0.41923, &
-0.38019,0.50937,0.23051,0.46496,0.02459,-0.68478, &
0.25821,1.17655,-2.26629,1.41173,-0.68331 /)
integer, dimension(200) :: tpoints=(/1,2,3,4,5,6,7,8,9,10,&
11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,&
29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,&
47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,&
65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,&
83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,&
101,102,103,104,105,106,107,108,109,110,111,112,113,114,&
115,116,117,118,119,120,121,122,123,124,125,126,127,128,&
129,130,131,132,133,134,135,136,137,138,139,140,141,142,&
143,144,145,146,147,148,149,150,151,152,153,154,155,156,&
157,158,159,160,161,162,163,164,165,166,167,168,169,170,&
171,172,173,174,175,176,177,178,179,180,181,182,183,184,&
185,186,187,188,189,190,191,192,193,194,195,196,197,198,&
199,200 /)
n_miss = 0
times_1(1) = tpoints(1)
times_2(1) = tpoints(1)
x_1(1) = y(1)
x_2(1) = y(1)
k = 0
DO i=1,199
times_1(i+1) = tpoints(i+1)
x_1(i+1) = y(i+1)
! Generate series with missing values
IF ( i/=129 .AND. i/=139 .AND. i/=140 .AND. i/=159 &
.AND. i/=174 .AND. i/=175 ) THEN
k = k+1
times_2(k+1) = times_1(i+1)
x_2(k+1) = x_1(i+1)
END IF
END DO
!
n_obs = k + 1
ntimes = tpoints(200) - tpoints(1) + 1
n_miss = ntimes - n_obs
!
ALLOCATE(times(ntimes), missing_index(n_miss))
DO j=0,3
IF (j <= 2) THEN
CALL estimate_missing(times_2, x_2, result, NOBSW=n_obs, &
IMETH=j, ITIME_POINTS_FULL=times, &
MISS_INDEX=missing_index)
ELSE
CALL estimate_missing(times_2, x_2, result, NOBSW=n_obs, &
IMETH=j, ITIME_POINTS_FULL=times, &
MAXLAG=20, MISS_INDEX=missing_index)
END IF
CALL UMACH (2, nout)
IF (j == 0) WRITE (nout,*) "Method: Median"
IF (j == 1) WRITE (nout,*) "Method: Cubic Spline Interpolation"
IF (j == 2) WRITE (nout,*) "Method: AR(1) Forecast"
IF (j == 3) WRITE (nout,*) "Method: AR(p) Forecast"
WRITE(nout,99998)
DO i=0,n_miss-1
miss_ind = missing_index(i+1)
WRITE(nout,99999) times(miss_ind), x_1(miss_ind), &
result(miss_ind), &
ABS(x_1(miss_ind)-result(miss_ind))
END DO
WRITE(nout,*) ""
!
IF (ALLOCATED(result)) DEALLOCATE(result)
END DO
!
IF (ALLOCATED(times)) DEALLOCATE(times)
IF (ALLOCATED(missing_index)) DEALLOCATE(missing_index)
!
99998 FORMAT("time",6x,"actual",6x,"predicted",6x,"difference")
99999 FORMAT(I4,6x,F6.3,8x,F6.3,7x,F6.3)
END
Output
Method: Median
time actual predicted difference
130 -0.921 0.261 1.182
140 0.444 0.057 0.386
141 -1.572 0.057 1.630
160 2.649 0.047 2.602
175 -0.423 0.048 0.471
176 0.591 0.048 0.543
Method: Cubic Spline Interpolation
time actual predicted difference
130 -0.921 1.541 2.462
140 0.444 -0.407 0.851
141 -1.572 2.497 4.069
160 2.649 -2.947 5.596
175 -0.423 0.251 0.673
176 0.591 0.380 0.211
Method: AR(1) Forecast
time actual predicted difference
130 -0.921 -0.916 0.005
140 0.444 1.019 0.575
141 -1.572 -0.714 0.858
160 2.649 1.228 1.421
175 -0.423 -0.010 0.413
176 0.591 0.037 0.555
Method: AR(p) Forecast
time actual predicted difference
130 -0.921 -0.901 0.020
140 0.444 1.024 0.580
141 -1.572 -0.706 0.867
160 2.649 1.233 1.417
175 -0.423 -0.002 0.421
176 0.591 0.039 0.553