ESTIMATE_MISSING

FNLStat : Time Series Analysis and Forecasting : ESTIMATE_MISSING

ESTIMATE_MISSING

Required Arguments

more...

Estimates missing values in a time series.

Required Arguments

ITIME_POINTS — Vector of length NOBSW containing the reference time points t1, …, tNOBSW at which the time series was observed. (Input)
The reference time points must be of data type integer and in strictly increasing order. Reference time points for missing values must lie in the open interval (t1, tNOBSW) It is assumed that the time series, after estimation of missing values, contains values at equidistant time points where the distance between two consecutive time points is one.

W — Vector of length NOBSW containing the time series values. (Input)
The values must be ordered in accordance with the values in vector ITIME_POINTS. It is assumed that the time series, after estimation of missing values, contains values at equidistant time points where the distance between two consecutive time points is one. If the non‑missing time series values are observed at time points t1, …, tNOBSW, then missing values between ti and ti+1, i = 1, …, NOBSW ‑ 1, exist if ti+1 ‑ ti > 1. The size of the gap between ti and ti+1 is then ti+1 ‑ ti ‑ 1. The total length of the time series with non‑missing and estimated missing values is tNOBSW ‑ ti + 1, or ITIME_POINTS(NOBSW) ‑ ITIME_POINTS(1) + 1.

For example, a time series with six observations with the fourth observation missing would appear as follows:

ITIME_POINTS	W
t1 = 1	.0019
t 2 = 2	.0018
t 3 = 3	.0019

t 4 = 5	.0021
t 5 = 6	.0022
t 6 = 7	.002

In this example, NOBSW = 6 and the length of the time series, Z, including missing values is tNOBSW ‑ ti + 1 = 7.

Z — Allocatable array which on return will contain the time series values together with estimates for the missing values. (Output)

Optional Arguments

NOBSW — Number of observations in time series W. (Input)
Default: NOBSW = size(W).

IMETH — Method used for missing value estimation. (Input)
Default: IMETH = 3.

IMETH	Method
0	Median.
1	Cubic Spline Interpolation.
2	AR(1) model.
3	AR(p) model.

If IMETH = 2 and the first gap begins at t2 or t3, the corresponding time series values are estimated using method Median. If IMETH = 3 is chosen and the first gap starts at t2, then the values of this gap are also estimated by method Median. If the length of the series before a gap, denoted len, is greater than 1 and less than 2⋅MAXLAG, then MAXLAG is reduced to len/2 for the estimate of the missing values within this gap.

MAXLAG — Maximum number of autoregressive parameters, p, in the AR(p) model when IMETH = 3 is chosen. (Input)
See AUTO_UNI_AR for further information.
Default: MAXLAG = 10.

MAXBC — Maximum length of backcasting in the least‑squares algorithm when IMETH = 2 is chosen. (Input)
MAXBC must be greater than or equal to zero. See NSLSE for further information.
Default: MAXBC = 0.

TOLBC — Tolerance level used to determine convergence of the backcast algorithm used when IMETH = 2 is chosen. (Input)
Backcasting terminates when the absolute value of a backcast is less than TOLBC. Typically, TOLBC is set to a fraction of wstdev where wstdev is an estimate of the standard deviation of the time series. See NSLSE for further information.
Default: TOLBC = 0.01 * wstdev.

TOLSS — Tolerance level used to determine convergence of the nonlinear least‑squares algorithm when IMETH = 2 is chosen. (Input)
TOLSS represents the minimum relative decrease in the sum of squares between two iterations required to determine convergence. Hence, TOLSS must be greater than zero and less than one. See NSLSE for further information.
The default value is: max{10−10, EPS2∕3} for single precision and
max{10−20, EPS2∕3} for double precision,

where EPS = AMACH(4) for single precision and EPS = DMACH(4) for double precision. See the documentation for routine AMACH in the Reference Material.

RELERR— Stopping criterion for use in the nonlinear equation solver when
IMETH = 2 is chosen. (Input)
See NSLSE for further information.
Default: RELERR= 100.0 * AMACH(4) for single precision,
RELERR= 100.0 * DMACH(4) for double precision.
See the documentation for routine AMACH in the Reference Material.

MAXIT — Maximum number of iterations allowed in the nonlinear equation solver when
IMETH = 2 is chosen. (Input)
See NSLSE for further information.
Default: MAXIT= 200.

WMEAN — Estimate of the mean of time series W. (Input)
Default: WMEAN = 0.0.

NOBSZ — Number of elements in the time series Z, with estimated missing values,
NOBSZ = ITIME_POINTS(NOBSW) – ITIME_POINTS(1) + 1. (Output)

ITIME_POINTS_FULL — Vector of length NOBSZ containing the reference time points of the time series Z. (Output)

MISS_INDEX — Vector of length NOBSZ‑NOBSW containing the indices for the missing values in vector ITIME_POINTS_FULL. (Output)

FORTRAN 90 Interface

Generic: CALL ESTIMATE_MISSING (ITIME_POINTS, W, Z [, …])

Specific: The specific interface names are S_ESTIMATE_MISSING and D_ESTIMATE_MISSING.

Description

Traditional time series analysis, as described by Box, Jenkins and Reinsel (1994), requires the observations be made at equidistant time points with no missing values. When observations are missing, the problem of determining suitable estimates occurs. Routine ESTIMATE_MISSING offers four methods for estimating missing values from an equidistant time series.

The Median method, IMETH = 0, estimates the missing observations in a gap by the median of the last four time series values before and the first four values after the gap. If enough values are not available before or after the gap then the number is reduced accordingly. This method is very fast and simple, but its use is limited to stationary ergodic series without outliers and level shifts.

The Cubic Spline Interpolation method, IMETH = 1, uses a cubic spline interpolation method to estimate missing values. Here the interpolation is again done over the last four time series values before and the first four values after the gap. The missing values are estimated by the resulting interpolant. This method gives smooth transitions across missing values.

The AR(1) method, IMETH = 2, assumes that the time series before the gap can be well described by an AR(1) process. If the last observation prior to the gap is made at time point tm then it uses the time series values at t1, t1 + 1, …, tm to compute the one‑step‑ahead forecast at origin tm. This value is taken as an estimate for the missing value at time point tm + 1. If the value at tm + 2 is also missing then the values at time points t1, t1 + 1, …, tm + 1 are used to recompute the AR(1) model, estimate the value at tm + 2 and so on. The coefficient φ1 in the AR(1) model is computed internally by the method of least squares from routine NSLSE.

The AR(p) method, IMETH = 3 ,uses an AR(p) model to estimate missing values by a one‑step‑ahead forecast First, routine AUTO_UNI_AR , applied to the time series prior to the missing values, is used to determine the optimum p from the set {0, 1, …, MAXLAG} of possible values and to compute the parameters φ1, …, φp of the resulting AR(p) model. The parameters are estimated by the method of moments. Denoting the mean of the series

by μ the one‑step‑ahead forecast at origin

, can be computed by the formula

This value is used as an estimate for the missing value. The procedure, starting with AUTO_UNI_AR, is then repeated for every further missing value in the gap.

All four estimation methods treat gaps of missing values in increasing time order.

Example

Consider the AR(1) process

Wt =ɸ1 Wt-1 + at, t =1, 2, 3, …

We assume that

is a Gaussian white noise process,

. Then,

and

(see Anderson, p. 174).

The time series in this example was artificially generated from an AR(1) process characterized by

and

. This process is stationary with

. An initial value,

was used. The sequence

was generated by a random number generator.

From the original series, we remove the observations at time points t = 130, t = 140, t = 141, t = 160, t = 175, and t = 176. Then, ESTIMATE_MISSING is used to compute estimates for the missing values by all 4 estimation methods available. The estimated values are compared with the actual values.

use estimate_missing_int

implicit none

integer :: i, j, k, nout

integer, dimension(200) :: times_1, times_2

integer :: n_obs, n_miss

integer :: ntimes, miss_ind

integer, dimension(:), allocatable :: times, missing_index

real, dimension(200) :: x_1, x_2

real, dimension(:), allocatable :: result

real, dimension(200) :: y = (/ 1.30540,-1.37166,1.47905, &

-0.91059,1.36191,-2.16966,3.11254, -1.99536,2.29740, &

-1.82474,-0.25445,0.33519,-0.25480,-0.50574,-0.21429,&

-0.45932,-0.63813,0.25646,-0.46243,-0.44104,0.42733, &

0.61102,-0.82417,1.48537,-1.57733,-0.09846,0.46311, &

0.49156,-1.66090,2.02808,-1.45768,1.36115,-0.65973, &

1.13332,-0.86285,1.23848,-0.57301,-0.28210,0.20195, &

0.06981,0.28454,0.19745, -0.16490,-1.05019,0.78652, &

-0.40447,0.71514,-0.90003,1.83604, -2.51205,1.00526, &

-1.01683,1.70691,-1.86564,1.84912,-1.33120, 2.35105, &

-0.45579,-0.57773,-0.55226,0.88371,0.23138,0.59984, &

0.31971,0.59849,0.41873,-0.46955,0.53003,-1.17203, &

1.52937,-0.48017,-0.93830,1.00651,-1.41493,-0.42188,&

-0.67010,0.58079, -0.96193,0.22763,-0.92214,1.35697, &

-1.47008,2.47841,-1.50522, 0.41650,-0.21669,-0.90297,&

0.00274,-1.04863,0.66192,-0.39143, 0.40779,-0.68174,&

-0.04700,-0.84469,0.30735,-0.68412,0.25888, -1.08642,&

0.52928,0.72168,-0.18199,-0.09499,0.67610,0.14636, &

0.46846,-0.13989,0.50856,-0.22268,0.92756,0.73069, &

0.78998,-1.01650,1.25637,-2.36179,1.99616,-1.54326, &

1.38220,0.19674,-0.85241,0.40463,0.39523,-0.60721, &

0.25041,-1.24967,0.26727,1.40042,-0.66963,1.26049, &

-0.92074,0.05909,-0.61926,1.41550, 0.25537,-0.13240, &

-0.07543,0.10413,1.42445,-1.37379,0.44382, -1.57210, &

2.04702,-2.22450,1.27698,0.01073,-0.88459,0.88194, &

-0.25019,0.70224,-0.41855,0.93850,0.36007,-0.46043, &

0.18645,0.06337,0.29414,-0.20054,0.83078,-1.62530, &

2.64925,-1.25355,1.59094,-1.00684,1.03196,-1.58045, &

2.04295,-2.38264,1.65095,-0.33273,-1.29092,0.14020, &

-0.11434,0.04392,0.05293,-0.42277,0.59143,-0.03347, &

-0.58457,0.87030,0.19985,-0.73500,0.73640, 0.29531, &

0.22325,-0.60035,1.42253,-1.11278,1.30468,-0.41923, &

-0.38019,0.50937,0.23051,0.46496,0.02459,-0.68478, &

0.25821,1.17655,-2.26629,1.41173,-0.68331 /)

integer, dimension(200) :: tpoints=(/1,2,3,4,5,6,7,8,9,10,&

11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,&

29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,&

47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,&

65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,&

83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,&

101,102,103,104,105,106,107,108,109,110,111,112,113,114,&

115,116,117,118,119,120,121,122,123,124,125,126,127,128,&

129,130,131,132,133,134,135,136,137,138,139,140,141,142,&

143,144,145,146,147,148,149,150,151,152,153,154,155,156,&

157,158,159,160,161,162,163,164,165,166,167,168,169,170,&

171,172,173,174,175,176,177,178,179,180,181,182,183,184,&

185,186,187,188,189,190,191,192,193,194,195,196,197,198,&

199,200 /)

n_miss = 0

times_1(1) = tpoints(1)

times_2(1) = tpoints(1)

x_1(1) = y(1)

x_2(1) = y(1)

k = 0

DO i=1,199

times_1(i+1) = tpoints(i+1)

x_1(i+1) = y(i+1)

! Generate series with missing values

IF ( i/=129 .AND. i/=139 .AND. i/=140 .AND. i/=159 &

.AND. i/=174 .AND. i/=175 ) THEN

k = k+1

times_2(k+1) = times_1(i+1)

x_2(k+1) = x_1(i+1)

END IF

END DO

n_obs = k + 1

ntimes = tpoints(200) - tpoints(1) + 1

n_miss = ntimes - n_obs

ALLOCATE(times(ntimes), missing_index(n_miss))

DO j=0,3

IF (j <= 2) THEN

CALL estimate_missing(times_2, x_2, result, NOBSW=n_obs, &

IMETH=j, ITIME_POINTS_FULL=times, &

MISS_INDEX=missing_index)

ELSE

CALL estimate_missing(times_2, x_2, result, NOBSW=n_obs, &

IMETH=j, ITIME_POINTS_FULL=times, &

MAXLAG=20, MISS_INDEX=missing_index)

END IF

CALL UMACH (2, nout)

IF (j == 0) WRITE (nout,*) "Method: Median"

IF (j == 1) WRITE (nout,*) "Method: Cubic Spline Interpolation"

IF (j == 2) WRITE (nout,*) "Method: AR(1) Forecast"

IF (j == 3) WRITE (nout,*) "Method: AR(p) Forecast"

WRITE(nout,99998)

DO i=0,n_miss-1

miss_ind = missing_index(i+1)

WRITE(nout,99999) times(miss_ind), x_1(miss_ind), &

result(miss_ind), &

ABS(x_1(miss_ind)-result(miss_ind))

END DO

WRITE(nout,*) ""

IF (ALLOCATED(result)) DEALLOCATE(result)

END DO

IF (ALLOCATED(times)) DEALLOCATE(times)

IF (ALLOCATED(missing_index)) DEALLOCATE(missing_index)

99998 FORMAT("time",6x,"actual",6x,"predicted",6x,"difference")

99999 FORMAT(I4,6x,F6.3,8x,F6.3,7x,F6.3)

END

Output

Method: Median

time actual predicted difference

130 -0.921 0.261 1.182

140 0.444 0.057 0.386

141 -1.572 0.057 1.630

160 2.649 0.047 2.602

175 -0.423 0.048 0.471

176 0.591 0.048 0.543

Method: Cubic Spline Interpolation

time actual predicted difference

130 -0.921 1.541 2.462

140 0.444 -0.407 0.851

141 -1.572 2.497 4.069

160 2.649 -2.947 5.596

175 -0.423 0.251 0.673

176 0.591 0.380 0.211

Method: AR(1) Forecast

time actual predicted difference

130 -0.921 -0.916 0.005

140 0.444 1.019 0.575

141 -1.572 -0.714 0.858

160 2.649 1.228 1.421

175 -0.423 -0.010 0.413

176 0.591 0.037 0.555

Method: AR(p) Forecast

time actual predicted difference

130 -0.921 -0.901 0.020

140 0.444 1.024 0.580

141 -1.572 -0.706 0.867

160 2.649 1.233 1.417

175 -0.423 -0.002 0.421

176 0.591 0.039 0.553