smooth_1d_data

Smooth one-dimensional data by error detection.

Synopsis

#include <imsl.h>

float *imsl_f_smooth_1d_data (int ndata, float xdata[], float fdata[], , 0)

The type double function is imsl_d_smooth_1d_data.

Required Arguments

int ndata (Input)
Number of data points.

float xdata[] (Input)
Array with ndata components containing the abscissas of the data points.

float ydata[] (Input)
Array with ndata components containing the ordinates of the data points.

Return Value

A pointer to the vector of length ndata containing the smoothed data.

Synopsis with Optional Arguments

#include <imsl.h>

float *imsl_f_smooth_1d_data (int ndata, float xdata[], float fdata[],

IMSL_RETURN_USER, float sdata[],

IMSL_ITMAX, int itmax,

IMSL_DISTANCE, float dis,

IMSL_STOPPING_CRITERION, float sc,

0)

Optional Arguments

IMSL_RETURN_USER, float sdata[] (Output)
The smoothed data is stored in the user-supplied array.

IMSL_ITMAX, int itmax (Input)
The maximum number of iterations allowed.
Default: itmax = 500

IMSL_DISTANCE, float dis (Input)
Proportion of the distance the ordinate in error is moved to its
interpolating curve. It must be in the range 0.0 to 1.0.
Default: dis = 1.0

IMSL_STOPPING_CRITERION, float sc (Input)
The stopping criterion. sc should be greater than or equal to zero.
Default: sc = 0.0

Description

The function imsl_f_smooth_1d_data is designed to smooth a data set that is mildly contaminated with isolated errors. In general, the routine will not work well if more than 25% of the data points are in error. The routine imsl_f_smooth_1d_data is based on an algorithm of Guerra and Tapia (1974).

Setting ndata = n, ydata = f, sdata = s and xdata = x, the algorithm proceeds as follows. Although the user need not input an ordered xdata sequence, we will assume that x is increasing for simplicity. The algorithm first sorts the xdata values into an increasing sequence and then continues. A cubic spline interpolant is computed for each of the 6-point data sets (initially setting s = f)

(xj, sj) j = i - 3, , i + 3j i,

where i = 4, , n - 3. For each i the interpolant, which we will call Si, is compared with the current value of si, and a ‘point energy’ is computed as

pei = Si (xi ) - si

Setting sc = sc, the algorithm terminates either if itmax iterations have taken place or if

 

If the above inequality is violated for any i, then we update the i-th element of s by setting si = si + d(pei), where d = dis. Note that neither the first three nor the last three data points are changed. Thus, if these points are inaccurate, care must be taken to interpret the results.

The choice of the parameters d, sc and itmax are crucial to the successful usage of this subroutine. If the user has specific information about the extent of the contamination, then he should choose the parameters as follows: d = 1, sc = 0 and itmax to be the number of data points in error. On the other hand, if no such specific information is available, then choose d =  5, itmax  2n, and

 

In any case, we would encourage the user to experiment with these values.

Example

We take 91 uniform samples from the function 5 + (5 + t2 sin t)/t on the interval [1, 10]. Then, we contaminate 10 of the samples and try to recover the original function values.

 

#include <imsl.h>

#include <stdlib.h>

#include <math.h>

 

#define NDATA 91

#define F(X) (X*X*sin((double)(X))+5.0)/X + 5.0

 

int main()

{

int i, maxit;

int isub[10] = {5, 16, 25, 33, 41, 48, 55, 61, 74, 82};

float dis, fdata[NDATA], sc, *sdata=NULL;

float xdata[NDATA], s_user[NDATA];

float rnoise[10] = {2.5, -3., -2., 2.5, 3.,

-2., -2.5, 2., -2., 3.};

 

/* Example 1: No specific information available. */

dis = .5;

sc = .56;

maxit = 182;

 

/* Set values for xdata and fdata. */

xdata[0] = 1.;

fdata[0] = F(xdata[0]);

for (i=1;i<NDATA;i++) {

xdata[i] = xdata[i-1]+.1;

fdata[i] = F(xdata[i]);

}

 

/* Contaminate the data. */

for (i=0;i<10;i++) fdata[isub[i]] += rnoise[i];

 

/* Smooth the data. */

sdata = imsl_f_smooth_1d_data(NDATA, xdata, fdata,

IMSL_DISTANCE, dis,

IMSL_STOPPING_CRITERION, sc,

IMSL_ITMAX, maxit,

0);

 

/* Output the result. */

printf("Case A - No specific information available. \n");

printf(" F(X) F(X)+noise sdata\n");

 

for (i=0;i<10;i++) printf("%7.3f\t%15.3f\t%15.3f\n",

F(xdata[isub[i]]),

fdata[isub[i]],

sdata[isub[i]]);

 

/* Example 2: No specific information is available. */

dis = 1.0;

sc = 0.0;

maxit = 10;

 

/*

* A warning message is produced because the maximum

* number of iterations is reached.

*/

 

/* Smooth the data. */

sdata = imsl_f_smooth_1d_data(NDATA, xdata, fdata,

IMSL_DISTANCE, dis,

IMSL_STOPPING_CRITERION, sc,

IMSL_ITMAX, maxit,

IMSL_RETURN_USER, s_user,

0);

 

/* Output the result. */

printf("Case B - Specific information available. \n");

printf(" F(X) F(X)+noise sdata\n");

 

for (i=0;i<10;i++) printf("%7.3f\t%15.3f\t%15.3f\n",

F(xdata[isub[i]]),

fdata[isub[i]],

s_user[isub[i]]);

}

Output

 

Case A - No specific information available.

F(X) F(X)+noise sdata

9.830 12.330 9.870

8.263 5.263 8.215

5.201 3.201 5.168

2.223 4.723 2.264

1.259 4.259 1.308

3.167 1.167 3.138

7.167 4.667 7.131

10.880 12.880 10.909

12.774 10.774 12.708

7.594 10.594 7.639

 

*** WARNING Error IMSL_ITMAX_EXCEEDED from imsl_f_smooth_1d_data.

*** Maximum number of iterations limit "itmax" = 10 exceeded.

*** The best answer found is returned.

 

Case B - Specific information available.

F(X) F(X)+noise sdata

9.830 12.330 9.831

8.263 5.263 8.262

5.201 3.201 5.199

2.223 4.723 2.225

1.259 4.259 1.261

3.167 1.167 3.170

7.167 4.667 7.170

10.880 12.880 10.878

12.774 10.774 12.770

7.594 10.594 7.592