Scales or unscales continuous data prior to its use in neural network training, testing, or forecasting.
#include <imsls.h>
float *imsls_f_scale_filter (int n_obs, float x[], int method, …, 0)
The type double function is imsls_d_scale_filter.
int n_obs
(Input)
Number of observations.
float x[]
(Input)
An array of length n_obs. The values in x are either the scaled or unscaled values of a continuous variable. Missing values are allowed, and are indicated by placing a NaN (not a number) in x. See imsls_f_machine(6).
int method
(Input)
The scaling method to apply to each variable. The association of the value in method and the scaling algorithm is summarized in the table below. The sign of method determines whether the values in x are scaled or unscaled: if method is positive, the values in x are scaled; if method is negative, they are unscaled.
Method | Algorithm
0 | No scaling.
±1 | Bounded scaling and unscaling.
±2 | Unbounded z-score scaling using the mean and standard deviation.
±3 | Unbounded z-score scaling using the median and mean absolute difference.
±4 | Bounded z-score scaling using the mean and standard deviation.
±5 | Bounded z-score scaling using the median and mean absolute difference.
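The sign and magnitude conventions for method can be sketched in a few lines of C. The type method_info and the function decode_method below are hypothetical illustrations, not part of the IMSL library; they only restate the table and the rule that bounded methods (±1, ±4, ±5) require scale limits.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper type, not part of IMSL. */
typedef struct {
    int algorithm;     /* 0..5, the row of the method table */
    int unscale;       /* 1 when method is negative, 0 otherwise */
    int needs_limits;  /* 1 for the bounded algorithms 1, 4 and 5 */
} method_info;

/* Splits a method code into algorithm, direction and limit requirement. */
method_info decode_method(int method)
{
    method_info info;
    assert(method >= -5 && method <= 5);  /* the table defines 0 and ±1..±5 */
    info.algorithm = abs(method);
    info.unscale = (method < 0);
    info.needs_limits = (info.algorithm == 1 || info.algorithm == 4 ||
                         info.algorithm == 5);
    return info;
}
```

For example, decode_method(-4) describes an unscaling call that uses algorithm 4 and needs scale limits.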
A pointer to an internally allocated array of length n_obs containing either the scaled or unscaled value of x, depending upon whether method is positive or negative, respectively. If errors are encountered, NULL is returned.
#include <imsls.h>
float *imsls_f_scale_filter (int n_obs, float x[], int method,
IMSLS_RETURN_USER, float z[],
IMSLS_SCALE_LIMITS, float real_min, float real_max, float target_min, float target_max,
IMSLS_SUPPLY_CENTER_SPREAD, float center, float spread,
IMSLS_RETURN_CENTER_SPREAD, float *center, float *spread,
0)
IMSLS_RETURN_USER, float z[]
(Output)
A user-supplied array of length n_obs containing either the scaled or
unscaled values of x, depending upon
whether method
is positive or negative, respectively.
IMSLS_SCALE_LIMITS, float real_min, float real_max, float target_min, float target_max (Input)
The real and target limits for x. This optional argument is required when bounded scaling is performed, i.e., method=±1, ±4, or ±5. real_min is the lowest value expected for the input variable x, and real_max is the largest value expected. target_min is the lowest value allowed for the output variable, z, and target_max is the largest value allowed for the output variable.
IMSLS_SUPPLY_CENTER_SPREAD, float center, float spread
(Input)
The values center and spread are only
used for z-score scaling or unscaling of x, that is, when method is one of ±2,
±3, ±4, and ±5. The value of center is either the
mean or median, and the value of spread is either
the standard deviation or mean absolute difference. When method is positive,
this optional argument can be used to supply a user-defined center and spread
rather than allowing imsls_f_scale_filter
to compute the center and spread from the data in x. When method is one of
-2, -3, -4, or -5, this optional argument must be used to supply the center and
spread used during scaling.
IMSLS_RETURN_CENTER_SPREAD, float *center, float *spread
(Output)
Pointers to scalars containing the computed center and spread of x. The values center and spread are only used for z-score scaling or unscaling of x, that is, when method is one of ±2, ±3, ±4, and ±5. The value of center is either the mean or median of x, and the value of spread is either the standard deviation or mean absolute difference.
The function imsls_f_scale_filter is designed to either scale or unscale a continuous variable, using one of the methods listed in the table above, prior to its use as neural network input or output.
The specific scaling computations employed are specified by the argument method. Scaling limits are supplied with the optional argument IMSLS_SCALE_LIMITS, and are required for the bounded scaling methods, i.e., method=±1, ±4, or ±5. Bounded scaling ensures that the scaled values in the returned array fall between a lower and upper bound.
If method=±1, then the bounded method of scaling and unscaling is applied to x using the scaling limits supplied with IMSLS_SCALE_LIMITS.
If method=±2, ±3, ±4, or ±5, then the z-score method of scaling is used. These calculations are based upon the following scaling calculation:
$$z_i = \frac{x_i - a}{b},$$
where a is a measure of center for x, and b is a measure of the spread of x.
If method=±2 or ±4, then by default a and b are the arithmetic average and sample standard deviation of the training data. These values can be overridden using the optional argument IMSLS_SUPPLY_CENTER_SPREAD.
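If method=±3 or ±5, then by default a and b are the median $\tilde{x}$ and $\tilde{s}$, where $\tilde{s}$ is a robust estimate of the population standard deviation:
$$\tilde{s} = \frac{\mathrm{MAD}}{0.6745},$$
where MAD is the Mean Absolute Deviation
$$\mathrm{MAD} = \operatorname{median}\{\,|x_i - \tilde{x}|\,\}.$$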
Again, the values of a and b can be overridden using the optional argument IMSLS_SUPPLY_CENTER_SPREAD.
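As a rough illustration of the robust defaults for method=±3 and ±5, the sketch below computes a median center and a MAD-based spread. The function names are hypothetical and not part of IMSL. The definition of MAD as the median of absolute deviations from the median, and the divisor 0.6745, are assumptions chosen to match the spread printed for x2 in the example at the end of this section.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* qsort comparator for ascending floats. */
static int cmp_float(const void *p, const void *q)
{
    float a = *(const float *)p, b = *(const float *)q;
    return (a > b) - (a < b);
}

/* Median of v[0..n-1]; sorts v in place. Assumes n > 0 and no NaN entries. */
static float median_in_place(float v[], int n)
{
    qsort(v, (size_t)n, sizeof v[0], cmp_float);
    return (n % 2) ? v[n / 2] : 0.5f * (v[n / 2 - 1] + v[n / 2]);
}

/* Hypothetical helper (not an IMSL routine): center = median of x,
   spread = MAD / 0.6745 with MAD = median(|x[i] - center|).
   The constant 0.6745 is an assumption. Returns 0 on success, -1 on failure. */
int robust_center_spread(const float x[], int n, float *center, float *spread)
{
    float *work;
    int i;
    assert(x != NULL && center != NULL && spread != NULL);
    if (n <= 0) return -1;
    work = malloc((size_t)n * sizeof *work);
    if (work == NULL) return -1;
    for (i = 0; i < n; i++) work[i] = x[i];
    *center = median_in_place(work, n);
    for (i = 0; i < n; i++) work[i] = fabsf(x[i] - *center);
    *spread = median_in_place(work, n) / 0.6745f;
    free(work);
    return 0;
}
```

Applied to the values {3.1, 1.5, -1.5, 2.4, 4.2}, this gives a center of 2.4 and a spread of about 1.334.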
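If method=1, then the optional argument IMSLS_SCALE_LIMITS is required and a scaling operation is conducted on x with the scale limits, using the following calculation:
$$z_i = r\,(x_i - \mathrm{real\_min}) + \mathrm{target\_min},$$
where
$$r = \frac{\mathrm{target\_max} - \mathrm{target\_min}}{\mathrm{real\_max} - \mathrm{real\_min}}.$$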
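A minimal sketch of the bounded scaling step for a single value follows. It assumes the linear map z = r (x - real_min) + target_min with r = (target_max - target_min) / (real_max - real_min). The function bounded_scale is a hypothetical name, not an IMSL routine, and it omits missing-value handling.

```c
#include <assert.h>

/* Hypothetical helper: maps x from [real_min, real_max] onto
   [target_min, target_max] with a linear transformation. */
float bounded_scale(float x, float real_min, float real_max,
                    float target_min, float target_max)
{
    float r;
    assert(real_max != real_min);  /* the ratio r would be undefined */
    r = (target_max - target_min) / (real_max - real_min);
    return r * (x - real_min) + target_min;
}
```

With real limits -6 and 6 and target limits -3 and 3, the ratio r is 0.5, so real_min maps to target_min and real_max maps to target_max.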
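If method=-1, then the optional argument IMSLS_SCALE_LIMITS is required and an unscaling operation is conducted by inverting the bounded scaling calculation $z_i = r\,(x_i - \mathrm{real\_min}) + \mathrm{target\_min}$, which gives:
$$x_i = \frac{z_i - \mathrm{target\_min}}{r} + \mathrm{real\_min}, \qquad r = \frac{\mathrm{target\_max} - \mathrm{target\_min}}{\mathrm{real\_max} - \mathrm{real\_min}}.$$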
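The inverse step can be sketched the same way. The function bounded_unscale is a hypothetical name, not an IMSL routine, and it assumes the forward map was z = r (x - real_min) + target_min with r = (target_max - target_min) / (real_max - real_min).

```c
#include <assert.h>

/* Hypothetical helper: recovers x from a bounded-scaled value z by
   solving z = r * (x - real_min) + target_min for x. */
float bounded_unscale(float z, float real_min, float real_max,
                      float target_min, float target_max)
{
    float r;
    assert(real_max != real_min);      /* r would be undefined */
    assert(target_max != target_min);  /* r would be zero, no inverse */
    r = (target_max - target_min) / (real_max - real_min);
    return (z - target_min) / r + real_min;
}
```

The same limits must be passed for unscaling as were used for scaling; otherwise the recovered values will not match the originals.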
If method=2 or method=3, then a scaling operation is conducted on x using a z-score calculation:
$$z_i = \frac{x_i - \mathrm{center}}{\mathrm{spread}}.$$
If either center or spread is missing (a NaN), then appropriate values are calculated from the non-missing values of x. If method=2, then center is set equal to the arithmetic average $\bar{x}$, and spread is set equal to the sample standard deviation,
$$s = \sqrt{\frac{\sum_{i}(x_i - \bar{x})^2}{n - 1}},$$
where the sum runs over the n non-missing values of x.
If method=3, then center is set equal to the median $\tilde{x}$, and spread is set equal to the robust spread estimate derived from the Mean Absolute Difference (MAD).
If method=-2 or method=-3, then an unscaling operation is conducted using the inverse of the z-score calculation used for method=+2 and method=+3:
$$x_i = \mathrm{spread} \cdot z_i + \mathrm{center}.$$
For these values of method, missing values for center and spread are not allowed. If method=-2, then center and spread are assumed to be equal to the arithmetic average and standard deviation, respectively. These values would normally be the same as those used in scaling the variable with method=+2. If method=-3, then center and spread are assumed to be equal to the median and mean absolute difference, respectively. These values would normally be the same as those used in scaling the variable with method=+3.
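The relationship between unbounded scaling and unscaling can be sketched for a single value as follows. The functions zscore_scale and zscore_unscale are hypothetical names, not IMSL routines; they assume the forward calculation z = (x - center) / spread and its algebraic inverse.

```c
#include <assert.h>

/* Hypothetical helper: unbounded z-score of a single value. */
float zscore_scale(float x, float center, float spread)
{
    assert(spread != 0.0f);  /* division by zero would be undefined */
    return (x - center) / spread;
}

/* Hypothetical helper: inverse of zscore_scale for the same
   center and spread. */
float zscore_unscale(float z, float center, float spread)
{
    return z * spread + center;
}
```

Unscaling recovers the original value, up to floating-point rounding, only when it is given the same center and spread that were used for scaling, which is why those values must be supplied when method is negative.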
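Bounded z-score scaling (method=±4 or ±5) is essentially the same as the z-score calculation described for method=+2 and method=+3, with additional scaling or unscaling using the scale limits. If method=4 or method=5, then the optional argument IMSLS_SCALE_LIMITS is required and a scaling operation is conducted by applying the scale limits to the widely known z-score calculation:
$$z_i = r\left(\frac{x_i - \mathrm{center}}{\mathrm{spread}} - \mathrm{real\_min}\right) + \mathrm{target\_min}, \qquad r = \frac{\mathrm{target\_max} - \mathrm{target\_min}}{\mathrm{real\_max} - \mathrm{real\_min}}.$$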
If either center or spread is missing (a NaN), then appropriate values are calculated from the non-missing values in x. If center is missing and method=+4, then center is set equal to the arithmetic average $\bar{x}$, and spread is set equal to the sample standard deviation, $s$. If center is missing and method=+5, then center is set equal to the median $\tilde{x}$, and spread is set equal to the robust spread estimate derived from the MAD.
In bounded scaling, if z[i] exceeds its bounds, it is set to the boundary it exceeded.
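The bounded z-score step, including the clamping rule, might look like the sketch below for a single value. The function bounded_zscore_scale is a hypothetical name, not an IMSL routine. It assumes that the scale limits are applied to the z-score rather than to x directly, and that the bounds in the clamping rule are target_min and target_max; the first assumption is consistent with the z1 values printed in the example at the end of this section.

```c
#include <assert.h>

/* Hypothetical helper: z-score of x, mapped linearly from
   [real_min, real_max] onto [target_min, target_max], then clamped. */
float bounded_zscore_scale(float x, float center, float spread,
                           float real_min, float real_max,
                           float target_min, float target_max)
{
    float r, score, z;
    assert(spread != 0.0f && real_max != real_min);
    r = (target_max - target_min) / (real_max - real_min);
    score = (x - center) / spread;            /* z-score step */
    z = r * (score - real_min) + target_min;  /* bounded step */
    if (z < target_min) z = target_min;       /* clamp to the lower bound */
    if (z > target_max) z = target_max;       /* clamp to the upper bound */
    return z;
}
```

With center 3.4, spread 1.742125, real limits -6 and 6, and target limits -3 and 3, the value 3.5 scales to about 0.0287, while a far outlier such as 100 is clamped to 3.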
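If method=-4 or method=-5, then the optional argument IMSLS_SCALE_LIMITS is required and an unscaling operation is conducted using the inverse of the bounded z-score scaling calculation:
$$x_i = \mathrm{spread}\left(\frac{z_i - \mathrm{target\_min}}{r} + \mathrm{real\_min}\right) + \mathrm{center}, \qquad r = \frac{\mathrm{target\_max} - \mathrm{target\_min}}{\mathrm{real\_max} - \mathrm{real\_min}}.$$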
For these values of method, missing values for center and spread are not allowed. If method=-4, then center and spread are assumed to be equal to the arithmetic average and standard deviation, respectively. These values would normally be the same as those used in scaling x with method=+4. If method=-5, then center and spread are assumed to be equal to the median and mean absolute difference, respectively. These values would normally be the same as those used in scaling x with method=+5.
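Unscaling for the bounded z-score methods can be sketched by undoing the two steps in reverse order. The function bounded_zscore_unscale is a hypothetical name, not an IMSL routine. It assumes the forward calculation was z = r ((x - center) / spread - real_min) + target_min with r = (target_max - target_min) / (real_max - real_min).

```c
#include <assert.h>

/* Hypothetical helper: inverse of bounded z-score scaling for one value.
   A value that was clamped during scaling cannot be recovered exactly. */
float bounded_zscore_unscale(float z, float center, float spread,
                             float real_min, float real_max,
                             float target_min, float target_max)
{
    float r, score;
    assert(real_max != real_min && target_max != target_min);
    r = (target_max - target_min) / (real_max - real_min);
    score = (z - target_min) / r + real_min;  /* undo the bounded step */
    return spread * score + center;           /* undo the z-score step */
}
```

With center 3.4, spread 1.742125, real limits -6 and 6, and target limits -3 and 3, the scaled value 0.0287 maps back to approximately 3.5.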
In this example two data sets are filtered using bounded z-score scaling.
#include <stdio.h>
#include <imsls.h>

int main(void)
{
    int n_obs = 5;
    float x1[] = {3.5, 2.4, 4.4, 5.6, 1.1};
    float x2[] = {3.1, 1.5, -1.5, 2.4, 4.2};
    float *z1, *z2;
    float *y1, *y2;
    float center1, spread1;
    float center2, spread2;

    /* Scale x1 using the mean and standard deviation (method 4). */
    z1 = imsls_f_scale_filter(n_obs, x1, 4,
        IMSLS_SCALE_LIMITS, -6.0, 6.0, -3.0, 3.0,
        IMSLS_RETURN_CENTER_SPREAD, &center1, &spread1,
        0);

    /* Scale x2 using the median and MAD-based spread (method 5). */
    z2 = imsls_f_scale_filter(n_obs, x2, 5,
        IMSLS_SCALE_LIMITS, -3.0, 3.0, -3.0, 3.0,
        IMSLS_RETURN_CENTER_SPREAD, &center2, &spread2,
        0);

    imsls_f_write_matrix("z1", n_obs, 1, z1, 0);
    printf("Center = %f\nSpread = %f\n", center1, spread1);
    imsls_f_write_matrix("z2", n_obs, 1, z2, 0);
    printf("Center = %f\nSpread = %f\n", center2, spread2);

    /* Un-scale z1 and z2 with the centers and spreads returned above. */
    y1 = imsls_f_scale_filter(n_obs, z1, -4,
        IMSLS_SCALE_LIMITS, -6.0, 6.0, -3.0, 3.0,
        IMSLS_SUPPLY_CENTER_SPREAD, center1, spread1,
        0);
    y2 = imsls_f_scale_filter(n_obs, z2, -5,
        IMSLS_SCALE_LIMITS, -3.0, 3.0, -3.0, 3.0,
        IMSLS_SUPPLY_CENTER_SPREAD, center2, spread2,
        0);

    imsls_f_write_matrix("y1", n_obs, 1, y1, 0);
    imsls_f_write_matrix("y2", n_obs, 1, y2, 0);
    return 0;
}
z1
1 0.0287
2 -0.2870
3 0.2870
4 0.6314
5 -0.6601
Center = 3.400000
Spread = 1.742125
z2
1 0.525
2 -0.674
3 -2.923
4 0.000
5 1.349
Center = 2.400000
Spread = 1.334342
y1
1 3.5
2 2.4
3 4.4
4 5.6
5 1.1
y2
1 3.1
2 1.5
3 -1.5
4 2.4
5 4.2