scaleFilter
Scales or unscales continuous data prior to its use in neural network training, testing, or forecasting.
Synopsis
scaleFilter (x, method)
Required Arguments
- float x[] (Input) - An array of length nPatterns. The values in x are either the scaled or unscaled values of a continuous variable. Missing values are allowed and are indicated by placing a NaN (not a number) in x. See machine(6).
- int method (Input) - The scaling method to apply to each variable. The association between the value of method and the scaling algorithm is summarized in the table below. The sign of method determines whether the values in x are scaled or unscaled. If method is positive, the values in x are scaled. If method is negative, the values in x are unscaled.
method | Algorithm |
---|---|
0 | No scaling. |
±1 | Bounded scaling and unscaling. |
±2 | Unbounded z-score scaling using the mean and standard deviation. |
±3 | Unbounded z-score scaling using the median and mean absolute difference. |
±4 | Bounded z-score scaling using the mean and standard deviation. |
±5 | Bounded z-score scaling using the median and mean absolute difference. |
Return Value
An array of length nPatterns containing either the scaled or unscaled values of x, depending upon whether method is positive or negative, respectively. If errors are encountered, None is returned.
Optional Arguments
- scaleLimits, float realMin, float realMax, float targetMin, float targetMax (Input) - The real and target limits for x. This optional argument is required when bounded scaling is performed, i.e., method=±1, ±4, or ±5. realMin is the lowest value expected for the input variable in x. realMax is the largest value expected. targetMin is the lowest value allowed for the output variable, z. targetMax is the largest value allowed for the output variable.
- supplyCenterSpread, float center, float spread (Input) - The values center and spread are only used for z-score scaling or unscaling of x, that is, when method is one of ±2, ±3, ±4, and ±5. The value of center is either the mean or median, and the value of spread is either the standard deviation or mean absolute difference. When method is positive, this optional argument can be used to supply a user-defined center and spread rather than allowing scaleFilter to compute the center and spread from the data in x. When method is one of -2, -3, -4, or -5, this optional argument must be used to supply the center and spread used during scaling.
- returnCenterSpread, center, spread (Output) - The computed center and spread of x. In the Example below, an empty dictionary is passed for this argument and the results are read back under the keys center and spread. The values center and spread are only used for z-score scaling or unscaling of x. These methods, ±2, ±3, ±4, and ±5, require two numbers: either the mean or median, and either the standard deviation or mean absolute difference. The value of center is either the mean or median for x. The value of spread is either the standard deviation or mean absolute difference.
Description
The function scaleFilter is designed to either scale or unscale a continuous variable, using one of the methods listed in the table above, prior to its use as neural network input or output.
The specific scaling computations employed are specified by the argument method. Scaling limits are supplied with the optional argument scaleLimits and are required for the bounded scaling methods, i.e., method=±1, ±4, or ±5. Bounded scaling ensures that the scaled values in the returned array fall between a lower and an upper bound.
If method=±1, then the bounded method of scaling and unscaling is applied to x using the scaling limits in scaleLimits.
If method=±2, ±3, ±4, or ±5, then the z-score method of scaling is used. These calculations are based upon the following scaling calculation:
\[z = \frac{x - a}{b}\]
where a is a measure of center for x, and b is a measure of the spread of x.
If method=±2 or ±4, then by default a and b are the arithmetic average and sample standard deviation of the training data. These values can be overridden using the optional argument supplyCenterSpread.
If method=±3 or ±5, then by default a and b are the median and \(\tilde{s}\), where \(\tilde{s}\) is a robust estimate of the population standard deviation:
\[\tilde{s} = \frac{\text{MAD}}{0.6745}\]
where MAD is the Mean Absolute Deviation:
\[\text{MAD} = \text{median}\{|x - \tilde{m}|\}\]
and \(\tilde{m}\) is the median of x. Again, the values of a and b can be overridden using the optional argument supplyCenterSpread.
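The robust center and spread described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name robust_center_spread is hypothetical, and the constant 0.6745 is assumed from the formula for \(\tilde{s}\). For the second data set in the Example it reproduces the printed center exactly and the printed spread to about four significant digits.

```python
import math
import statistics


def robust_center_spread(x):
    """Return (median, MAD / 0.6745) from the non-NaN values of x.

    Hypothetical helper for illustration; not part of PyIMSL.
    """
    values = [v for v in x if not math.isnan(v)]
    center = statistics.median(values)
    # MAD here is the median of the absolute deviations from the median.
    mad = statistics.median(abs(v - center) for v in values)
    return center, mad / 0.6745


center, spread = robust_center_spread([3.1, 1.5, -1.5, 2.4, 4.2])
print("center = %.4f, spread = %.4f" % (center, spread))
```

The small difference from the spread shown in the Output section is consistent with the library using a more precise value of the constant, which is an assumption here.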
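Method ±1: Bounded Scaling and Unscaling
If method=1, then the optional argument scaleLimits is required and a scaling operation is conducted using the scale limits for x with the following calculation:
\[z = r(x - \text{realMin}) + \text{targetMin}\]
where
\[r = \frac{\text{targetMax} - \text{targetMin}}{\text{realMax} - \text{realMin}}\]
If method=-1, then the optional argument scaleLimits is required and an unscaling operation is conducted by inverting the calculation above:
\[x = \frac{z - \text{targetMin}}{r} + \text{realMin}\]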
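As a rough sketch of the method ±1 arithmetic, the following plain-Python functions map values linearly from the real range onto the target range and back. The function names and the example limits are hypothetical, the linear form is an assumption consistent with the bounded results in the Example, and the sketch does not clamp out-of-range values.

```python
def bounded_scale(x, real_min, real_max, target_min, target_max):
    """Linearly map [real_min, real_max] onto [target_min, target_max]."""
    r = (target_max - target_min) / (real_max - real_min)
    return [r * (v - real_min) + target_min for v in x]


def bounded_unscale(z, real_min, real_max, target_min, target_max):
    """Invert bounded_scale using the same limits."""
    r = (target_max - target_min) / (real_max - real_min)
    return [(v - target_min) / r + real_min for v in z]


# Illustrative limits only; choose limits that suit the data being scaled.
limits = dict(real_min=0.0, real_max=10.0, target_min=-1.0, target_max=1.0)
z = bounded_scale([0.0, 2.5, 10.0], **limits)
x_back = bounded_unscale(z, **limits)
print(z, x_back)
```

Unscaling only recovers the original values when the same four limits are used in both directions.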
Method +2 or +3: Unbounded z-score Scaling
If method=2 or method=3, then a scaling operation is conducted on x using a z-score calculation:
\[z[i] = \frac{x[i] - \text{center}}{\text{spread}}\]
If either center or spread is missing (a NaN), then appropriate values are calculated from the non-missing values of x. If method=2, then center is set equal to the arithmetic average \(\bar{x}\), and spread is set equal to the sample standard deviation, \(s\). If method=3, then center is set equal to the median \(\tilde{m}\), and spread is set equal to \(\tilde{s}\), the robust estimate based on the Mean Absolute Difference (MAD).
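The method=2 case can be sketched as follows. The function zscore_scale is hypothetical and only mimics the documented behavior: a NaN center or spread is replaced by the mean or sample standard deviation of the non-missing values. Filling in only the argument that is missing is an assumption of this sketch; the text does not say whether the library recomputes both.

```python
import math
import statistics


def zscore_scale(x, center=math.nan, spread=math.nan):
    """Unbounded z-score scaling; NaN entries in x stay NaN in the result."""
    values = [v for v in x if not math.isnan(v)]
    if math.isnan(center):
        center = statistics.mean(values)    # arithmetic average
    if math.isnan(spread):
        spread = statistics.stdev(values)   # sample standard deviation
    z = [(v - center) / spread for v in x]
    return z, center, spread


# First data set from the Example, with one missing value appended.
z, center, spread = zscore_scale([3.5, 2.4, 4.4, 5.6, 1.1, math.nan])
print("center = %.6f, spread = %.6f" % (center, spread))
```

With this input, the computed center and spread agree with the values printed for the first data set in the Output section.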
Method -2 or -3: Unbounded z-score Unscaling
If method=-2 or method=-3, then an unscaling operation is conducted using the inverse of the calculation shown in the above section, “Method +2 or +3: Unbounded z-score Scaling.”
For these values of method, missing values for center and spread are not allowed. If method=-2, then center and spread are assumed to be equal to the arithmetic average and standard deviation, respectively. These values would normally be the same as those used in scaling the variable with method=+2. If method=-3, then center and spread are assumed to be equal to the median and mean absolute difference, respectively. These values would normally be the same as those used in scaling the variable with method=+3.
Method +4 or +5: Bounded z-score Scaling
This method is essentially the same as the z-score calculation described for method=+2 and method=+3, with additional scaling using the scale limits. If method=4 or method=5, then the optional argument scaleLimits is required and a scaling operation is conducted using the scale limits for x with the z-score calculation:
\[z[i] = \frac{r\,(x[i] - \text{center})}{\text{spread}} - r \cdot \text{realMin} + \text{targetMin}\]
where \(r = (\text{targetMax} - \text{targetMin}) / (\text{realMax} - \text{realMin})\).
If either center or spread is missing (a NaN), then appropriate values are calculated from the non-missing values in x. If center is missing and method=+4, then center is set equal to the arithmetic average \(\bar{x}\), and spread is set equal to the sample standard deviation, \(s\). If center is missing and method=+5, then center is set equal to the median \(\tilde{m}\), and spread is set equal to the MAD-based estimate \(\tilde{s}\).
In bounded scaling, if z[i] exceeds its bounds, it is set to the boundary it exceeded.
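A minimal sketch of the bounded z-score arithmetic, including the clamping rule, is shown below. The function name is hypothetical, and the formula is an assumption: it is the z-score followed by the linear map from the real limits to the target limits, which reproduces the z1 column of the Example to the printed precision when given that example's center, spread, and limits.

```python
def bounded_zscore_scale(x, center, spread,
                         real_min, real_max, target_min, target_max):
    """z-score, then linear map to the target range, then clamp."""
    r = (target_max - target_min) / (real_max - real_min)
    z = []
    for v in x:
        scaled = r * (v - center) / spread - r * real_min + target_min
        # A value outside the target range is set to the bound it exceeded.
        z.append(min(max(scaled, target_min), target_max))
    return z


# Center, spread, and limits taken from the first data set in the Example.
z1 = bounded_zscore_scale([3.5, 2.4, 4.4, 5.6, 1.1], 3.4, 1.742125,
                          real_min=-6.0, real_max=6.0,
                          target_min=-3.0, target_max=3.0)
print(["%.4f" % v for v in z1])
```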
Method -4 or -5: Bounded z-score Unscaling
If method=-4 or method=-5, then the optional argument scaleLimits is required and an unscaling operation is conducted using the inverse of the bounded z-score scaling calculation:
\[x[i] = \frac{\text{spread} \cdot (z[i] - \text{targetMin})}{r} + \text{spread} \cdot \text{realMin} + \text{center}\]
where \(r = (\text{targetMax} - \text{targetMin}) / (\text{realMax} - \text{realMin})\).
For these values of method, missing values for center and spread are not allowed. If method=-4, then center and spread are assumed to be equal to the arithmetic average and standard deviation, respectively. These values would normally be the same as those used in scaling x with method=+4. If method=-5, then center and spread are assumed to be equal to the median and mean absolute difference, respectively. These values would normally be the same as those used in scaling x with method=+5.
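The inverse calculation can be sketched in the same way. The function below is hypothetical and assumes the bounded z-score scaling is the z-score followed by a linear map to the target limits; applied to the rounded z1 values printed in the Output section, it recovers the original data to about three decimal places.

```python
def bounded_zscore_unscale(z, center, spread,
                           real_min, real_max, target_min, target_max):
    """Undo the linear map to the target range, then undo the z-score."""
    r = (target_max - target_min) / (real_max - real_min)
    return [spread * (v - target_min) / r + spread * real_min + center
            for v in z]


# Rounded z1 values, center, spread, and limits from the Example.
y1 = bounded_zscore_unscale([0.0287, -0.2870, 0.2870, 0.6314, -0.6601],
                            3.4, 1.742125,
                            real_min=-6.0, real_max=6.0,
                            target_min=-3.0, target_max=3.0)
print(["%.2f" % v for v in y1])
```

Values that were clamped to a bound during scaling cannot be recovered this way, because clamping discards the original magnitude.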
Example
In this example, two data sets are filtered using bounded z-score scaling.
from __future__ import print_function
from numpy import *
from pyimsl.stat.scaleFilter import scaleFilter
from pyimsl.stat.writeMatrix import writeMatrix
x1 = [3.5, 2.4, 4.4, 5.6, 1.1]
x2 = [3.1, 1.5, -1.5, 2.4, 4.2]
centerSpread1 = {}
centerSpread2 = {}
z1 = scaleFilter(x1, 4,
                 scaleLimits={'realMin': -6., 'realMax': 6.,
                              'targetMin': -3., 'targetMax': 3.},
                 returnCenterSpread=centerSpread1)
z2 = scaleFilter(x2, 5,
                 scaleLimits={'realMin': -3., 'realMax': 3.,
                              'targetMin': -3., 'targetMax': 3.},
                 returnCenterSpread=centerSpread2)
writeMatrix("z1", z1, column=True)
print("Center = %10.6f\nSpread = %10.6f" %
      (centerSpread1['center'], centerSpread1['spread']))
writeMatrix("z2", z2, column=True)
print("Center = %10.6f\nSpread = %10.6f" %
      (centerSpread2['center'], centerSpread2['spread']))
# un-scale z1 and z2.
y1 = scaleFilter(z1, -4,
                 scaleLimits={'realMin': -6., 'realMax': 6.,
                              'targetMin': -3., 'targetMax': 3.},
                 supplyCenterSpread=centerSpread1)
y2 = scaleFilter(z2, -5,
                 scaleLimits={'realMin': -3., 'realMax': 3.,
                              'targetMin': -3., 'targetMax': 3.},
                 supplyCenterSpread=centerSpread2)
writeMatrix("y1", y1, column=True)
writeMatrix("y2", y2, column=True)
Output
Center = 3.400000
Spread = 1.742125
Center = 2.400000
Spread = 1.334342
z1
1 0.0287
2 -0.2870
3 0.2870
4 0.6314
5 -0.6601
z2
1 0.525
2 -0.674
3 -2.923
4 0.000
5 1.349
y1
1 3.5
2 2.4
3 4.4
4 5.6
5 1.1
y2
1 3.1
2 1.5
3 -1.5
4 2.4
5 4.2