difference
Differences a seasonal or nonseasonal time series.
Synopsis
difference (z, periods)
Required Arguments
- float z[] (Input)
  Array of length nObservations containing the time series.
- int periods[] (Input)
  Array of length nDifferences containing the periods at which z is to be differenced.
Return Value
An array containing the differenced series. Its length is nObservations, or nObservations - lost if excludeFirst is specified.
Optional Arguments
orders, int[] (Input)
  Array of length nDifferences containing the order of each difference given in periods. The elements of orders must be greater than or equal to 0.
lost (Output)
  Number of observations lost because of differencing the time series z.
excludeFirst (Input)
or
setFirstToNan (Input)
  If excludeFirst is specified, the first lost observations are excluded from the differenced series w, which then has length nObservations - lost. If setFirstToNan is specified, the first lost observations of w are set to NaN (Not a Number). setFirstToNan is the default if neither excludeFirst nor setFirstToNan is specified.
Description
Function difference performs \(m\) = nDifferences successive backward differences of period \(s_i\) = periods[i - 1] and order \(d_i\) = orders[i - 1] for \(i=1,\ldots,m\) on the \(n\) = nObservations observations \(\{Z_t\}\) for \(t=1,2,\ldots,n\).
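Consider the backward shift operator B given by

\[ B^k Z_t = Z_{t-k} \]

for all k. Then, the backward difference operator with period s is defined by the following:

\[ \Delta_s Z_t = (1 - B^s) Z_t = Z_t - Z_{t-s} \quad \text{for } s > 0 \]

Note that \(B^s Z_t\) and \(\Delta_s Z_t\) are defined only for \(t=(s+1),\ldots,n\). Repeated differencing with period s is simply

\[ \Delta_s^d Z_t = (1 - B^s)^d Z_t = \sum_{j=0}^{d} \frac{d!}{j!\,(d-j)!} (-1)^j B^{sj} Z_t \]

where \(d \ge 0\) is the order of differencing. Note that \(\Delta_s^d Z_t\) is defined only for \(t=(sd+1),\ldots,n\).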
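Repeated differencing can be computed either by applying \(\Delta_s\) d times or by evaluating the binomial expansion \(\Delta_s^d Z_t = \sum_{j=0}^{d} \binom{d}{j}(-1)^j Z_{t-sj}\) directly, and the two must agree. The following is a minimal pure-Python sketch, not part of PyIMSL. The helper names backward_difference, repeated_difference and binomial_difference are hypothetical. It uses 0-based indices, so a value is defined only for t ≥ sd, and it marks undefined positions with NaN.

```python
from math import comb, isnan, nan

def backward_difference(z, s):
    """Apply Delta_s once: Z_t - Z_{t-s}; NaN where t < s (NaNs also propagate)."""
    return [nan if t < s else z[t] - z[t - s] for t in range(len(z))]

def repeated_difference(z, s, d):
    """Delta_s^d computed by applying Delta_s d times."""
    w = list(z)
    for _ in range(d):
        w = backward_difference(w, s)
    return w

def binomial_difference(z, s, d):
    """Delta_s^d computed from sum_{j=0}^{d} C(d, j) (-1)^j Z_{t - s*j}."""
    return [nan if t < s * d
            else sum(comb(d, j) * (-1) ** j * z[t - s * j] for j in range(d + 1))
            for t in range(len(z))]

z = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]
a = repeated_difference(z, 1, 2)
b = binomial_difference(z, 1, 2)
print(b[2])  # 132 - 2*118 + 112 = 8
```

The first s·d entries of both results are NaN, which matches the statement that \(\Delta_s^d Z_t\) is defined only for \(t=(sd+1),\ldots,n\) in 1-based terms.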
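The general difference formula used in the function difference is given by

\[
W_t =
\begin{cases}
\text{NaN} & \text{for } t = 1, \ldots, n_L \\
\Delta_{s_1}^{d_1} \Delta_{s_2}^{d_2} \cdots \Delta_{s_m}^{d_m} Z_t & \text{for } t = n_L + 1, \ldots, n
\end{cases}
\]

where \(n_L\) represents the number of observations “lost” because of differencing and NaN represents the missing value code. See the function machine to retrieve missing values. Note that

\[ n_L = \sum_{j=1}^{m} s_j d_j \]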
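The general formula sets the first \(n_L = \sum_j s_j d_j\) values to NaN and applies the successive differences to the rest. It can be mirrored in a few lines of plain Python. This is a sketch under stated assumptions, not the PyIMSL implementation:

- difference_sketch is a hypothetical name.
- Each order is assumed to default to 1 when orders is omitted. This is consistent with the 13 lost observations reported in Example 2 for periods 1 and 12.
- exclude_first stands in for the excludeFirst option.

```python
from math import isnan, nan

def difference_sketch(z, periods, orders=None, exclude_first=False):
    """Hypothetical pure-Python mirror of the W_t formula (not the PyIMSL API).

    Returns (w, lost), where lost = sum(s_j * d_j).
    """
    if orders is None:
        orders = [1] * len(periods)  # assumption: omitted orders default to 1
    w = [float(x) for x in z]
    for s, d in zip(periods, orders):
        for _ in range(d):
            # One backward difference of period s: NaN for t < s, and NaNs
            # produced by earlier passes propagate through the subtraction
            w = [nan if t < s else w[t] - w[t - s] for t in range(len(w))]
    lost = sum(s * d for s, d in zip(periods, orders))
    if exclude_first:
        return w[lost:], lost  # like excludeFirst: length n - lost
    return w, lost             # like setFirstToNan: first lost values are NaN

# First 24 airline observations, as listed in the Example 1 output
z = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
     115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140]
w, lost = difference_sketch(z, [1, 12])
print(lost)    # 13
print(w[13:])  # [5.0, 1.0, -3.0, -2.0, 10.0, 8.0, 0.0, 0.0, -8.0, -4.0, 12.0]
```

Because backward difference operators commute, the order in which the periods are processed does not change the result.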
A homogeneous, stationary time series can be arrived at by appropriately differencing a homogeneous, nonstationary time series (Box and Jenkins 1976, p. 85). Preliminary application of an appropriate transformation followed by differencing of a series can enable model identification and parameter estimation in the class of homogeneous stationary autoregressive moving average models.
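A deterministic toy series makes the idea concrete. The sketch below is a hypothetical illustration, not taken from Box and Jenkins. The series is a linear trend plus a repeating pattern of period 4:

- One difference of period 4 removes the pattern and turns the trend into a constant.
- One further difference of period 1 removes that constant.

Undefined leading values are simply dropped here, as with excludeFirst.

```python
def backward_difference(z, s):
    """Delta_s applied once, keeping only the defined values Z_t - Z_{t-s}."""
    return [z[t] - z[t - s] for t in range(s, len(z))]

seasonal = [3, -1, 4, -6]                         # hypothetical period-4 pattern
z = [2 * t + seasonal[t % 4] for t in range(20)]  # linear trend plus seasonality

seasonal_diff = backward_difference(z, 4)     # every value equals 2 * 4 = 8
both = backward_difference(seasonal_diff, 1)  # every value equals 0
print(seasonal_diff[:3], both[:3])  # [8, 8, 8] [0, 0, 0]
```

Real series such as the airline data are not exactly periodic, so differencing only approximately removes trend and seasonality there.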
Examples
Example 1
Consider the Airline Data (Box and Jenkins 1976, p. 531), consisting of the monthly total number of international airline passengers from January 1949 through December 1960. Function difference is used to compute

\[ W_t = \Delta_1 \Delta_{12} Z_t = (Z_t - Z_{t-12}) - (Z_{t-1} - Z_{t-13}) \]

for \(t=14,15,\ldots,24\).
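As a check on the first and last reported values, expand the operator as \(\Delta_1 \Delta_{12} Z_t = Z_t - Z_{t-1} - Z_{t-12} + Z_{t-13}\). Then substitute the observations printed in the Output listing, where, with 1-based t, \(Z_t\) is z[t - 1]:

```latex
\begin{aligned}
W_{14} &= Z_{14} - Z_{13} - Z_{2} + Z_{1} = 126 - 115 - 118 + 112 = 5 \\
W_{24} &= Z_{24} - Z_{23} - Z_{12} + Z_{11} = 140 - 114 - 118 + 104 = 12
\end{aligned}
```

These match diff[13] and diff[23] in the output.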
from __future__ import print_function
from numpy import *
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.difference import difference
# Get airline data
z = dataSets(4)
n_observations = 24
periods = array([1, 12])
# Difference at periods 1 and 12; by default the first lost
# values of the result are set to NaN
diff = difference(z.flat, periods)
# Print the first n_observations values of the original
# and differenced series
print("i\tz[i]\t\tdiff[i]")
for i in range(0, n_observations):
    print("%d\t%f\t%f" % (i, z[i], diff[i]))
Output
i z[i] diff[i]
0 112.000000 nan
1 118.000000 nan
2 132.000000 nan
3 129.000000 nan
4 121.000000 nan
5 135.000000 nan
6 148.000000 nan
7 148.000000 nan
8 136.000000 nan
9 119.000000 nan
10 104.000000 nan
11 118.000000 nan
12 115.000000 nan
13 126.000000 5.000000
14 141.000000 1.000000
15 135.000000 -3.000000
16 125.000000 -2.000000
17 149.000000 10.000000
18 170.000000 8.000000
19 170.000000 0.000000
20 158.000000 0.000000
21 133.000000 -8.000000
22 114.000000 -4.000000
23 140.000000 12.000000
Example 2
This example uses the same data as Example 1. The first lost observations are excluded from w due to differencing, and lost is also output.
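z may not contain NaN (see IMSLS_Z_CONTAINS_NAN). As a result, the NaN-padded result of the default setFirstToNan mode carries the same information as the excludeFirst result: the number of leading NaNs equals lost, and dropping them gives the excluded series.

The sketch below illustrates that relationship in pure Python, using the diff column printed in Example 1. split_lost is a hypothetical helper, not a PyIMSL function. It assumes the only NaNs in the result are the ones introduced by differencing.

```python
from math import isnan, nan

def split_lost(w):
    """Count the leading NaNs of a NaN-padded differenced series and drop them.

    Assumes the input series had no NaNs, so every NaN here comes from differencing.
    """
    lost = 0
    while lost < len(w) and isnan(w[lost]):
        lost += 1
    return lost, w[lost:]

# diff column of the Example 1 output (setFirstToNan behavior)
w = [nan] * 13 + [5.0, 1.0, -3.0, -2.0, 10.0, 8.0, 0.0, 0.0, -8.0, -4.0, 12.0]
lost, excluded = split_lost(w)
print(lost, len(excluded))  # 13 11
```

The recovered count and length agree with the n_lost value and the 11 rows printed below.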
from __future__ import print_function
from numpy import *
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.difference import difference
n_observations = 24
n_differences = 2
periods = (1, 12)
# Output argument that receives the number of observations lost
n_lost = empty(0)
# Get airline data
z = dataSets(4)
# Compute the differenced series, excluding the observations
# lost to differencing
diff = difference(z.flat, periods,
                  excludeFirst=True,
                  lost=n_lost)
# Print the number of lost observations
print("n_lost equals %d" % n_lost)
print("\ni\tz[i]\t difference[i]")
# Print the original time series and the differenced time series
for i in range(0, n_observations - int(n_lost)):
    print("%d\t%f\t%f" % (i, z[i], diff[i]))
Output
n_lost equals 13
i z[i] difference[i]
0 112.000000 5.000000
1 118.000000 1.000000
2 132.000000 -3.000000
3 129.000000 -2.000000
4 121.000000 10.000000
5 135.000000 8.000000
6 148.000000 0.000000
7 148.000000 0.000000
8 136.000000 -8.000000
9 119.000000 -4.000000
10 104.000000 12.000000
Fatal Errors
IMSLS_PERIODS_LT_ZERO | “period[#]” = #. All elements of “period” must be greater than 0.
IMSLS_ORDER_NEGATIVE | “order[#]” = #. All elements of “order” must be nonnegative.
IMSLS_Z_CONTAINS_NAN | “z[#]” = NaN; “z” can not contain missing values. There may be other elements of “z” that are equal to NaN.
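The three fatal conditions can be expressed as up-front input checks. The sketch below is hypothetical: check_difference_inputs is not a PyIMSL function, and raising ValueError is an assumption made for illustration. The actual library may report these conditions differently.

```python
from math import isnan

def check_difference_inputs(z, periods, orders):
    """Hypothetical input checks mirroring the fatal errors listed above."""
    for i, s in enumerate(periods):
        if s <= 0:  # every period must be greater than 0
            raise ValueError(f"IMSLS_PERIODS_LT_ZERO: period[{i}] = {s}")
    for i, d in enumerate(orders):
        if d < 0:  # every order must be nonnegative
            raise ValueError(f"IMSLS_ORDER_NEGATIVE: order[{i}] = {d}")
    for i, x in enumerate(z):
        if isnan(x):  # z may not contain missing values
            raise ValueError(f"IMSLS_Z_CONTAINS_NAN: z[{i}] = NaN")

check_difference_inputs([112.0, 118.0, 132.0], periods=[1], orders=[2])  # passes silently
```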