difference¶

Differences a seasonal or nonseasonal time series.

Synopsis¶

difference (z, periods)

Required Arguments¶

float z[] (Input): Array of length nObservations containing the time series.
int periods[] (Input): Array of length nDifferences containing the periods at which z is to be differenced.

Return Value¶

An array of length nObservations containing the differenced series.

Optional Arguments¶

orders, int[] (Input): Array of length nDifferences containing the order of each difference given in periods. The elements of orders must be greater than or equal to 0.
lost (Output): Number of observations lost because of differencing the time series z.

excludeFirst (Input)

or

setFirstToNan (Input): If excludeFirst is specified, the first lost observations are excluded from the differenced series w, which then has length nObservations - lost. If setFirstToNan is specified, the first lost observations are set to NaN (Not a Number) and w has length nObservations. setFirstToNan is the default if neither option is specified.

Description¶

Function difference performs m = nDifferences successive backward differences of period $s_i$ = periods [i - 1] and order $d_i$ = orders [i - 1] for $i=1,\ldots,m$ on the n = nObservations observations $\{Z_t\}$ for $t=1,2,\ldots,n$ .

Consider the backward shift operator B given by

$B^k Z_t = Z_{t-k}$

for all k. Then, the backward difference operator with period s is defined by the following:

$\mathit{\Delta}_s Z_t = \left(1 - B^s\right) Z_t = Z_t - Z_{t-s} \text{ for } s > 0.$

Note that $B^s Z_t$ and $\mathit{\Delta}_s Z_t$ are defined only for $t=(s+1),\ldots,n$ . Repeated differencing with period s is simply

$\mathit{\Delta}_s^d Z_t = \left(1 - B^s\right)^d Z_t = \sum_{j=0}^{d} \frac{d!}{j!(d-j)!}(-1)^j B^{sj} Z_t$

where d ≥ 0 is the order of differencing. Note that

$\mathit{\Delta}_s^d Z_t$

is defined only for $t=(sd+1),\ldots,n$ .
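The binomial expansion above can be checked numerically. The following is a minimal NumPy sketch, not the library's implementation; the helper name `seasonal_difference` is hypothetical, and positions where the operator is undefined are filled with NaN.

```python
import numpy as np
from math import comb


def seasonal_difference(z, s, d):
    """Apply (1 - B^s)^d to z via the binomial expansion (illustrative only)."""
    z = np.asarray(z, dtype=float)
    n = z.size
    w = np.full(n, np.nan)
    # The operator is defined only for t = s*d + 1, ..., n (1-based),
    # which is index s*d, ..., n - 1 in 0-based Python indexing.
    for t in range(s * d, n):
        w[t] = sum(comb(d, j) * (-1) ** j * z[t - s * j] for j in range(d + 1))
    return w


# Squares 1, 4, 9, ...: the second difference with period 1 is constant.
z = np.array([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])
w = seasonal_difference(z, s=1, d=2)
print(w)  # the first s*d = 2 entries are NaN, the rest equal 2.0
```

With d = 0 the sum reduces to the single term $Z_t$, so the series is returned unchanged.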

The general difference formula used in the function difference is given by

$\begin{split}W_t = \begin{cases} \mathrm{NaN} & \text{for } t = 1, \ldots, n_L \\ \mathit{\Delta}_{s_1}^{d_1} \mathit{\Delta}_{s_2}^{d_2} \ldots \mathit{\Delta}_{s_m}^{d_m} Z_t & \text{for } t= n_L + 1, \ldots, n \\ \end{cases}\end{split}$

where $n_L$ is the number of observations “lost” because of differencing and NaN is the missing value code. See the function machine to retrieve the NaN value. Note that

$n_L = \sum_{j=1}^{m} s_j d_j$
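The general formula and the count $n_L$ can be illustrated by applying each difference in turn. This is a rough NumPy sketch under stated assumptions, not the function's actual code: the name `difference_sketch` and its `exclude_first` flag are hypothetical, orders are assumed to default to 1 for every period, and $n_L$ is assumed to be smaller than the series length.

```python
import numpy as np


def difference_sketch(z, periods, orders=None, exclude_first=False):
    """Successive backward differences with NaN padding (illustrative only)."""
    z = np.asarray(z, dtype=float)
    if orders is None:
        orders = [1] * len(periods)  # assumption: order 1 for each period
    w = z.copy()
    for s, d in zip(periods, orders):
        for _ in range(d):
            # One application of (1 - B^s); the result is s elements shorter.
            w = w[s:] - w[:-s]
    n_lost = sum(s * d for s, d in zip(periods, orders))
    if exclude_first:
        return w, n_lost  # length n - n_lost
    # Otherwise pad the front so the result has the original length n.
    return np.concatenate([np.full(n_lost, np.nan), w]), n_lost


z = np.arange(1, 9, dtype=float) ** 2  # 1, 4, 9, ..., 64
w, n_lost = difference_sketch(z, periods=[1, 2])
print(n_lost)  # 1*1 + 2*1 = 3
print(w)       # three NaNs followed by the constant value 4.0
```

Because the difference operators commute, the order in which the periods are processed does not change the result.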

A homogeneous, stationary time series can be arrived at by appropriately differencing a homogeneous, nonstationary time series (Box and Jenkins 1976, p. 85). Preliminary application of an appropriate transformation followed by differencing of a series can enable model identification and parameter estimation in the class of homogeneous stationary autoregressive moving average models.

Examples¶

Example 1¶

Consider the Airline Data (Box and Jenkins 1976, p. 531) consisting of the monthly total number of international airline passengers from January 1949 through December 1960. Function difference is used to compute

$W_t = \mathit{\Delta}_1 \mathit{\Delta}_{12} Z_t = \left(Z_t - Z_{t-12}\right) - \left(Z_{t-1} - Z_{t-13}\right)$

for $t=14,15,\ldots,24$ .

from __future__ import print_function
from numpy import *
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.difference import difference

# Get airline data
z = dataSets(4)
n_observations = 24

periods = array([1, 12])
diff = difference(z.flat, periods)

# Print the original time series and the differenced time series
print("i\tz[i]\t\tdiff[i]")
for i in range(0, n_observations):
    print("%d\t%f\t%f" % (i, z[i], diff[i]))

Output¶

i	z[i]		diff[i]
0	112.000000	nan
1	118.000000	nan
2	132.000000	nan
3	129.000000	nan
4	121.000000	nan
5	135.000000	nan
6	148.000000	nan
7	148.000000	nan
8	136.000000	nan
9	119.000000	nan
10	104.000000	nan
11	118.000000	nan
12	115.000000	nan
13	126.000000	5.000000
14	141.000000	1.000000
15	135.000000	-3.000000
16	125.000000	-2.000000
17	149.000000	10.000000
18	170.000000	8.000000
19	170.000000	0.000000
20	158.000000	0.000000
21	133.000000	-8.000000
22	114.000000	-4.000000
23	140.000000	12.000000
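The non-NaN values in this listing can be reproduced directly from the formula for $W_t$. The sketch below uses plain NumPy and the first 24 airline values shown in the output; it does not call difference.

```python
import numpy as np

# First 24 monthly values of the airline series, as listed in the output.
z = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
              115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140],
             dtype=float)

w = np.full(z.size, np.nan)
# t = 14, ..., 24 in the text corresponds to 0-based indices 13, ..., 23.
t = np.arange(13, z.size)
w[t] = (z[t] - z[t - 12]) - (z[t - 1] - z[t - 13])

print(w[13:])  # 5, 1, -3, -2, 10, 8, 0, 0, -8, -4, 12
```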

Example 2¶

The data for this example are the same as in Example 1. Here excludeFirst is specified, so the first lost observations are excluded from the differenced series, and lost is also output.

from __future__ import print_function
from numpy import *
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.difference import difference

n_observations = 24
n_differences = 2
periods = (1, 12)
n_lost = empty(0)

# Get airline data
z = dataSets(4)

# Compute the differenced time series, excluding the
# observations lost to differencing
diff = difference(z.flat, periods,
                  excludeFirst=True,
                  lost=n_lost)

# Print the number of lost observations
print("n_lost equals %d" % n_lost)
print("\ni\tz[i]\t        difference[i]")

# Print the original time series and the differenced time series
for i in range(0, n_observations - int(n_lost)):
    print("%d\t%f\t%f" % (i, z[i], diff[i]))

Output¶

n_lost equals 13

i	z[i]	        difference[i]
0	112.000000	5.000000
1	118.000000	1.000000
2	132.000000	-3.000000
3	129.000000	-2.000000
4	121.000000	10.000000
5	135.000000	8.000000
6	148.000000	0.000000
7	148.000000	0.000000
8	136.000000	-8.000000
9	119.000000	-4.000000
10	104.000000	12.000000
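Note that in this listing the two columns are not aligned in time: with the leading values removed, diff[i] corresponds to z[i + lost]. A small NumPy sketch of that relationship, using the values from the two example outputs (the variable names `padded` and `trimmed` are illustrative):

```python
import numpy as np

n_lost = 13
# NaN-padded result from Example 1 (first 24 entries).
padded = np.array([np.nan] * n_lost +
                  [5, 1, -3, -2, 10, 8, 0, 0, -8, -4, 12], dtype=float)

# Dropping the first n_lost entries gives the values printed in Example 2.
trimmed = padded[n_lost:]

for i, value in enumerate(trimmed):
    # trimmed[i] is W_t for the 1-based time index t = i + n_lost + 1.
    print("t=%d\tW_t=%f" % (i + n_lost + 1, value))
```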

Fatal Errors¶

`IMSLS_PERIODS_LT_ZERO`	“`period`[#]” = #. All elements of “`period`” must be greater than 0.
`IMSLS_ORDER_NEGATIVE`	“`order`[#]” = #. All elements of “`order`” must be nonnegative.
`IMSLS_Z_CONTAINS_NAN`	“`z`[#]” = NaN; “`z`” can not contain missing values. There may be other elements of “`z`” that are equal to NaN.
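The conditions behind these errors can be checked before calling difference. The following is a hypothetical pre-check in plain Python and NumPy; the function name `check_difference_inputs` and the use of `ValueError` are assumptions for illustration and are not how the library reports these errors.

```python
import numpy as np


def check_difference_inputs(z, periods, orders):
    """Raise ValueError for inputs that would trigger the fatal errors above."""
    z = np.asarray(z, dtype=float)
    if any(s <= 0 for s in periods):
        raise ValueError("all elements of periods must be greater than 0")
    if any(d < 0 for d in orders):
        raise ValueError("all elements of orders must be nonnegative")
    if np.isnan(z).any():
        raise ValueError("z cannot contain missing values (NaN)")


# Valid inputs pass silently.
check_difference_inputs([1.0, 2.0, 3.0], periods=[1], orders=[1])
```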