difference

Differences a seasonal or nonseasonal time series.

Synopsis

difference (z, periods)

Required Arguments

float z[] (Input)
Array of length nObservations containing the time series.
int periods[] (Input)
Array of length nDifferences containing the periods at which z is to be differenced.

Return Value

An array of length nObservations containing the differenced series.

Optional Arguments

orders, int[] (Input)
Array of length nDifferences containing the order of each difference given in periods. The elements of orders must be greater than or equal to 0.
lost (Output)
Number of observations lost because of differencing the time series z.

excludeFirst (Input)

or

setFirstToNan (Input)
If excludeFirst is specified, the first lost observations are excluded from the differenced series W, which then has length nObservations - lost. If setFirstToNan is specified, the first lost observations are set to NaN (Not a Number). setFirstToNan is the default when neither excludeFirst nor setFirstToNan is specified.

Description

Function difference performs m = nDifferences successive backward differences of period \(s_i\) = periods[i - 1] and order \(d_i\) = orders[i - 1] for \(i=1,\ldots,m\) on the n = nObservations observations \(\{Z_t\}\) for \(t=1,2,\ldots,n\).

Consider the backward shift operator B given by

\[B^k Z_t = Z_{t-k}\]

for all k. Then, the backward difference operator with period s is defined by the following:

\[\mathit{\Delta}_s Z_t = \left(1 - B^s\right) Z_t = Z_t - Z_{t-s} \text{ for } s > 0.\]

Note that \(B^s Z_t\) and \(\mathit{\Delta}_s Z_t\) are defined only for \(t=(s+1),\ldots,n\). Repeated differencing with period s is simply

\[\mathit{\Delta}_s^d Z_t = \left(1 - B^s\right)^d Z_t = \sum_{j=0}^{d} \frac{d!}{j!(d-j)!}(-1)^j B^{sj} Z_t\]

where d ≥ 0 is the order of differencing. Note that

\[\mathit{\Delta}_s^d Z_t\]

is defined only for \(t=(sd+1),\ldots,n\).
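The equivalence between applying \(\left(1 - B^s\right)\) d times and evaluating the binomial sum can be checked numerically. The following is a minimal NumPy sketch, not the library's implementation; the helper names backward_difference and binomial_difference are hypothetical and exist only for this illustration.

```python
from math import comb

import numpy as np


def backward_difference(z, s):
    """Apply (1 - B^s) once; the result covers t = s+1, ..., n."""
    z = np.asarray(z, dtype=float)
    return z[s:] - z[:-s]


def binomial_difference(z, s, d):
    """Evaluate (1 - B^s)^d Z_t through the binomial expansion.

    The result covers t = s*d + 1, ..., n (1-based), so it has
    len(z) - s*d elements.
    """
    z = np.asarray(z, dtype=float)
    n = len(z)
    lost = s * d
    w = np.zeros(n - lost)
    for j in range(d + 1):
        # B^{s j} Z_t = Z_{t - s j}: shift the window back by s*j positions.
        w += comb(d, j) * (-1) ** j * z[lost - s * j : n - s * j]
    return w


z = np.array([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])

# Period 1, order 2: apply the period-1 difference twice ...
repeated = backward_difference(backward_difference(z, 1), 1)
# ... and compare with the closed-form binomial sum.
expanded = binomial_difference(z, 1, 2)

print(repeated)
print(expanded)
```

Both arrays have n - sd elements, which matches the statement that the differenced value is defined only from \(t = sd + 1\) onward.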

The general difference formula used in the function difference is given by

\[\begin{split}W_t = \begin{cases} \mathrm{NaN} & \text{for } t = 1, \ldots, n_L \\ \mathit{\Delta}_{s_1}^{d_1} \mathit{\Delta}_{s_2}^{d_2} \ldots \mathit{\Delta}_{s_m}^{d_m} Z_t & \text{for } t= n_L + 1, \ldots, n \\ \end{cases}\end{split}\]

where \(n_L\) represents the number of observations “lost” because of differencing and NaN represents the missing value code. See the function machine to retrieve missing values. Note that

\[n_L = \sum_{j=1}^{m} s_j d_j\]
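The general formula can be sketched in plain NumPy as follows. This is an illustration under stated assumptions, not the code behind difference: the function name difference_sketch and its keyword exclude_first are hypothetical, and the fallback to order 1 for every period is an assumption that is consistent with Example 1 (periods 1 and 12 give 13 lost observations).

```python
import numpy as np


def difference_sketch(z, periods, orders=None, exclude_first=False):
    """Apply successive backward differences and report the lost count.

    Returns (w, lost). By default the first `lost` entries of w are NaN;
    with exclude_first=True they are dropped instead.
    """
    z = np.asarray(z, dtype=float)
    if orders is None:
        # Assumption for this sketch: order 1 for every period.
        orders = [1] * len(periods)

    w = z.copy()
    lost = 0
    for s, d in zip(periods, orders):
        for _ in range(d):
            # One application of (1 - B^s) shortens the series by s.
            w = w[s:] - w[:-s]
            lost += s

    if exclude_first:
        return w, lost
    # Pad the front with the missing value code so len(w) == len(z).
    return np.concatenate([np.full(lost, np.nan), w]), lost


z = np.arange(10.0) ** 2
w, n_lost = difference_sketch(z, periods=[1, 2])
w_short, _ = difference_sketch(z, periods=[1, 2], exclude_first=True)

print(n_lost)
print(w)
print(w_short)
```

Accumulating s once per application of the operator reproduces \(n_L\) as the sum of \(s_j d_j\) over all differences.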

A homogeneous, stationary time series can be arrived at by appropriately differencing a homogeneous, nonstationary time series (Box and Jenkins 1976, p. 85). Preliminary application of an appropriate transformation followed by differencing of a series can enable model identification and parameter estimation in the class of homogeneous stationary autoregressive moving average models.

Examples

Example 1

Consider the Airline Data (Box and Jenkins 1976, p. 531) consisting of the monthly total number of international airline passengers from January 1949 through December 1960. Function difference is used to compute

\[W_t = \mathit{\Delta}_1 \mathit{\Delta}_{12} Z_t = \left(Z_t - Z_{t-12}\right) - \left(Z_{t-1} - Z_{t-13}\right)\]

for \(t=14,15,\ldots,24\).

from __future__ import print_function
from numpy import *
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.difference import difference

# Get airline data
z = dataSets(4)
n_observations = 24

periods = array([1, 12])
diff = difference(z.flat, periods)

# Print the original series next to the differenced series
print("i\tz[i]\t\tdiff[i]")
for i in range(0, n_observations):
    print("%d\t%f\t%f" % (i, z[i], diff[i]))

Output

i	z[i]		diff[i]
0	112.000000	nan
1	118.000000	nan
2	132.000000	nan
3	129.000000	nan
4	121.000000	nan
5	135.000000	nan
6	148.000000	nan
7	148.000000	nan
8	136.000000	nan
9	119.000000	nan
10	104.000000	nan
11	118.000000	nan
12	115.000000	nan
13	126.000000	5.000000
14	141.000000	1.000000
15	135.000000	-3.000000
16	125.000000	-2.000000
17	149.000000	10.000000
18	170.000000	8.000000
19	170.000000	0.000000
20	158.000000	0.000000
21	133.000000	-8.000000
22	114.000000	-4.000000
23	140.000000	12.000000
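The differenced values in this table can be reproduced by hand from the printed z[i] column alone. The check below is a small NumPy sketch that does not call PyIMSL; the 24 observations are copied from the output above.

```python
import numpy as np

# First 24 airline observations, copied from the Example 1 output.
z = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
              115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140],
             dtype=float)

# Seasonal difference with period 12: Z_t - Z_{t-12}.
seasonal = z[12:] - z[:-12]

# Period-1 difference of the seasonal difference:
# (Z_t - Z_{t-12}) - (Z_{t-1} - Z_{t-13}).
w = seasonal[1:] - seasonal[:-1]

# 1 + 12 = 13 observations are lost; mark them with NaN.
padded = np.concatenate([np.full(13, np.nan), w])

for i, (zi, wi) in enumerate(zip(z, padded)):
    print("%d\t%f\t%f" % (i, zi, wi))
```

For instance, the first defined value is (126 - 118) - (115 - 112) = 5, which is the entry printed at i = 13.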

Example 2

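This example uses the same data as Example 1. The first lost observations are excluded from W, and the value of lost is also returned. Because the excluded observations are dropped from the front of the series, difference[i] in the output below corresponds to z[i + lost], not to the z[i] printed beside it.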

from __future__ import print_function
from numpy import *
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.difference import difference

n_observations = 24
n_differences = 2
periods = (1, 12)
n_lost = empty(0)

# Get airline data
z = dataSets(4)

# Compute the differenced time series, excluding the
# observations lost to differencing
diff = difference(z.flat, periods,
                  excludeFirst=True,
                  lost=n_lost)

# Print the number of lost observations
print("n_lost equals %d" % n_lost)
print("\ni\tz[i]\t        difference[i]")

# Print the original time series and the differenced time series
for i in range(0, n_observations - int(n_lost)):
    print("%d\t%f\t%f" % (i, z[i], diff[i]))

Output

n_lost equals 13

i	z[i]	        difference[i]
0	112.000000	5.000000
1	118.000000	1.000000
2	132.000000	-3.000000
3	129.000000	-2.000000
4	121.000000	10.000000
5	135.000000	8.000000
6	148.000000	0.000000
7	148.000000	0.000000
8	136.000000	-8.000000
9	119.000000	-4.000000
10	104.000000	12.000000
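When excludeFirst is used, the differenced values are shifted to the front of the array, so each one has to be paired with the observation n_lost positions later. The sketch below illustrates that realignment in plain NumPy without calling PyIMSL; the values are copied from the output tables of Examples 1 and 2, and the variable names are chosen for this illustration only.

```python
import numpy as np

n_lost = 13

# Differenced values as printed in the Example 2 output.
w = np.array([5.0, 1.0, -3.0, -2.0, 10.0, 8.0, 0.0, 0.0, -8.0, -4.0, 12.0])

# First 24 airline observations, as printed in the Example 1 output.
z = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
              115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140],
             dtype=float)

# Pair w[i] with the observation it was computed for, z[i + n_lost].
aligned = [(i + n_lost, z[i + n_lost], w[i]) for i in range(len(w))]
for t, zt, wt in aligned:
    print("%d\t%f\t%f" % (t, zt, wt))

# Padding the front with NaN recovers the default (setFirstToNan) layout.
padded = np.concatenate([np.full(n_lost, np.nan), w])
```

Printed this way, the rows match rows 13 through 23 of the Example 1 output.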

Fatal Errors

IMSLS_PERIODS_LT_ZERO “period[#]” = #. All elements of “period” must be greater than 0.
IMSLS_ORDER_NEGATIVE “order[#]” = #. All elements of “order” must be nonnegative.
IMSLS_Z_CONTAINS_NAN “z[#]” = NaN; “z” cannot contain missing values. There may be other elements of “z” that are equal to NaN.