Interface | Description |
---|---|
CdfFunction | Public interface for the user-supplied cumulative distribution function to be used by InverseCdf and ChiSquaredTest. |
Distribution | Public interface for the user-supplied distribution function. |
ExtendedGARCH.zDistribution | Public interface for specifying the distribution of \(z_t\). |
NonlinearRegression.Derivative | Public interface for the user-supplied function to compute the derivative for NonlinearRegression. |
NonlinearRegression.Function | Public interface for the user-supplied function for NonlinearRegression. |
ProbabilityDistribution | Public interface for a user-supplied probability distribution. |
Random.BaseGenerator | Base pseudorandom number generator. |
RandomSequence | Interface implemented by generators of random or quasi-random multidimensional sequences. |
RegressionBasis | Public interface for the user-supplied function to a UserBasisRegression object. |
TimeSeriesOperations.Function | Public interface for the user-supplied function that defines how to combine two synchronous time series values. |
Class | Description |
---|---|
ANCOVA | Analyzes a one-way classification model with covariates. |
ANOVA | Analysis of Variance table and related statistics. |
ANOVAFactorial | Analyzes a balanced factorial design with fixed effects. |
ARAutoUnivariate | Automatically determines the best autoregressive time series model using Akaike's Information Criterion. |
ARAutoUnivariate.Formatter | Deprecated. Use IMSLFormatter instead. |
ARMA | Computes least-squares estimates of parameters for an ARMA model. |
ARMAEstimateMissing | Estimates missing values in a time series collected with equal spacing. |
ARMAMaxLikelihood | Computes maximum likelihood estimates of parameters for an ARMA model with p and q autoregressive and moving average terms, respectively. |
ARMAOutlierIdentification | Detects and determines outliers and simultaneously estimates the model parameters in a time series whose underlying outlier-free series follows a general seasonal or nonseasonal ARMA model. |
ARSeasonalFit | Estimates the optimum seasonality parameters for a time series using an autoregressive model, AR(p), to represent the time series. |
AutoARIMA | Automatically identifies time series outliers, determines parameters of a multiplicative seasonal \(\text{ARIMA}(p,0,q)\times(0,d,0)_s\) model, and produces forecasts that incorporate the effects of outliers whose effects persist beyond the end of the series. |
AutoCorrelation | Computes the sample autocorrelation function of a stationary time series. |
CategoricalGenLinModel | Analyzes categorical data using logistic, probit, Poisson, and other linear models. |
Cdf | Cumulative probability distribution functions. |
ChiSquaredTest | Chi-squared goodness-of-fit test. |
ClusterHierarchical | Performs a hierarchical cluster analysis from a distance matrix. |
ClusterKMeans | Performs a K-means (centroid) cluster analysis. |
ClusterKNN | Performs a k-Nearest Neighbor classification. |
ContingencyTable | Performs a chi-squared analysis of a two-way contingency table. |
Covariances | Computes the sample variance-covariance or correlation matrix. |
CrossCorrelation | Computes the sample cross-correlation function of two stationary time series. |
Difference | Differences a seasonal or nonseasonal time series. |
DiscriminantAnalysis | Performs a linear or a quadratic discriminant function analysis among several known groups. |
Dissimilarities | Computes a matrix of dissimilarities (or similarities) between the columns (or rows) of a matrix. |
EGARCH | Implements the exponential GARCH (EGARCH) model. |
EmpiricalQuantiles | Computes empirical quantiles. |
ExtendedGARCH | Abstract class for extended GARCH models. |
FactorAnalysis | Performs Principal Component Analysis or Factor Analysis on a covariance or correlation matrix. |
FaureSequence | Generates the low-discrepancy Faure sequence. |
GammaDistribution | Evaluates a gamma probability density for a given set of data. |
GARCH | Computes estimates of the parameters of a GARCH(p,q) model. |
HoltWintersExponentialSmoothing | Calculates parameters and forecasts using the Holt-Winters multiplicative or additive forecasting method for seasonal data. |
InvCdf | Inverse cumulative probability distribution functions. |
InverseCdf | Inverse of a user-supplied cumulative distribution function. |
KalmanFilter | Performs Kalman filtering and evaluates the likelihood function for the state-space model. |
KaplanMeierECDF | Computes the Kaplan-Meier reliability function estimates or the CDF based on failure data that may be multi-censored. |
KaplanMeierEstimates | Computes Kaplan-Meier (or product-limit) estimates of survival probabilities for a sample of failure times that possibly contain right censoring. |
KolmogorovOneSample | Performs a Kolmogorov-Smirnov goodness-of-fit test in one sample. |
KolmogorovTwoSample | Performs a Kolmogorov-Smirnov two-sample test. |
LackOfFit | Performs a lack-of-fit test for a univariate time series or transfer function given the appropriate correlation function. |
LifeTables | Computes population (current) or cohort life tables based upon the observed population sizes at the middle (for population table) or the beginning (for cohort table) of some user-specified age intervals. |
LinearRegression | Fits a multiple linear regression model with or without an intercept. |
LogNormalDistribution | Evaluates a lognormal probability density for a given set of data. |
MersenneTwister | A 32-bit Mersenne Twister generator. |
MersenneTwister64 | A 64-bit Mersenne Twister generator. |
MultiCrossCorrelation | Computes the multichannel cross-correlation function of two mutually stationary multichannel time series. |
MultipleComparisons | Performs Student-Newman-Keuls multiple comparisons test. |
NonlinearRegression | Fits a multivariate nonlinear regression model using least squares. |
NormalDistribution | Evaluates the normal (Gaussian) probability density for a given set of data. |
NormalityTest | Performs a test for normality. |
NormOneSample | Computes statistics for mean and variance inferences using a sample from a normal population. |
NormTwoSample | Computes statistics for mean and variance inferences using samples from two normal populations. |
PartialCovariances | Computes the partial covariances or partial correlations from an input covariance or correlation matrix. |
Pdf | Probability density functions. |
PoissonDistribution | Evaluates a Poisson probability density for a given set of data. |
PooledCovariances | Computes a pooled variance-covariance matrix from one or more sets of observations. |
ProportionalHazards | Analyzes survival and reliability data using Cox's proportional hazards model. |
Random | Generates uniform and non-uniform random number distributions. |
RandomSamples | Generates a simple pseudorandom sample from a finite population, a sample of indices, or a permutation of an array of indices. |
Ranks | Computes the ranks, normal scores, or exponential scores for a vector of observations. |
RegressorsForGLM | Generates regressors for a general linear model. |
SelectionRegression | Selects the best multiple linear regression models. |
SignTest | Performs a sign test. |
Sort | A collection of sorting functions. |
StepwiseRegression | Builds multiple linear regression models using forward selection, backward selection, or stepwise selection. |
Summary | Computes basic univariate statistics. |
TableMultiWay | Tallies observations into a multi-way frequency table. |
TableOneWay | Calculates a frequency table for a data array. |
TableTwoWay | Calculates a two-dimensional frequency table for a data array based upon two variables. |
TimeSeries | A specialized class for time series data and analysis. |
TimeSeriesOperations | A class of operations and methods for objects of class TimeSeries. |
UserBasisRegression | Fits a linear function of the form \(y = c_0 + c_1 f_1(x) + c_2 f_2(x) + \cdots + c_k f_k(x) + \varepsilon\), where \(f_1(x), f_2(x), \ldots, f_k(x)\) are the user basis functions evaluated at index values \(i = 1, 2, \ldots, k\), \(c_0\) is the intercept, \(c_1, c_2, \ldots, c_k\) are the coefficients associated with the basis functions, and \(\varepsilon\) is the random error associated with \(y\). |
VectorAutoregression | Performs vector autoregression for a multivariate time series. |
WelchsTTest | Performs Welch's t-test for testing the difference in means between two normal populations with unequal variances. |
WilcoxonRankSum | Performs a Wilcoxon rank sum test. |
Enum | Description |
---|---|
ExtendedGARCH.Solver | An enumeration of the types of solvers available to the estimation procedure. |
FactorAnalysis.ScoreMethod | Indicates which method is used for computing the factor score coefficients. |
TimeSeriesOperations.CombineMethod | Public enum of methods for combining synchronous time series values. |
TimeSeriesOperations.MergeRule | Public enum of merge rules that define how two time series should be merged. |
WelchsTTest.Hypothesis | The form of the alternative hypothesis. |
Exception | Description |
---|---|
ARAutoUnivariate.TriangularMatrixSingularException | The input triangular matrix is singular. |
ARMA.IllConditionedException | The problem is ill-conditioned. |
ARMA.IncreaseErrRelException | The bound for the relative error is too small. |
ARMA.MatrixSingularException | The input matrix is singular. |
ARMA.NewInitialGuessException | The iteration has not made good progress. |
ARMA.ResidualsTooLargeException | The residuals have become too large in one step of the least-squares estimation of the ARMA coefficients. |
ARMA.TooManyCallsException | The number of calls to the function has exceeded the maximum number of iterations times the number of moving average (MA) parameters plus 1. |
ARMA.TooManyFcnEvalException | Maximum number of function evaluations exceeded. |
ARMA.TooManyITNException | Maximum number of iterations exceeded. |
ARMA.TooManyJacobianEvalException | Maximum number of Jacobian evaluations exceeded. |
ARMAMaxLikelihood.NonInvertibleException | The solution is noninvertible. |
ARMAMaxLikelihood.NonStationaryException | The solution is nonstationary. |
AutoARIMA.NoAcceptableModelFoundException | No appropriate ARIMA model could be found. |
AutoCorrelation.NonPosVariancesException | The problem is ill-conditioned. |
CategoricalGenLinModel.ClassificationVariableException | The ClassificationVariable vector has not been initialized. |
CategoricalGenLinModel.ClassificationVariableLimitException | The classification variable limit set by the user through setUpperBound has been exceeded. |
CategoricalGenLinModel.ClassificationVariableValueException | The number of distinct values for each classification variable must be greater than 1. |
CategoricalGenLinModel.DeleteObservationsException | The number of observations to be deleted (set by setObservationMax) has grown too large. |
CategoricalGenLinModel.RankDeficientException | The model has been determined to be rank deficient. |
ChiSquaredTest.DidNotConvergeException | The iteration did not converge. |
ChiSquaredTest.NoObservationsException | There are no observations. |
ChiSquaredTest.NotCDFException | The function is not a cumulative distribution function (CDF). |
ClusterKMeans.ClusterNoPointsException | There is a cluster with no points. |
ClusterKMeans.NoConvergenceException | Convergence did not occur within the maximum number of iterations. |
ClusterKMeans.NonnegativeFreqException | Deprecated. No longer used, replaced with an IllegalArgumentException. |
ClusterKMeans.NonnegativeWeightException | Deprecated. No longer used, replaced with an IllegalArgumentException. |
Covariances.DiffObsDeletedException | Deprecated. |
Covariances.MoreObsDelThanEnteredException | Deprecated. |
Covariances.NonnegativeFreqException | Frequencies must be nonnegative. |
Covariances.NonnegativeWeightException | Weights must be nonnegative. |
Covariances.TooManyObsDeletedException | Deprecated. |
CrossCorrelation.NonPosVariancesException | The problem is ill-conditioned. |
DiscriminantAnalysis.CovarianceSingularException | The variance-covariance matrix is singular. |
DiscriminantAnalysis.EmptyGroupException | There are no observations in a group. |
DiscriminantAnalysis.SumOfWeightsNegException | The sum of the weights has become negative. |
Dissimilarities.NoPositiveVarianceException | No variable has positive variance. |
Dissimilarities.ScaleFactorZeroException | The computations cannot continue because a scale factor is zero. |
Dissimilarities.ZeroNormException | The computations cannot continue because the Euclidean norm of the column is equal to zero. |
EmpiricalQuantiles.ScaleFactorZeroException | The computations cannot continue because a scale factor is zero. |
FactorAnalysis.BadVarianceException | Bad variance error. |
FactorAnalysis.EigenvalueException | Eigenvalue error. |
FactorAnalysis.NonPositiveEigenvalueException | Nonpositive eigenvalue error. |
FactorAnalysis.NotPositiveDefiniteException | Matrix not positive definite. |
FactorAnalysis.NotPositiveSemiDefiniteException | Covariance matrix not positive semi-definite. |
FactorAnalysis.NotSemiDefiniteException | Hessian matrix not semi-definite. |
FactorAnalysis.RankException | Rank of covariance matrix error. |
FactorAnalysis.SingularException | Covariance matrix singular error. |
GARCH.ConstrInconsistentException | The equality constraints are inconsistent. |
GARCH.EqConstrInconsistentException | The equality constraints and the bounds on the variables are found to be inconsistent. |
GARCH.NoVectorXException | No vector X satisfies all of the constraints. |
GARCH.TooManyIterationsException | Number of function evaluations exceeded 1000. |
GARCH.VarsDeterminedException | The variables are determined by the equality constraints. |
InverseCdf.DidNotConvergeException | The iteration did not converge. |
MultiCrossCorrelation.NonPosVariancesException | The problem is ill-conditioned. |
NonlinearRegression.NegativeFreqException | A negative frequency was encountered. |
NonlinearRegression.NegativeWeightException | A negative weight was encountered. |
NonlinearRegression.TooManyIterationsException | The number of iterations has exceeded the maximum allowed. |
NormalityTest.NoVariationInputException | There is no variation in the input data. |
PartialCovariances.InvalidMatrixException | Exception thrown if a computed correlation is greater than one for some pair of variables. |
PartialCovariances.InvalidPartialCorrelationException | Exception thrown if a computed partial correlation is greater than one for some pair of variables. |
Pdf.AltSeriesAccuracyLossException | The magnitude of the alternating series sum is too small relative to the sum of positive terms to permit reliable accuracy. |
ProportionalHazards.ClassificationVariableLimitException | The classification variable limit set by the user through setUpperBound has been exceeded. |
SelectionRegression.NoVariablesException | No variables can enter the model. |
StepwiseRegression.CyclingIsOccurringException | Cycling is occurring. |
StepwiseRegression.NoVariablesEnteredException | No variables can enter the model. |
The classes ANOVA, ANOVAFactorial, ANCOVA, and MultipleComparisons are for commonly used experimental designs. Typically, responses are stored in the input vector y in a pattern that takes advantage of the balanced design structure. Consequently, the full set of model subscripts is not needed to identify each response. The classes assume the usual pattern, which requires that the last model subscript change most rapidly, followed by the model subscript next in line, and so forth, with the first subscript changing at the slowest rate. This pattern is referred to as lexicographical ordering.
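For example, the following sketch (illustrative Java, not library code; the index helper is hypothetical) lays out the responses of a balanced two-factor design in y using lexicographical ordering:

```java
public class LexicographicalOrdering {
    // Maps model subscripts (i, j) of an a x b balanced design to the
    // position of the response in the one-dimensional vector y: the
    // last subscript (j) changes most rapidly, the first (i) slowest.
    static int index(int i, int j, int b) {
        return i * b + j;
    }

    public static void main(String[] args) {
        int a = 2, b = 3;                 // 2 levels of factor A, 3 of factor B
        double[] y = new double[a * b];   // responses in lexicographical order
        for (int i = 0; i < a; i++) {
            for (int j = 0; j < b; j++) {
                y[index(i, j, b)] = 10.0 * (i + 1) + (j + 1); // y[0]=y_11, y[1]=y_12, ...
            }
        }
        // y = {11, 12, 13, 21, 22, 23}: subscripts run 11, 12, 13, 21, 22, 23
        System.out.println(java.util.Arrays.toString(y));
    }
}
```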
The ANOVA
class allows missing responses if confidence
interval information is not requested. Double.NaN
(NaN = Not a
Number) is the missing value code used by these classes. Any element of
y that is missing must be set to NaN. Other classes do not allow
missing responses because the classes generally deal with balanced
designs.
The classes for the computations of basic statistics include Summary, Covariances, PartialCovariances, PooledCovariances, Sort, Ranks, and EmpiricalQuantiles. They generally have relatively simple arguments and most of them will allow for missing values. Missing value codes can be set by using Double.NaN.
The classes NormOneSample, NormTwoSample, WelchsTTest, ChiSquaredTest, and WilcoxonRankSum
perform statistical tests and return a "p-value" for the test. The
p-value is the probability of observing data
that would yield a test statistic as extreme or more extreme under the
assumption of the null hypothesis. Hence, a small
p-value is evidence for the rejection of the null hypothesis.
Classes for tabulation include TableOneWay, TableTwoWay, and TableMultiWay.
The ContingencyTable
class computes many statistics of
interest in a two-way table. Statistics computed by this routine include the
usual chi-squared statistics, measures of association, Kappa, and many
others.
The CategoricalGenLinModel
class is concerned with
generalized linear models in discrete data. This routine may be used to
compute estimates and associated statistics in probit, logistic, minimum
extreme value, Poisson, negative binomial (with known number of successes),
and logarithmic models. Classification variables as well as weights,
frequencies, and additive constants may be used so that quite general linear
models can be fit. Residuals, a measure of influence, the coefficient
estimates, and other statistics are returned for each model fit. When
infinite parameter estimates are required, extended maximum likelihood
estimation may be used. Log-linear models may be fit through the use of
Poisson regression models. Results from Poisson regression models involving
structural and sampling zeros will be identical to the results obtained from
the log-linear model but will be fit by a quasi-Newton algorithm rather than
through iterative proportional fitting.
Goodness-of-fit is a term that refers to how well a hypothesized distribution fits a set of data (see Conover, 1980). The methods for goodness-of-fit include ChiSquaredTest, NormalityTest, KolmogorovOneSample, and KolmogorovTwoSample.
The chi-squared goodness-of-fit test ChiSquaredTest
may
be used with discrete as well as continuous distributions.
The Kolmogorov-Smirnov methods KolmogorovOneSample
and
KolmogorovTwoSample
compute exact probabilities in
small to moderate sample sizes.
The Kolmogorov-Smirnov and chi-squared goodness-of-fit test routines allow for missing values (NaN, not a number) in the input data.
Cluster Analysis
Clustering deals with the problem of segmenting or organizing a set of individual objects into groups or "clusters" of similar objects, so that objects within a cluster are more similar to each other than they are to objects in other clusters. In practice, objects are virtually anything for which attributes or features can be measured and quantified into a data matrix X. Typically a set of clusters is constructed from a training data set. Then the clusters are used to predict the class label for a new object.
The library includes three classes that perform clustering and classification: ClusterKMeans, ClusterKNN, and ClusterHierarchical.
ClusterKMeans performs a K-means cluster analysis. The K-means method attempts to find a clustering that minimizes the within-cluster sum-of-squares. In this method of clustering, the data matrix X is grouped so that each observation (row in X) is assigned to one of a fixed number, K, of clusters. The sum of the squared difference of each observation about the mean of its assigned cluster is used as the criterion for assignment. In the basic algorithm, observations are transferred from one cluster to another when doing so decreases the within-cluster sum-of-squared differences. When no transfer occurs in a pass through the entire data set, the algorithm stops. The clustering can be evaluated by methods described in "Basic Statistics," or others throughout the library. Often, K-means clustering with more than one value of K is performed, and the value of K that best fits the data is used.
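The following is a minimal Java sketch of the transfer criterion described above; it is an illustration of the basic algorithm, not the ClusterKMeans implementation:

```java
import java.util.Arrays;

public class KMeansSketch {
    // Minimal K-means: assigns each row of x to the nearest of k means,
    // recomputes means, and stops when a pass makes no transfers.
    static int[] cluster(double[][] x, double[][] means) {
        int n = x.length, k = means.length, d = x[0].length;
        int[] label = new int[n];
        Arrays.fill(label, -1);
        boolean transferred = true;
        while (transferred) {
            transferred = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dist = 0.0;
                    for (int j = 0; j < d; j++) {
                        double diff = x[i][j] - means[c][j];
                        dist += diff * diff;
                    }
                    if (dist < bestDist) { bestDist = dist; best = c; }
                }
                if (best != label[i]) { label[i] = best; transferred = true; }
            }
            // Recompute each cluster mean from its assigned observations.
            double[][] sum = new double[k][d];
            int[] count = new int[k];
            for (int i = 0; i < n; i++) {
                count[label[i]]++;
                for (int j = 0; j < d; j++) sum[label[i]][j] += x[i][j];
            }
            for (int c = 0; c < k; c++)
                if (count[c] > 0)
                    for (int j = 0; j < d; j++) means[c][j] = sum[c][j] / count[c];
        }
        return label;
    }
}
```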
ClusterKNN
performs k-Nearest Neighbor, or KNN, cluster
analysis. For a new object, KNN finds a cluster consisting of the object's K
closest neighbors, where "closest" is in the sense of a specified distance
metric, such as the L2 norm or the L1 norm. The new object then receives the
class label that the majority of the objects in its neighborhood have
(majority vote). The algorithm works with a training set. For each
observation in the data X (representing a new object), the distance to each
object in a training set is calculated. The distances are sorted, and the
smallest K distances are noted. The objects with the K smallest distances
form the cluster. Note that unlike K-Means, KNN does not provide a clustering
of the data set. In a sense KNN forms clusters "on the fly" for the purpose
of classification.
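A minimal Java sketch of the majority-vote rule follows; the classify method is illustrative and is not the ClusterKNN API:

```java
import java.util.Arrays;

public class KnnSketch {
    // Classifies a new observation by majority vote among the k training
    // points with the smallest Euclidean (L2) distances.
    static int classify(double[][] train, int[] labels, double[] x, int k) {
        int n = train.length;
        double[][] distAndLabel = new double[n][2];
        for (int i = 0; i < n; i++) {
            double dist = 0.0;
            for (int j = 0; j < x.length; j++) {
                double diff = train[i][j] - x[j];
                dist += diff * diff;
            }
            distAndLabel[i][0] = dist;
            distAndLabel[i][1] = labels[i];
        }
        // Sort by distance and tally the labels of the k nearest neighbors.
        Arrays.sort(distAndLabel, (a, b) -> Double.compare(a[0], b[0]));
        java.util.Map<Integer, Integer> votes = new java.util.HashMap<>();
        for (int i = 0; i < k; i++)
            votes.merge((int) distAndLabel[i][1], 1, Integer::sum);
        return votes.entrySet().stream()
                .max(java.util.Map.Entry.comparingByValue())
                .get().getKey();
    }
}
```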
ClusterHierarchical
performs a variant of hierarchical cluster
analysis. ClusterHierarchical
implements a bottom-up, or
agglomerative approach. Starting with every individual object in a data set
as a cluster, the two clusters (objects) that are closest in the sense of
some distance metric are then combined into a new cluster. With the new set
of n - 1 clusters, the process is repeated, so that at the h-th
stage, there are n - h clusters. The clustering continues until there
is only one cluster. Each level represents a different clustering of the data
that can be evaluated using fit statistics and other metrics.
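The sketch below illustrates the agglomerative process using single linkage on a distance matrix; linkage choices vary, and this is an illustration only, not the ClusterHierarchical implementation:

```java
public class AgglomerativeSketch {
    // Repeatedly joins the two closest clusters (single linkage) until a
    // single cluster remains, printing the merge at each stage.
    static void cluster(double[][] dist) {
        int n = dist.length;
        boolean[] active = new boolean[n];
        java.util.Arrays.fill(active, true);
        for (int merges = 0; merges < n - 1; merges++) {
            int a = -1, b = -1;
            double best = Double.MAX_VALUE;
            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; j++)
                    if (active[i] && active[j] && dist[i][j] < best) {
                        best = dist[i][j]; a = i; b = j;
                    }
            // Merge b into a: single linkage keeps the minimum distance.
            for (int j = 0; j < n; j++)
                if (active[j] && j != a)
                    dist[a][j] = dist[j][a] = Math.min(dist[a][j], dist[b][j]);
            active[b] = false;
            System.out.println("Merged clusters " + a + " and " + b + " at height " + best);
        }
    }
}
```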
Principal Components and Factor Analysis
The idea in principal components is to find a small number of linear
combinations of the original variables that maximize the variance accounted
for in the original data. This amounts to an eigensystem analysis of the
covariance (or correlation) matrix. The class
FactorAnalysis
performs either principal component
analysis or factor analysis. When the principal component model is used,
FactorAnalysis
computes standard errors for the eigenvalues.
Correlations of the original variables with the principal component scores
also are computed.
Factor analysis and principal component analysis, while quite different in assumptions, often serve the same ends. Unlike principal components in which linear combinations yielding the highest possible variances are obtained, factor analysis generally obtains linear combinations of the observed variables according to a model relating the observed variable to hypothesized underlying factors, plus a random error term called the unique error or uniqueness. In factor analysis, the unique errors associated with each variable are usually assumed to be independent of the factors. In addition, in the common factor model, the unique errors are assumed to be mutually independent. The factor analysis model is expressed in the following equation: $$x - \mu = \Lambda f + e $$ where \(x\) is the \(p\) vector of observed values, \(\mu\) is the \(p\) vector of variable means, \(\Lambda\) is the \(p \times k\) matrix of factor loadings, \(f\) is the \(k\) vector of hypothesized underlying random factors, \(e\) is the \(p\) vector of hypothesized unique random errors, \(p\) is the number of variables in the observed variables, and \(k\) is the number of factors.
Because much of the computation in factor analysis was originally done by hand or was expensive on early computers, quick (but dirty) algorithms that made the calculations possible were developed. One result is the many factor extraction methods available today. Generally speaking, in the exploratory or model building phase of a factor analysis, a method of factor extraction that is not computationally intensive (such as principal components, principal factor, or image analysis) is used. If desired, a computationally intensive method is then used to obtain the final factors.
Discriminant Analysis
Discriminant analysis involves fitting a mathematical "boundary" between
groups for the purpose of classifying new observations into those groups.
The class DiscriminantAnalysis
performs linear or
quadratic discrimination together with a number of approaches for classifying
new observations, including reclassification, leave-out-one, and split-sample.
The data consist of multivariate observations along with known memberships in a fixed number of groups. The discriminant function is then fit to the known data. The analysis can be executed in an online mode, meaning that one or more observations can be added to the rule during each invocation of DiscriminantAnalysis.update(double[][]). One or more observations can also be removed from the state of the model using DiscriminantAnalysis.downdate(double[][], int[]).
Nonparametric statistics are derived without any assumptions about the
underlying probability distribution. Examples include measures of location
and scale (see "Basic Statistics"), descriptive measures in a contingency
table (see "Categorical and Discrete Data Analysis") and statistics used in
goodness of fit tests (see "Tests of Goodness of Fit"). In addition, the
classes SignTest
and
WilcoxonRankSum
perform nonparametric tests comparing
two populations.
Pdf
and Cdf
provide
static methods for several continuous and discrete probability distributions.
The distribution function for the (real, single-valued) random variable X is the function F defined for all real x by
$$F(x) = {\rm Prob}(X \leq x)$$ where \(\rm Prob(\cdot)\) denotes the probability of an event. The distribution function is often called the cumulative distribution function (CDF).
For distributions with finite ranges, such as the beta
distribution, the CDF is \(0\) for values less than the left endpoint and
\(1\) for values greater than the right endpoint. The methods in the
Cdf
class return the correct values for the
distribution functions when values outside of the range of the random
variable are input, but warning error conditions are set in these cases.
For discrete distributions, the function giving the probability that the random variable takes on specific values is called the probability mass function, defined by
$$p(x) = {\rm {Prob}}(X = x)$$
The CDF for a discrete random variable is
$$F(x) = \sum\limits_{k \in A} p(k) $$
where \(A\) is the set of all \(k\) such that \(k \leq x\). Since the distribution function is a step function, its inverse does not exist uniquely.
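As a concrete illustration (plain Java, not library code), the Poisson CDF can be evaluated by summing the mass function \(p(k) = e^{-\lambda}\lambda^k/k!\) over \(k \leq x\):

```java
public class DiscreteCdfSketch {
    // Poisson probability mass function p(k) = exp(-lambda) * lambda^k / k!,
    // computed iteratively to avoid overflow in the factorial.
    static double poissonPmf(int k, double lambda) {
        double p = Math.exp(-lambda);
        for (int i = 1; i <= k; i++) p *= lambda / i;
        return p;
    }

    // CDF F(x) = sum of p(k) over the set {k : k <= x}.
    static double poissonCdf(int x, double lambda) {
        double sum = 0.0;
        for (int k = 0; k <= x; k++) sum += poissonPmf(k, lambda);
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(poissonCdf(3, 2.0)); // approximately 0.8571
    }
}
```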
For continuous distributions, a probability mass function, as defined above, would not be useful because the probability of any given point is \(0\). For such distributions, the useful analog is the probability density function (PDF). The integral of the PDF is the probability over the interval. If the continuous random variable X has PDF f, then
$${\rm Prob}(a \le X \le b) = \int_a^b f(x)\,dx $$
The relationship between the CDF and the PDF is
$$F(x) = \int_{-\infty}^x f(t) \, dt $$
For (absolutely) continuous distributions, the value of \(F(x)\)
uniquely determines \(x\) within the support of the distribution.
The "inverse" methods in the Cdf
class compute the inverses of
the distribution functions, that is, given \(F(x)\), they compute,
\(x\). The inverses are defined only over the open interval
\((0,1)\).
Whenever a probability close to \(1.0\) results from a call to a distribution
function or is to be input to an inverse function, it is often impossible to
achieve good accuracy because of the nature of the representation of numeric
values. In this case, it may be better to work with the complementary
distribution function (one minus the distribution function). If the
distribution is symmetric about some point (as the normal distribution, for
example) or is reflective about some point (as the beta distribution, for
example), the complementary distribution function has a simple relationship
with the distribution function. For example, to evaluate the standard normal distribution at \(4.0\) using the normal method in the Cdf class directly, the result to six places is \(0.999968\). Only
two of those digits are really useful, however. A more useful result may be
\(1.000000\) minus this value, which can be obtained to six places as
\(3.16712\text{e-}5\)
by evaluating normal
at \(-4.0\). For the normal distribution, the
two values are related by \(\Phi(x) = 1 - \Phi(-x)\), where \(\Phi (\cdot)\)
is the normal distribution function. Another example is the beta distribution
with parameters \(2\) and \(10\). This distribution is skewed to the right, so
evaluating beta
at \(0.7\), the value \(0.999953\) is obtained.
A more precise result is obtained by evaluating beta
with
parameters \(10\) and \(2\) at \(0.3\). This yields \(4.72392\text{e-}5\).
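The normal example can be sketched with the Cdf.normal method discussed below (a usage illustration, assuming the standard package import):

```java
import com.imsl.stat.Cdf;

public class ComplementExample {
    public static void main(String[] args) {
        // Direct evaluation near 1.0 loses precision in the trailing digits.
        double direct = Cdf.normal(4.0);        // approximately 0.999968
        // The complement 1 - Phi(4.0) is better computed via symmetry as
        // Phi(-4.0), which retains full precision.
        double complement = Cdf.normal(-4.0);   // approximately 3.16712e-5
        System.out.println(direct + "  " + complement);
    }
}
```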
Many of the algorithms used by the classes in this section are discussed by Abramowitz and Stegun (1964). The algorithms make use of various expansions and recursive relationships and often use different methods in different regions.
Cumulative distribution functions are defined for all real arguments. However, if the input to one of the distribution functions in this section is outside the range of the random variable, an error is issued.
Definitions and discussions of the basic terms can be found in Johnson and Kotz (1969, 1970a, 1970b). These are also good references for the specific distributions.
Many of the methods for probability distributions are written for standard
forms of statistical distributions. Hence, the number of parameters for any
given distribution may be fewer than the number often associated with the
distribution. Also, the methods relating to the normal distribution,
Cdf.normal(double)
and Cdf.inverseNormal(double)
,
are for a normal distribution with mean equal to zero and variance equal to
one. For other means and variances, it is very easy for the user to
standardize the variables by subtracting the mean and dividing by the square
root of the variance.
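For example, a brief sketch of this standardization (the values of mu and sigma are illustrative):

```java
import com.imsl.stat.Cdf;

public class StandardizeExample {
    public static void main(String[] args) {
        double mu = 100.0, sigma = 15.0;   // illustrative mean and std. deviation
        double x = 130.0;
        // Standardize, then use the standard normal CDF: Prob(X <= 130).
        double p = Cdf.normal((x - mu) / sigma);
        // Invert for a quantile of the nonstandard distribution.
        double x95 = mu + sigma * Cdf.inverseNormal(0.95);
        System.out.println(p + "  " + x95);
    }
}
```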
This section describes classes and methods for the generation of random numbers that are useful for applications in simulation studies.
In the following discussions, the phrases random numbers, random deviates, deviates, and variates are used interchangeably. The phrase pseudorandom is sometimes used to emphasize that the numbers generated are really not random since they result from a deterministic process. The usefulness of pseudorandom numbers is derived from the similarity, in a statistical sense, of samples of the pseudorandom numbers to samples of observations from the specified distributions. In short, while the pseudorandom numbers are completely deterministic and repeatable, they simulate the realizations of independent and identically distributed random variables.
Class com.imsl.stat.Random extends java.util.Random. It adds
the ability to generate random numbers with a number of different non-uniform
distributions. The non-uniform random number generators use a uniform random
number generator. The uniform random number generator in
java.util.Random
can be used. Random
adds a linear
congruential generator. It is also possible to use another uniform generator
as long as it implements the Random.BaseGenerator
interface.
The MersenneTwister and MersenneTwister64 uniform random number generators generate random number series with very long periods. They both implement the Random.BaseGenerator interface and so can be used as the base uniform generator in com.imsl.stat.Random.
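A sketch of this wiring is shown below. It assumes a Random constructor that accepts a Random.BaseGenerator and a MersenneTwister seed constructor; verify both signatures against the class reference before use:

```java
import com.imsl.stat.MersenneTwister;
import com.imsl.stat.Random;

public class BaseGeneratorExample {
    public static void main(String[] args) {
        // Assumed constructors: MersenneTwister(int seed) and
        // Random(Random.BaseGenerator); check the class reference.
        MersenneTwister base = new MersenneTwister(123457);
        Random rng = new Random(base);
        // nextDouble() is inherited from java.util.Random, which
        // com.imsl.stat.Random extends.
        for (int i = 0; i < 3; i++) {
            System.out.println(rng.nextDouble());
        }
    }
}
```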
The class FaureSequence
generates the low-discrepancy
Faure sequence. This is also called a quasi-random generator. The
sequence is a series of points in n-dimensions, which are close to
being as equally spaced as possible. Low-discrepancy refers to the difference
from actually being as equally spaced as possible.
The FaureSequence
implements the RandomSequence
interface. This interface defines a sequence of points in
n-dimensions. It is used by the class
HyperRectangleQuadrature
to evaluate
n-dimensional integrals.
The classes LinearRegression
and
RegressorsForGLM
contain methods for linear regression and
the multivariate general linear model. The class NonlinearRegression
is available for nonlinear regression models. Methods
for estimating or fitting regression models, methods for computing summary
statistics and diagnostics, and methods for computing confidence intervals
for individual cases are provided. There are also methods for building a
model from a set of candidate variables, i.e., variable selection.
The simple linear regression model is
$$y_i=\beta_0+\beta_1 x_i+\varepsilon_i$$
for \(i=1,2,\ldots ,n\). The \(y_i\)'s represent the observed values of the response or dependent variable, the \(x_i\)'s are the settings of the independent (explanatory) variable, \(\beta_0 \) and \(\beta_1\) are the intercept and slope parameters (respectively) and the \(\varepsilon_i\)'s are independently distributed normal errors, each with mean 0 and variance \(\sigma^2\).
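As a plain-Java illustration of this model (not the LinearRegression API), the least-squares estimates \(\hat\beta_1=\sum(x_i-\bar x)(y_i-\bar y)/\sum(x_i-\bar x)^2\) and \(\hat\beta_0=\bar y-\hat\beta_1\bar x\) can be computed directly, along with the \(R^2\) statistic discussed later in this section:

```java
public class SimpleRegressionSketch {
    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2.1, 3.9, 6.2, 7.8, 10.1};
        int n = x.length;
        double xbar = 0, ybar = 0;
        for (int i = 0; i < n; i++) { xbar += x[i]; ybar += y[i]; }
        xbar /= n; ybar /= n;
        // Least-squares slope and intercept for y_i = b0 + b1*x_i + e_i.
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - xbar) * (y[i] - ybar);
            sxx += (x[i] - xbar) * (x[i] - xbar);
        }
        double b1 = sxy / sxx;
        double b0 = ybar - b1 * xbar;
        // Error and total sums of squares give R-squared = 100*(1 - SSE/SST).
        double sse = 0, sst = 0;
        for (int i = 0; i < n; i++) {
            double fit = b0 + b1 * x[i];
            sse += (y[i] - fit) * (y[i] - fit);
            sst += (y[i] - ybar) * (y[i] - ybar);
        }
        System.out.println("b0=" + b0 + " b1=" + b1
                + " R2=" + 100 * (1 - sse / sst) + "%");
    }
}
```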
The multiple linear regression model is
$$y_i=\beta_0+\beta_1 x_{i1}+\beta_2 x_{i2}+\ldots+ \beta_k x_{ik}+\varepsilon_i$$
for \(i=1,2,\ldots,n\). The \(y_i\)'s constitute the observed values of the response or dependent variable; the \(x_{i1} \)'s, \(x_{i2}\)'s,\(\ldots,\) \(x_{ik}\)'s are the settings of the k independent (explanatory) variables; \(\beta_0,\beta_1,\ldots,\beta_k\) are the regression coefficients; and the \(\varepsilon_i\)'s are independently distributed normal errors, each with mean 0 and variance \(\sigma^2 \).
The class LinearRegression fits both the simple and multiple linear regression models using a fast Givens transformation and includes an option for excluding the intercept \(\beta_0\). The responses are input in array y, and the independent variables are input in array x, where the individual cases correspond to the rows and the variables correspond to the columns.
After the model has been fitted using the LinearRegression class, member "get" methods such as LinearRegression.getCoefficientTTests() can be used to retrieve summary statistics. Predicted values, confidence intervals, and case statistics for the fitted model can be obtained from LinearRegression.getCaseStatistics(double[], double), which returns an instance of the inner class LinearRegression.CaseStatistics.
Several classes provide the option for excluding the intercept from a model. In most practical applications, the intercept should be included in the model. For classes that use the sums of squares and crossproducts matrix as input, the no-intercept case can be handled by using the raw sums of squares and crossproducts matrix as input in place of the corrected sums of squares and crossproducts. The raw sums of squares and crossproducts matrix can be computed as \(\left({x_1,x_2,\ldots,x_k,y}\right)^T\left({x_1,x_2,\ldots,x_k,y} \right)\).
Variables used in the general linear model are either continuous or classification variables. Typically, multiple regression models use continuous variables, whereas analysis of variance models use classification variables. Although the notation used to specify analysis of variance models and multiple regression models may look quite different, the models are essentially the same. The term general linear model emphasizes that a common notational scheme is used for specifying a model that may contain both continuous and classification variables.
A general linear model is specified by its effects (sources of variation). An effect is a single variable or a product of variables. In particular, an effect is composed of one of the following:

1. a single continuous variable
2. a single classification variable
3. several different classification variables
4. several continuous variables, some of which may be the same
5. continuous variables, some of which may be the same, and classification variables, which must be distinct

Effects of the first type are common in multiple regression models. Effects of the second type appear as main effects in analysis of variance models. Effects of the third type appear as interactions in analysis of variance models. Effects of the fourth type appear in polynomial models and response surface models as powers and crossproducts of some basic variables. Effects of the fifth type appear in one-way analysis of covariance models as regression coefficients that indicate lack of parallelism of a regression function across the groups.
The analysis of a general linear model occurs in two stages. The first stage calls class RegressorsForGLM to specify all regressors except the intercept. The second stage uses LinearRegression, at which point the model will be specified as either having or not having an intercept.
Suppose the data matrix has as its first four columns two continuous variables in Columns \(0\) and \(1\) and two classification variables in Columns \(2\) and \(3\). The data might appear as follows:
Column \(0\) | Column \(1\) | Column \(2\) | Column \(3\) |
---|---|---|---|
\(11.23\) | \(1.23\) | \(1.0\) | \(5.0\) |
\(12.12\) | \(2.34\) | \(1.0\) | \(4.0\) |
\(12.34\) | \(1.23\) | \(1.0\) | \(4.0\) |
\(4.34\) | \(2.21\) | \(1.0\) | \(5.0\) |
\(5.67\) | \(4.31\) | \(2.0\) | \(4.0\) |
\(4.12\) | \(5.34\) | \(2.0\) | \(1.0\) |
\(4.89\) | \(9.31\) | \(2.0\) | \(1.0\) |
\(9.12\) | \(3.71\) | \(2.0\) | \(1.0\) |
Each distinct value of a classification variable determines a level. The classification variable in Column \(2\) has two levels. The classification variable in Column \(3\) has three levels. (Integer values are recommended, but not required, for values of the classification variables. The values of the classification variables corresponding to the same level must be identical.) Some examples of regression functions and their specifications are as follows:
 | Intercept | Class Columns | Effects |
---|---|---|---|
\(\beta_0+\beta_1 x_1\) | true | \(\{\}\) | \(\{\{0\}\}\) |
\(\beta_0+\beta_1 x_1 + \beta_2 x_1^2\) | true | \(\{\}\) | \(\{\{0\}, \{0,0\}\}\) |
\(\mu+\alpha_i\) | true | \(\{2\}\) | \(\{\{2\}\}\) |
\(\mu+\alpha_i+\beta_j+\gamma_{ij}\) | true | \(\{2, 3\}\) | \(\{\{2\}, \{3\}, \{2, 3\}\}\) |
\(\mu_{ij}\) | false | \(\{2, 3\}\) | \(\{\{2, 3\}\}\) |
\(\beta_0+\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2\) | true | \(\{\}\) | \(\{\{0\}, \{1\}, \{0, 1\}\}\) |
Variation Src. | Deg. of Freedom | Sum of Squares | Mean Square | F | p-value |
---|---|---|---|---|---|
Regression | DFR=AOV[0] | SSR=AOV[3] | MSR=AOV[6] | AOV[8] | AOV[9] |
Error | DFE=AOV[1] | SSE=AOV[4] | \(s^2\)=AOV[7] | | |
Total | DFT=AOV[2] | SST=AOV[5] | | | |
If the model has an intercept (default), the total sum of squares is the sum of squares of the deviations of \(y_i\) from its (weighted) mean \(\bar {y}\) - the so-called corrected total sum of squares, denoted by the following:
$$\textrm{SST}=\sum\limits_{i=1}^n w_i(y_i-\bar y)^2 $$
If the model does not have an intercept (hasIntercept = false), the total sum of squares is the sum of squares of \(y_i\) - the so-called uncorrected total sum of squares, denoted by the following:
$$\textrm{SST}=\sum\limits_{i=1}^n w_i y_i^2 $$
The error sum of squares is given as follows:
$$\textrm{SSE}=\sum\limits_{i=1}^n w_i \left(y_i- \hat{y}_i\right)^2$$
where \(\hat{y}_i = x_i\hat{\beta}\) is the value estimated by the fitted regression model. The error degrees of freedom are defined by \(\textrm{DFE}=n-r\).
The estimate of \(\sigma^2\) is given by \(s^2= \textrm{SSE}/\textrm{DFE}\), which is the error mean square.
The computed F statistic for the null hypothesis, \(H_0: \beta_1=\beta_2=\ldots\beta_p=0\), versus the alternative that at least one coefficient is nonzero is given by \(F=\textrm{MSR}/s^2 \). The p-value associated with the test is the probability of an F larger than that computed under the assumption of the model and the null hypothesis. A small p-value (less than 0.05) is customarily used to indicate there is sufficient evidence from the data to reject the null hypothesis.
The remaining five elements in AOV frequently are displayed together with the actual analysis of variance table. The quantities R-squared (\(R^2\) = AOV[10]) and adjusted R-squared (\(R_a^2\) = AOV[11]) are expressed as a percentage and are defined as follows:
$$R^2=100\,\frac{\textrm{SSR}}{\textrm{SST}}$$
$$R_a^2=100\ \textrm{max}\left\{0,1-\frac{s^2}{\textrm{SST}/\textrm{DFT}}\right\}$$
The square root of \(s^2\), \(s\) = AOV[12], is frequently referred to as the estimated standard deviation of the model error.
The overall mean of the responses \(\bar y\) is output in AOV[13].
The coefficient of variation (\(CV\) = AOV[14]) is expressed as a percentage and defined by \(\textrm{CV}=100s/\bar y\).
LinearRegression.CoefficientTTests
is a nested class within the LinearRegression
and
StepwiseRegression
classes. The statistics (estimated
standard error,
t-statistic and p-value) associated with each coefficient can
be retrieved with their associated "get" methods. And finally, getR()
returns the variance-covariance matrix of the estimated coefficients.
Diagnostics for individual cases (observations) are computed by the
LinearRegression.CaseStatistics
class for linear regression.
Statistics computed include predicted values, confidence intervals, and diagnostics for detecting outliers and cases that greatly influence the fitted regression.
The diagnostics are computed under the model \(y=X\beta+\varepsilon\), where y is the \(n\times 1\) vector of responses, X is the \(n\times p\) matrix of regressors with rank(X) = r, \(\beta\) is the \(p\times 1 \) vector of regression coefficients, and \(\varepsilon \) is the \(n\times 1\) vector of errors whose elements are independently normally distributed with mean \(0\) and variance \(\sigma^2/w_i\).
Given the results of a weighted least-squares fit of this model (with the \(w_i\)'s as the weights), the following five diagnostics are computed: leverage, standardized residual, jackknife residual, Cook's distance, and DFFITS. The definition of these terms is given in the discussion that follows.
Let \(x_i\) be a column vector containing the elements of the i-th row of X. A case can be unusual either because of \(x_i\) or because of the response \(y_i\). The leverage \(h_i\) is a measure of uniqueness of the \(x_i\). The leverage is defined by
$$h_i=\left[x_i^T\left(X^T WX\right)^-x_i\right]w_i$$
where \(W=\textrm{diag}\left(w_1,w_2,\ldots,w_n\right)\) and \(\left(X^T WX\right)^-\) denotes a generalized inverse of \(X^T WX\). The average value of the \(h_i\)'s is \(r/n\). Regression classes declare \(x_i\) unusual if \(h_i>2r/n\). Hoaglin and Welsch (1978) call a data point highly influential (i.e., a leverage point) when this occurs.
Let \(e_i\) denote the residual
$$y_i-\hat y_i$$
for the i-th case. The estimated variance of \(e_i\) is \((1-h_i)s^2/w_i\), where \(s^2\) is the residual mean square from the fitted regression. The i-th standardized residual (also called the internally studentized residual) is by definition
$$r_i=e_i\sqrt{\frac{w_i}{s^2\left({1-h_i}\right)}} $$
and \(r_i\) follows an approximate standard normal distribution in large samples.
The i-th jackknife residual or deleted residual involves the difference between \(y_i\) and its predicted value, based on the data set in which the i-th case is deleted. This difference equals \(e_i/\left({1-h_i}\right)\). The jackknife residual is obtained by standardizing this difference. The residual mean square for the regression in which the i-th case is deleted is as follows:
$$s_i^2=\frac{{\left({n-r}\right)s^2-w_i e_i^2/ \left({1-h_i}\right)}}{{n-r-1}}$$
The jackknife residual is defined as
$$t_i=e_i\sqrt{\frac{w_i}{s_i^2\left({1-h_i} \right)}}$$
and \(t_i\) follows a \(t\)-distribution with \(n-r-1\) degrees of freedom.
Cook's distance for the i-th case is a measure of how much an individual case affects the estimated regression coefficients. It is given as follows:
$$D_i=\frac{w_i h_i e_i^2}{rs^2\left({1-h_i}\right) ^2}$$
Weisberg (1985) states that if \(D_i\) exceeds the 50-th percentile of the F(r, n - r) distribution, it should be considered large. (This value is about 1. This statistic does not have an F distribution.)
DFFITS, like Cook's distance, is also a measure of influence. For the i-th case, DFFITS is computed by the formula below.
$$\text{DFFITS}_i=e_i\sqrt{\frac{w_i h_i}{s_i^2 \left({1-h_i}\right)^2}}$$
Hoaglin and Welsch (1978) suggest that \(\text{DFFITS}_i\) greater than \(2\sqrt{r/n}\) is large.
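The five diagnostics can be collected in a short plain-Java sketch from the fitted quantities (variable names are illustrative; this is not the CaseStatistics implementation):

```java
public class CaseDiagnosticsSketch {
    // Computes the case diagnostics for one case from the fitted quantities:
    // residual e, leverage h, weight w, residual mean square s2, rank r, and
    // sample size n, using the formulas in the surrounding discussion.
    static void diagnostics(double e, double h, double w,
                            double s2, int r, int n) {
        // Standardized (internally studentized) residual.
        double ri = e * Math.sqrt(w / (s2 * (1 - h)));
        // Residual mean square with this case deleted.
        double si2 = ((n - r) * s2 - w * e * e / (1 - h)) / (n - r - 1.0);
        // Jackknife (deleted) residual, t-distributed with n - r - 1 df.
        double ti = e * Math.sqrt(w / (si2 * (1 - h)));
        // Cook's distance: influence on the estimated coefficients.
        double di = w * h * e * e / (r * s2 * (1 - h) * (1 - h));
        // DFFITS: influence on the fitted value; large if > 2*sqrt(r/n).
        double dffits = e * Math.sqrt(w * h / (si2 * (1 - h) * (1 - h)));
        System.out.printf("r_i=%.4f t_i=%.4f D_i=%.4f DFFITS_i=%.4f%n",
                ri, ti, di, dffits);
    }
}
```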
Transformations of the independent variables are sometimes useful in order to satisfy the regression model. The inclusion of squares and crossproducts of the variables
$$\left({x_1,x_2,x_1^2,x_2^2,x_1 x_2}\right) $$
is often needed. Logarithms of the independent variables are used also. (See Draper and Smith 1981, pp. 218-222; Box and Tidwell 1962; Atkinson 1985, pp. 177-180; Cook and Weisberg 1982, pp. 78-86.)
When the responses are described by a nonlinear function of the parameters, a transformation of the model equation often can be selected so that the transformed model is linear in the regression parameters. For example, by taking natural logarithms on both sides of the equation, the exponential model
$$y=e^{\beta_0+\beta_1 x_1}\varepsilon $$
can be transformed to a model that satisfies the linear regression model provided the \(\varepsilon_i\)'s have a log-normal distribution (Draper and Smith, pp. 222-225).
When the responses are nonnormal and their distribution is known, a transformation of the responses can often be selected so that the transformed responses closely satisfy the regression model assumptions. The square-root transformation for counts with a Poisson distribution and the arc-sine transformation for binomial proportions are common examples (Snedecor and Cochran 1967, pp. 325-330; Draper and Smith, pp. 237-239).
NaN (Not a Number) is the missing value code used by the regression
classes. Use field Double.NaN
to retrieve NaN. Any element of
the data matrix that is missing must be set to Double.NaN
. In
fitting regression models, any observation containing NaN
for
the independent, dependent, weight, or frequency variables is omitted from
the computation of the regression parameters.
The classes described in this section have primary application in the areas
of reliability and life testing, but they may find application in any
situation in which analysis of binomial events over time is of interest.
Kalbfleisch and Prentice (1980), Elandt-Johnson and Johnson (1980), Lee (1980), Gross and Clark (1975), Lawless (1982), Chiang (1968), and Tanner and Wong (1984) are references for the models and methods described
in this section. The class KaplanMeierEstimates
produces Kaplan-Meier
(product-limit) estimates of the survival distribution in a single
population. The class ProportionalHazards
computes the
parameter estimates in a proportional hazards model. The class
LifeTables
computes and (optionally) prints an
actuarial table based either upon a cohort followed over time or a
cross-section of a population.
The classes in this section allow users to filter and analyze time series data using autoregressive, ARIMA, GARCH and state-space models. Some are designed for automatic model selection, estimation and forecasting using algorithms based upon minimizing AIC (Akaike's Information Criterion).
Most time series classes in the library assume the data are stationary and
collected at equally spaced times with no missing observations in the series.
In order to prepare a series for analysis, any missing values must be
replaced with estimates. The class ARMAEstimateMissing
can be used to automatically estimate missing values using one of four user-selected methods, provided the largest time gap with missing values has no more than 4 missing values. An exception is the class AutoARIMA, which accepts a data series with missing values.
In addition to estimating missing values, if a series is nonstationary or
contains seasonal variation, its values should be filtered before fitting a
model and preparing forecasts. Class ARSeasonalFit
evaluates seasonal adjustments prior to modeling the series. Users specify a
range of adjustments.ARSeasonalFit
evaluates each adjustment to
identify the one that minimizes the AIC. If a user already knows the series
needs to be filtered using differences between consecutive values, the class
Difference
can be used to filter a series into an
equivalent series of differences.
There are several classes available for transforming a time series into its
autocorrelation matrix:
AutoCorrelation, CrossCorrelation, and MultiCrossCorrelation. Class AutoCorrelation
computes the autocorrelation and partial autocorrelation matrices
from a time series or its filtered values. Box and Jenkins (1976) describe
how to identify the structure of an ARIMA model using the autocorrelation and
partial autocorrelation matrices.
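A plain-Java sketch of the sample autocorrelation function (not the AutoCorrelation implementation), using the usual estimator \(\hat\rho_k = \hat\gamma_k/\hat\gamma_0\) with \(\hat\gamma_k = \frac{1}{n}\sum_{t=1}^{n-k}(z_t-\bar z)(z_{t+k}-\bar z)\):

```java
public class AcfSketch {
    // Sample autocorrelations rho_k = gamma_k / gamma_0 for k = 1..maxLag.
    static double[] acf(double[] z, int maxLag) {
        int n = z.length;
        double mean = 0.0;
        for (double v : z) mean += v;
        mean /= n;
        // gamma_k = (1/n) * sum over t of (z_t - mean)(z_{t+k} - mean)
        double[] gamma = new double[maxLag + 1];
        for (int k = 0; k <= maxLag; k++) {
            for (int t = 0; t + k < n; t++)
                gamma[k] += (z[t] - mean) * (z[t + k] - mean);
            gamma[k] /= n;
        }
        double[] rho = new double[maxLag];
        for (int k = 1; k <= maxLag; k++) rho[k - 1] = gamma[k] / gamma[0];
        return rho;
    }
}
```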
Classes CrossCorrelation
and MultiCrossCorrelation
are used to calculate the cross-correlation matrix for two univariate or
multivariate time series, respectively.
There are several classes useful for modeling stationary, equally spaced time series: ARMA, ARMAMaxLikelihood, GARCH, and KalmanFilter. Class ARMA can be used to fit autoregressive, moving-average, and ARIMA (autoregressive, integrated moving average) models. Rather than passing the original series to this class, users are required to construct this class using the autocorrelation matrix of the series. Class AutoCorrelation is useful for computing this matrix prior to constructing the ARMA class.
The other classes require users to construct the class using the original series or its filtered values. Class ARMAMaxLikelihood computes maximum likelihood estimates of the ARMA parameters and prepares forecasts from these values. Class ARAutoUnivariate selects the best autoregressive model for an equally spaced, stationary time series using AIC. Class GARCH estimates the parameters of a GARCH model, and the KalmanFilter class estimates the parameters for a state-space model using Kalman filtering and calculates one-step ahead forecasts.
Several classes contain methods for obtaining
forecasts from an ARIMA model. Class KalmanFilter
can be used to
obtain one-step ahead forecasts for a state-space model.
A small, yet comprehensive, class of stationary time-series models consists of the nonseasonal ARMA processes defined by
\(\phi \left( B \right)\left( {W_t - \mu } \right) = \theta \left( B \right)A_t , \,\,\,\,\,\,\,\, t \in Z\)
where \(Z = \{\dots, -2, -1, 0, 1, 2, \dots\}\) denotes the set of integers, B is the backward shift operator defined by \(B^k W_t = W_{t-k}\), \(\mu\) is the mean of \(W_t\), and the following equations are true:
\( \phi(B) = 1 - \phi_1 B - \phi_2B^2 - \dots - \phi_pB^p, p \geq 0\)
\(\theta(B) = 1 -\theta_1 B- \theta_2B^2 - \dots - \theta_qB^q, q \geq 0 \)
The model is of order (p, q) and is referred to as an ARMA (p, q) model.
An equivalent version of the ARMA (p, q) model is given by
\(\phi(B) W_t = \theta_0 + \theta(B)A_t, \,\,\,\,\,\, t \in Z\)
where \(\theta_0\) is an overall constant defined by the following:
\(\theta _0 = \mu \left( {1 - \sum\limits_{i = 1}^p {\phi _i } } \right) \)
See Box and Jenkins (1976, pp. 92-93) for a discussion of the meaning and usefulness of the overall constant.
If the "raw" data, \(\left\{ {Z_t } \right\}\), are homogeneous and
nonstationary, then differencing using the Difference class induces
stationarity, and the model is called ARIMA (AutoRegressive Integrated Moving
Average). Parameter estimation is performed on the stationary time series
\(W_t = \Delta^dZ_t\) , where \(\Delta^d = (1 - B)^d\) is the backward
difference operator with period 1 and order \(d, {\rm d} \gt 0\). For the
ARIMA model, see AutoARIMA
class for parameter
estimation and model selection.
There are two classes for estimating the parameters in an ARMA model and preparing forecasts: ARMA and ARMAMaxLikelihood. Class ARMA estimates model parameters using either the method of moments or least squares. The method of moments is selected by default or by setting property Method to METHOD_OF_MOMENTS. Method of moments estimates are good initial values for obtaining the least-squares estimates by setting property Method to LEAST_SQUARES. Other initial estimates provided by the user can be used. The least-squares procedure can be used to compute conditional or unconditional least-squares estimates of the parameters, depending on the choice of the backcasting length. The parameter estimates from either the method of moments or least-squares procedures can be used in the forecast method. The methods for preliminary parameter estimation, least-squares parameter estimation, and forecasting follow the approach of Box and Jenkins (1976, Programs 2-4, pp. 498-509).
Class ARMAMaxLikelihood
estimates model parameters
using maximum likelihood. Users can tailor the maximum likelihood algorithm
by setting tolerances and providing initial estimates for the parameters. By
default, ARMAMaxLikelihood
uses method of moment estimates to
initialize the parameters. However, this can be overridden with the
ARMAMaxLikelihood.setAR(double[])
method.
Exponential smoothing approximates the value of a time series at time t with a weighted average of previous values, with weights defined in such a way that they decay exponentially over time. The weights can be determined by a smoothing parameter \(\alpha\) and the relation
$$\hat y_t = \alpha y_{t-1} + (1 - \alpha)\,\hat y_{t-1}$$
$$\Rightarrow \hat y_t \approx \sum\limits_{j = 0}^{t - 1} \alpha (1 - \alpha)^{t-1-j}\, y_j = \sum\limits_{j = 0}^{t - 1} w_j\, y_j$$
The parameter \(\alpha\) is on the interval (0,1) and controls the rate of
decay. For values close to 1, the effect decays rapidly, whereas for values
close to 0, the influence of a past value persists for some time. Exponential
smoothing as a forecasting procedure is widely used and largely effective for
short term, mean level forecasting. With the addition of a term for linear
trend and terms or factors for seasonal patterns, exponential smoothing is an
intuitive procedure for many series that arise in business applications.
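A plain-Java sketch of the simple (mean level) case under the recursion above; the HoltWintersExponentialSmoothing class additionally handles trend and seasonal terms:

```java
public class ExponentialSmoothingSketch {
    // Simple exponential smoothing: yhat_t = alpha*y_{t-1} + (1-alpha)*yhat_{t-1}.
    static double[] smooth(double[] y, double alpha) {
        double[] yhat = new double[y.length];
        yhat[0] = y[0];                       // a common initialization choice
        for (int t = 1; t < y.length; t++) {
            yhat[t] = alpha * y[t - 1] + (1 - alpha) * yhat[t - 1];
        }
        return yhat;
    }

    public static void main(String[] args) {
        double[] series = {10, 12, 13, 12, 15, 16, 15, 17};
        System.out.println(java.util.Arrays.toString(smooth(series, 0.3)));
    }
}
```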
Class HoltWintersExponentialSmoothing
performs the
Holt-Winters method, otherwise known as triple exponential smoothing,
and allows for either an additive or a multiplicative seasonal effect.
Copyright © 2020 Rogue Wave Software. All rights reserved.