Interface | Description |
---|---|
CdfFunction | Public interface for the user-supplied cumulative distribution function to be used by InverseCdf and ChiSquaredTest. |
Distribution | Public interface for the user-supplied distribution function. |
ExtendedGARCH.zDistribution | Public interface for specifying the distribution of \(z_t\). |
NonlinearRegression.Derivative | Public interface for the user-supplied function to compute the derivative for NonlinearRegression. |
NonlinearRegression.Function | Public interface for the user-supplied function for NonlinearRegression. |
ProbabilityDistribution | Public interface for a user-supplied probability distribution. |
Random.BaseGenerator | Base pseudorandom number generator. |
RandomSequence | Interface implemented by generators of random or quasi-random multidimensional sequences. |
RegressionBasis | Public interface for the user-supplied function to a UserBasisRegression object. |
TimeSeriesOperations.Function | Public interface for the user-supplied function that defines how to combine two synchronous time series values. |
Class | Description |
---|---|
ANCOVA | Analyzes a one-way classification model with covariates. |
ANOVA | Analysis of Variance table and related statistics. |
ANOVAFactorial | Analyzes a balanced factorial design with fixed effects. |
ARAutoUnivariate | Automatically determines the best autoregressive time series model using Akaike's Information Criterion. |
ARAutoUnivariate.Formatter | Deprecated. Use IMSLFormatter instead. |
ARMA | Computes least-squares estimates of parameters for an ARMA model. |
ARMAEstimateMissing | Estimates missing values in a time series collected with equal spacing. |
ARMAMaxLikelihood | Computes maximum likelihood estimates of parameters for an ARMA model with p and q autoregressive and moving average terms, respectively. |
ARMAOutlierIdentification | Detects and determines outliers and simultaneously estimates the model parameters in a time series whose underlying outlier-free series follows a general seasonal or nonseasonal ARMA model. |
ARSeasonalFit | Estimates the optimum seasonality parameters for a time series using an autoregressive model, AR(p), to represent the time series. |
AutoARIMA | Automatically identifies time series outliers, determines parameters of a multiplicative seasonal \(\text{ARIMA}(p,0,q)\times(0,d,0)_s\) model, and produces forecasts that incorporate the effects of outliers whose effects persist beyond the end of the series. |
AutoCorrelation | Computes the sample autocorrelation function of a stationary time series. |
CategoricalGenLinModel | Analyzes categorical data using logistic, probit, Poisson, and other linear models. |
Cdf | Cumulative probability distribution functions. |
ChiSquaredTest | Chi-squared goodness-of-fit test. |
ClusterHierarchical | Performs a hierarchical cluster analysis from a distance matrix. |
ClusterKMeans | Performs a K-means (centroid) cluster analysis. |
ClusterKNN | Performs a k-Nearest Neighbor classification. |
ContingencyTable | Performs a chi-squared analysis of a two-way contingency table. |
Covariances | Computes the sample variance-covariance or correlation matrix. |
CrossCorrelation | Computes the sample cross-correlation function of two stationary time series. |
Difference | Differences a seasonal or nonseasonal time series. |
DiscriminantAnalysis | Performs a linear or a quadratic discriminant function analysis among several known groups. |
Dissimilarities | Computes a matrix of dissimilarities (or similarities) between the columns (or rows) of a matrix. |
EGARCH | Implements the exponential GARCH (EGARCH) model. |
EmpiricalQuantiles | Computes empirical quantiles. |
ExtendedGARCH | Abstract class for extended GARCH models. |
FactorAnalysis | Performs Principal Component Analysis or Factor Analysis on a covariance or correlation matrix. |
FaureSequence | Generates the low-discrepancy Faure sequence. |
GammaDistribution | Evaluates a gamma probability density for a given set of data. |
GARCH | Computes estimates of the parameters of a GARCH(p,q) model. |
HoltWintersExponentialSmoothing | Calculates parameters and forecasts using the Holt-Winters multiplicative or additive forecasting method for seasonal data. |
InvCdf | Inverse cumulative probability distribution functions. |
InverseCdf | Inverse of a user-supplied cumulative distribution function. |
KalmanFilter | Performs Kalman filtering and evaluates the likelihood function for the state-space model. |
KaplanMeierECDF | Computes the Kaplan-Meier reliability function estimates or the CDF based on failure data that may be multi-censored. |
KaplanMeierEstimates | Computes Kaplan-Meier (or product-limit) estimates of survival probabilities for a sample of failure times that possibly contain right censoring. |
KolmogorovOneSample | Performs a Kolmogorov-Smirnov goodness-of-fit test in one sample. |
KolmogorovTwoSample | Performs a Kolmogorov-Smirnov two-sample test. |
LackOfFit | Performs a lack-of-fit test for a univariate time series or transfer function given the appropriate correlation function. |
LifeTables | Computes population (current) or cohort life tables based upon the observed population sizes at the middle (for population table) or the beginning (for cohort table) of some user-specified age intervals. |
LinearRegression | Fits a multiple linear regression model with or without an intercept. |
LogNormalDistribution | Evaluates a lognormal probability density for a given set of data. |
MersenneTwister | A 32-bit Mersenne Twister generator. |
MersenneTwister64 | A 64-bit Mersenne Twister generator. |
MultiCrossCorrelation | Computes the multichannel cross-correlation function of two mutually stationary multichannel time series. |
MultipleComparisons | Performs Student-Newman-Keuls multiple comparisons test. |
NonlinearRegression | Fits a multivariate nonlinear regression model using least squares. |
NormalDistribution | Evaluates the normal (Gaussian) probability density for a given set of data. |
NormalityTest | Performs a test for normality. |
NormOneSample | Computes statistics for mean and variance inferences using a sample from a normal population. |
NormTwoSample | Computes statistics for mean and variance inferences using samples from two normal populations. |
PartialCovariances | Computes the partial covariances or partial correlations from an input covariance or correlation matrix. |
Pdf | Probability density functions. |
PoissonDistribution | Evaluates a Poisson probability density for a given set of data. |
PooledCovariances | Computes a pooled variance-covariance matrix from one or more sets of observations. |
ProportionalHazards | Analyzes survival and reliability data using Cox's proportional hazards model. |
Random | Generates uniform and non-uniform random number distributions. |
RandomSamples | Generates a simple pseudorandom sample from a finite population, a sample of indices, or a permutation of an array of indices. |
Ranks | Computes the ranks, normal scores, or exponential scores for a vector of observations. |
RegressorsForGLM | Generates regressors for a general linear model. |
SelectionRegression | Selects the best multiple linear regression models. |
SignTest | Performs a sign test. |
Sort | A collection of sorting functions. |
StepwiseRegression | Builds multiple linear regression models using forward selection, backward selection, or stepwise selection. |
Summary | Computes basic univariate statistics. |
TableMultiWay | Tallies observations into a multi-way frequency table. |
TableOneWay | Calculates a frequency table for a data array. |
TableTwoWay | Calculates a two-dimensional frequency table for a data array based upon two variables. |
TimeSeries | A specialized class for time series data and analysis. |
TimeSeriesOperations | A class of operations and methods for objects of class TimeSeries. |
UserBasisRegression | Fits a linear function of the form \(y = c_0 + c_1 f_1(x) + c_2 f_2(x) + \cdots + c_k f_k(x) + \varepsilon\), where \(f_1(x), f_2(x), \ldots, f_k(x)\) are the user basis functions evaluated at index values \(i = 1, 2, \ldots, k\), \(c_0\) is the intercept, \(c_1, c_2, \ldots, c_k\) are the coefficients associated with the basis functions, and \(\varepsilon\) is the random error associated with \(y\). |
VectorAutoregression | Performs vector autoregression for a multivariate time series. |
WelchsTTest | Performs Welch's t-test for testing the difference in means between two normal populations with unequal variances. |
WilcoxonRankSum | Performs a Wilcoxon rank sum test. |
Enum | Description |
---|---|
ExtendedGARCH.Solver | An enumeration of the types of solvers available to the estimation procedure. |
FactorAnalysis.ScoreMethod | Indicates which method is used for computing the factor score coefficients. |
TimeSeriesOperations.CombineMethod | Public enum of methods for combining synchronous time series values. |
TimeSeriesOperations.MergeRule | Public enum of merge rules that define how two time series should be merged. |
WelchsTTest.Hypothesis | The form of the alternative hypothesis. |
Exception | Description |
---|---|
ARAutoUnivariate.TriangularMatrixSingularException | The input triangular matrix is singular. |
ARMA.IllConditionedException | The problem is ill-conditioned. |
ARMA.IncreaseErrRelException | The bound for the relative error is too small. |
ARMA.MatrixSingularException | The input matrix is singular. |
ARMA.NewInitialGuessException | The iteration has not made good progress. |
ARMA.ResidualsTooLargeException | The residuals have become too large in one step of the least-squares estimation of the ARMA coefficients. |
ARMA.TooManyCallsException | The number of calls to the function has exceeded the maximum number of iterations times the number of moving average (MA) parameters plus 1. |
ARMA.TooManyFcnEvalException | Maximum number of function evaluations exceeded. |
ARMA.TooManyITNException | Maximum number of iterations exceeded. |
ARMA.TooManyJacobianEvalException | Maximum number of Jacobian evaluations exceeded. |
ARMAMaxLikelihood.NonInvertibleException | The solution is noninvertible. |
ARMAMaxLikelihood.NonStationaryException | The solution is nonstationary. |
AutoARIMA.NoAcceptableModelFoundException | No appropriate ARIMA model could be found. |
AutoCorrelation.NonPosVariancesException | The problem is ill-conditioned. |
CategoricalGenLinModel.ClassificationVariableException | The ClassificationVariable vector has not been initialized. |
CategoricalGenLinModel.ClassificationVariableLimitException | The classification variable limit set by the user through setUpperBound has been exceeded. |
CategoricalGenLinModel.ClassificationVariableValueException | The number of distinct values for each classification variable must be greater than 1. |
CategoricalGenLinModel.DeleteObservationsException | The number of observations to be deleted (set by setObservationMax) has grown too large. |
CategoricalGenLinModel.RankDeficientException | The model has been determined to be rank deficient. |
ChiSquaredTest.DidNotConvergeException | The iteration did not converge. |
ChiSquaredTest.NoObservationsException | There are no observations. |
ChiSquaredTest.NotCDFException | The function is not a cumulative distribution function (CDF). |
ClusterKMeans.ClusterNoPointsException | There is a cluster with no points. |
ClusterKMeans.NoConvergenceException | Convergence did not occur within the maximum number of iterations. |
ClusterKMeans.NonnegativeFreqException | Deprecated. No longer used, replaced with an IllegalArgumentException. |
ClusterKMeans.NonnegativeWeightException | Deprecated. No longer used, replaced with an IllegalArgumentException. |
Covariances.DiffObsDeletedException | Deprecated. |
Covariances.MoreObsDelThanEnteredException | Deprecated. |
Covariances.NonnegativeFreqException | Frequencies must be nonnegative. |
Covariances.NonnegativeWeightException | Weights must be nonnegative. |
Covariances.TooManyObsDeletedException | Deprecated. |
CrossCorrelation.NonPosVariancesException | The problem is ill-conditioned. |
DiscriminantAnalysis.CovarianceSingularException | The variance-covariance matrix is singular. |
DiscriminantAnalysis.EmptyGroupException | There are no observations in a group. |
DiscriminantAnalysis.SumOfWeightsNegException | The sum of the weights has become negative. |
Dissimilarities.NoPositiveVarianceException | No variable has positive variance. |
Dissimilarities.ScaleFactorZeroException | The computations cannot continue because a scale factor is zero. |
Dissimilarities.ZeroNormException | The computations cannot continue because the Euclidean norm of the column is equal to zero. |
EmpiricalQuantiles.ScaleFactorZeroException | The computations cannot continue because a scale factor is zero. |
FactorAnalysis.BadVarianceException | Bad variance error. |
FactorAnalysis.EigenvalueException | Eigenvalue error. |
FactorAnalysis.NonPositiveEigenvalueException | Nonpositive eigenvalue error. |
FactorAnalysis.NotPositiveDefiniteException | Matrix not positive definite. |
FactorAnalysis.NotPositiveSemiDefiniteException | Covariance matrix not positive semi-definite. |
FactorAnalysis.NotSemiDefiniteException | Hessian matrix not semi-definite. |
FactorAnalysis.RankException | Rank of covariance matrix error. |
FactorAnalysis.SingularException | Covariance matrix singular error. |
GARCH.ConstrInconsistentException | The equality constraints are inconsistent. |
GARCH.EqConstrInconsistentException | The equality constraints and the bounds on the variables are found to be inconsistent. |
GARCH.NoVectorXException | No vector X satisfies all of the constraints. |
GARCH.TooManyIterationsException | Number of function evaluations exceeded 1000. |
GARCH.VarsDeterminedException | The variables are determined by the equality constraints. |
InverseCdf.DidNotConvergeException | The iteration did not converge. |
MultiCrossCorrelation.NonPosVariancesException | The problem is ill-conditioned. |
NonlinearRegression.NegativeFreqException | A negative frequency was encountered. |
NonlinearRegression.NegativeWeightException | A negative weight was encountered. |
NonlinearRegression.TooManyIterationsException | The number of iterations has exceeded the maximum allowed. |
NormalityTest.NoVariationInputException | There is no variation in the input data. |
PartialCovariances.InvalidMatrixException | Exception thrown if a computed correlation is greater than one for some pair of variables. |
PartialCovariances.InvalidPartialCorrelationException | Exception thrown if a computed partial correlation is greater than one for some pair of variables. |
Pdf.AltSeriesAccuracyLossException | The magnitude of the alternating series sum is too small relative to the sum of positive terms to permit reliable accuracy. |
ProportionalHazards.ClassificationVariableLimitException | The classification variable limit set by the user through setUpperBound has been exceeded. |
SelectionRegression.NoVariablesException | No variables can enter the model. |
StepwiseRegression.CyclingIsOccurringException | Cycling is occurring. |
StepwiseRegression.NoVariablesEnteredException | No variables can enter the model. |
The classes ANOVA, ANOVAFactorial, ANCOVA, and MultipleComparisons are for commonly used experimental designs. Typically, responses are stored in the input vector y in a pattern that takes advantage of the balanced design structure. Consequently, the full set of model subscripts is not needed to identify each response. The classes assume the usual pattern, which requires that the last model subscript change most rapidly, followed by the model subscript next in line, and so forth, with the first subscript changing at the slowest rate. This pattern is referred to as lexicographical ordering.
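For example, the following sketch (illustrative Java, not library code; the index helper is hypothetical) lays out the responses of a balanced two-factor design in y using lexicographical ordering:

```java
public class LexicographicalOrdering {
    // Maps model subscripts (i, j) of an a x b balanced design to the
    // position of the response in the one-dimensional vector y: the
    // last subscript (j) changes most rapidly, the first (i) slowest.
    static int index(int i, int j, int b) {
        return i * b + j;
    }

    public static void main(String[] args) {
        int a = 2, b = 3;                 // 2 levels of factor A, 3 of factor B
        double[] y = new double[a * b];   // responses in lexicographical order
        for (int i = 0; i < a; i++) {
            for (int j = 0; j < b; j++) {
                y[index(i, j, b)] = 10.0 * (i + 1) + (j + 1); // y[0]=y_11, y[1]=y_12, ...
            }
        }
        // y = {11, 12, 13, 21, 22, 23}: subscripts run 11, 12, 13, 21, 22, 23
        System.out.println(java.util.Arrays.toString(y));
    }
}
```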
The ANOVA
class allows missing responses if confidence
interval information is not requested. Double.NaN
(NaN = Not a
Number) is the missing value code used by these classes. Any element of
y that is missing must be set to NaN. Other classes do not allow
missing responses because the classes generally deal with balanced
designs.
The classes for the computations of basic statistics include Summary, Covariances, PartialCovariances, PooledCovariances, Sort, Ranks, and EmpiricalQuantiles. They generally have relatively simple arguments and most of them will allow for missing values. Missing value codes can be set by using Double.NaN.
The classes NormOneSample, NormTwoSample, WelchsTTest, ChiSquaredTest, and WilcoxonRankSum
perform statistical tests and return a "p-value" for the test. The
p-value is the probability of observing data
that would yield a test statistic as extreme or more extreme under the
assumption of the null hypothesis. Hence, a small
p-value is evidence for the rejection of the null hypothesis.
Classes for tabulation include TableOneWay, TableTwoWay, and TableMultiWay.
The ContingencyTable
class computes many statistics of
interest in a two-way table. Statistics computed by this routine include the
usual chi-squared statistics, measures of association, Kappa, and many
others.
The CategoricalGenLinModel
class is concerned with
generalized linear models in discrete data. This routine may be used to
compute estimates and associated statistics in probit, logistic, minimum
extreme value, Poisson, negative binomial (with known number of successes),
and logarithmic models. Classification variables as well as weights,
frequencies, and additive constants may be used so that quite general linear
models can be fit. Residuals, a measure of influence, the coefficient
estimates, and other statistics are returned for each model fit. When
infinite parameter estimates are required, extended maximum likelihood
estimation may be used. Log-linear models may be fit through the use of
Poisson regression models. Results from Poisson regression models involving
structural and sampling zeros will be identical to the results obtained from
the log-linear model but will be fit by a quasi-Newton algorithm rather than
through iterative proportional fitting.
Goodness-of-fit is a term that refers to how well a hypothesized distribution fits a set of data (see Conover, 1980). The methods for goodness-of-fit include ChiSquaredTest, NormalityTest, KolmogorovOneSample, and KolmogorovTwoSample.
The chi-squared goodness-of-fit test ChiSquaredTest
may
be used with discrete as well as continuous distributions.
The Kolmogorov-Smirnov methods KolmogorovOneSample
and
KolmogorovTwoSample
compute exact probabilities in
small to moderate sample sizes.
The Kolmogorov-Smirnov and chi-squared goodness-of-fit test routines allow for missing values (NaN, not a number) in the input data.
Cluster Analysis
Clustering deals with the problem of segmenting or organizing a set of individual objects into groups or "clusters" of similar objects, so that objects within a cluster are more similar to each other than they are to objects in other clusters. In practice, objects are virtually anything for which attributes or features can be measured and quantified into a data matrix X. Typically a set of clusters is constructed from a training data set. Then the clusters are used to predict the class label for a new object.
The library includes three classes that perform clustering and classification: ClusterKMeans, ClusterKNN, and ClusterHierarchical.
ClusterKMeans performs a K-means cluster analysis. The K-means method attempts to find a clustering that minimizes the within-cluster sum-of-squares. In this method of clustering, the data matrix X is grouped so that each observation (row in X) is assigned to one of a fixed number, K, of clusters. The sum of the squared difference of each observation about the mean of its assigned cluster is used as the criterion for assignment. In the basic algorithm, observations are transferred from one cluster to another when doing so decreases the within-cluster sum-of-squared differences. When no transfer occurs in a pass through the entire data set, the algorithm stops. The clustering can be evaluated by methods described in "Basic Statistics," or others throughout the library. Often, K-means clustering with more than one value of K is performed, and the value of K that best fits the data is used.
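The following is a minimal Java sketch of the transfer criterion described above; it is an illustration of the basic algorithm, not the ClusterKMeans implementation:

```java
import java.util.Arrays;

public class KMeansSketch {
    // Minimal K-means: assigns each row of x to the nearest of k means,
    // recomputes means, and stops when a pass makes no transfers.
    static int[] cluster(double[][] x, double[][] means) {
        int n = x.length, k = means.length, d = x[0].length;
        int[] label = new int[n];
        Arrays.fill(label, -1);
        boolean transferred = true;
        while (transferred) {
            transferred = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dist = 0.0;
                    for (int j = 0; j < d; j++) {
                        double diff = x[i][j] - means[c][j];
                        dist += diff * diff;
                    }
                    if (dist < bestDist) { bestDist = dist; best = c; }
                }
                if (best != label[i]) { label[i] = best; transferred = true; }
            }
            // Recompute each cluster mean from its assigned observations.
            double[][] sum = new double[k][d];
            int[] count = new int[k];
            for (int i = 0; i < n; i++) {
                count[label[i]]++;
                for (int j = 0; j < d; j++) sum[label[i]][j] += x[i][j];
            }
            for (int c = 0; c < k; c++)
                if (count[c] > 0)
                    for (int j = 0; j < d; j++) means[c][j] = sum[c][j] / count[c];
        }
        return label;
    }
}
```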
ClusterKNN
performs k-Nearest Neighbor, or KNN, cluster
analysis. For a new object, KNN finds a cluster consisting of the object's K
closest neighbors, where "closest" is in the sense of a specified distance
metric, such as the L2 norm or the L1 norm. The new object then receives the
class label that the majority of the objects in its neighborhood have
(majority vote). The algorithm works with a training set. For each
observation in the data X (representing a new object), the distance to each
object in a training set is calculated. The distances are sorted, and the
smallest K distances are noted. The objects with the K smallest distances
form the cluster. Note that unlike K-Means, KNN does not provide a clustering
of the data set. In a sense KNN forms clusters "on the fly" for the purpose
of classification.
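A minimal Java sketch of the majority-vote rule follows; the classify method is illustrative and is not the ClusterKNN API:

```java
import java.util.Arrays;

public class KnnSketch {
    // Classifies a new observation by majority vote among the k training
    // points with the smallest Euclidean (L2) distances.
    static int classify(double[][] train, int[] labels, double[] x, int k) {
        int n = train.length;
        double[][] distAndLabel = new double[n][2];
        for (int i = 0; i < n; i++) {
            double dist = 0.0;
            for (int j = 0; j < x.length; j++) {
                double diff = train[i][j] - x[j];
                dist += diff * diff;
            }
            distAndLabel[i][0] = dist;
            distAndLabel[i][1] = labels[i];
        }
        // Sort by distance and tally the labels of the k nearest neighbors.
        Arrays.sort(distAndLabel, (a, b) -> Double.compare(a[0], b[0]));
        java.util.Map<Integer, Integer> votes = new java.util.HashMap<>();
        for (int i = 0; i < k; i++)
            votes.merge((int) distAndLabel[i][1], 1, Integer::sum);
        return votes.entrySet().stream()
                .max(java.util.Map.Entry.comparingByValue())
                .get().getKey();
    }
}
```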
ClusterHierarchical
performs a variant of hierarchical cluster
analysis. ClusterHierarchical
implements a bottom-up, or
agglomerative approach. Starting with every individual object in a data set
as a cluster, the two clusters (objects) that are closest in the sense of
some distance metric are then combined into a new cluster. With the new set
of n - 1 clusters, the process is repeated, so that at the h-th
stage, there are n - h clusters. The clustering continues until there
is only one cluster. Each level represents a different clustering of the data
that can be evaluated using fit statistics and other metrics.
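The sketch below illustrates the agglomerative process using single linkage on a distance matrix; linkage choices vary, and this is an illustration only, not the ClusterHierarchical implementation:

```java
public class AgglomerativeSketch {
    // Repeatedly joins the two closest clusters (single linkage) until a
    // single cluster remains, printing the merge at each stage.
    static void cluster(double[][] dist) {
        int n = dist.length;
        boolean[] active = new boolean[n];
        java.util.Arrays.fill(active, true);
        for (int merges = 0; merges < n - 1; merges++) {
            int a = -1, b = -1;
            double best = Double.MAX_VALUE;
            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; j++)
                    if (active[i] && active[j] && dist[i][j] < best) {
                        best = dist[i][j]; a = i; b = j;
                    }
            // Merge b into a: single linkage keeps the minimum distance.
            for (int j = 0; j < n; j++)
                if (active[j] && j != a)
                    dist[a][j] = dist[j][a] = Math.min(dist[a][j], dist[b][j]);
            active[b] = false;
            System.out.println("Merged clusters " + a + " and " + b + " at height " + best);
        }
    }
}
```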
Principal Components and Factor Analysis
The idea in principal components is to find a small number of linear
combinations of the original variables that maximize the variance accounted
for in the original data. This amounts to an eigensystem analysis of the
covariance (or correlation) matrix. The class
FactorAnalysis
performs either principal component
analysis or factor analysis. When the principal component model is used,
FactorAnalysis
computes standard errors for the eigenvalues.
Correlations of the original variables with the principal component scores
also are computed.
Factor analysis and principal component analysis, while quite different in assumptions, often serve the same ends. Unlike principal components in which linear combinations yielding the highest possible variances are obtained, factor analysis generally obtains linear combinations of the observed variables according to a model relating the observed variable to hypothesized underlying factors, plus a random error term called the unique error or uniqueness. In factor analysis, the unique errors associated with each variable are usually assumed to be independent of the factors. In addition, in the common factor model, the unique errors are assumed to be mutually independent. The factor analysis model is expressed in the following equation: $$x - \mu = \Lambda f + e $$ where \(x\) is the \(p\) vector of observed values, \(\mu\) is the \(p\) vector of variable means, \(\Lambda\) is the \(p \times k\) matrix of factor loadings, \(f\) is the \(k\) vector of hypothesized underlying random factors, \(e\) is the \(p\) vector of hypothesized unique random errors, \(p\) is the number of variables in the observed variables, and \(k\) is the number of factors.
Because much of the computation in factor analysis was originally done by hand or was expensive on early computers, quick (but dirty) algorithms that made the calculations possible were developed. One result is the many factor extraction methods available today. Generally speaking, in the exploratory or model building phase of a factor analysis, a method of factor extraction that is not computationally intensive (such as principal components, principal factor, or image analysis) is used. If desired, a computationally intensive method is then used to obtain the final factors.
Discriminant Analysis
Discriminant analysis involves fitting a mathematical "boundary" between
groups for the purpose of classifying new observations into those groups.
The class DiscriminantAnalysis
performs linear or
quadratic discrimination together with a number of approaches for classifying
new observations, including reclassification, leave-out-one, and split-sample.
The data consist of multivariate observations along with known memberships in a fixed number of groups. The discriminant function is then fit to the known data. The analysis can be executed in an online mode, meaning that one or more observations can be added to the rule during each invocation of DiscriminantAnalysis.update(double[][]). One or more observations can also be removed from the state of the model using DiscriminantAnalysis.downdate(double[][], int[]).
Nonparametric statistics are derived without any assumptions about the
underlying probability distribution. Examples include measures of location
and scale (see "Basic Statistics"), descriptive measures in a contingency
table (see "Categorical and Discrete Data Analysis") and statistics used in
goodness of fit tests (see "Tests of Goodness of Fit"). In addition, the
classes SignTest
and
WilcoxonRankSum
perform nonparametric tests comparing
two populations.
Pdf
and Cdf
provide
static methods for several continuous and discrete probability distributions.
The distribution function for the (real, single-valued) random variable X is the function F defined for all real x by
$$F(x) = {\rm Prob}(X \leq x)$$ where \(\rm Prob(\cdot)\) denotes the probability of an event. The distribution function is often called the cumulative distribution function (CDF).
For distributions with finite ranges, such as the beta
distribution, the CDF is \(0\) for values less than the left endpoint and
\(1\) for values greater than the right endpoint. The methods in the
Cdf
class return the correct values for the
distribution functions when values outside of the range of the random
variable are input, but warning error conditions are set in these cases.
For discrete distributions, the function giving the probability that the random variable takes on specific values is called the probability mass function, defined by
$$p(x) = {\rm {Prob}}(X = x)$$
The CDF for a discrete random variable is
$$F(x) = \sum\limits_{k \in A} p(k) $$
where \(A\) is the set of all \(k\) such that \(k \leq x\). Since the distribution function is a step function, its inverse does not exist uniquely.
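As a concrete illustration (plain Java, not library code), the Poisson CDF can be evaluated by summing the mass function \(p(k) = e^{-\lambda}\lambda^k/k!\) over \(k \leq x\):

```java
public class DiscreteCdfSketch {
    // Poisson probability mass function p(k) = exp(-lambda) * lambda^k / k!,
    // computed iteratively to avoid overflow in the factorial.
    static double poissonPmf(int k, double lambda) {
        double p = Math.exp(-lambda);
        for (int i = 1; i <= k; i++) p *= lambda / i;
        return p;
    }

    // CDF F(x) = sum of p(k) over the set {k : k <= x}.
    static double poissonCdf(int x, double lambda) {
        double sum = 0.0;
        for (int k = 0; k <= x; k++) sum += poissonPmf(k, lambda);
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(poissonCdf(3, 2.0)); // approximately 0.8571
    }
}
```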
For continuous distributions, a probability mass function, as defined above, would not be useful because the probability of any given point is \(0\). For such distributions, the useful analog is the probability density function (PDF). The integral of the PDF is the probability over the interval. If the continuous random variable X has PDF f, then
$${\rm Prob}(a \le X \le b) = \int_a^b f(x)\,dx $$
The relationship between the CDF and the PDF is
$$F(x) = \int_{-\infty}^x f(t) \, dt $$
For (absolutely) continuous distributions, the value of \(F(x)\)
uniquely determines \(x\) within the support of the distribution.
The "inverse" methods in the Cdf
class compute the inverses of
the distribution functions, that is, given \(F(x)\), they compute,
\(x\). The inverses are defined only over the open interval
\((0,1)\).
Whenever a probability close to \(1.0\) results from a call to a distribution
function or is to be input to an inverse function, it is often impossible to
achieve good accuracy because of the nature of the representation of numeric
values. In this case, it may be better to work with the complementary
distribution function (one minus the distribution function). If the
distribution is symmetric about some point (as the normal distribution, for
example) or is reflective about some point (as the beta distribution, for
example), the complementary distribution function has a simple relationship
with the distribution function. For example, to evaluate the standard normal distribution at \(4.0\) using the normal method in the Cdf class directly, the result to six places is \(0.999968\). Only
two of those digits are really useful, however. A more useful result may be
\(1.000000\) minus this value, which can be obtained to six places as
\(3.16712\text{e-}5\)
by evaluating normal
at \(-4.0\). For the normal distribution, the
two values are related by \(\Phi(x) = 1 - \Phi(-x)\), where \(\Phi (\cdot)\)
is the normal distribution function. Another example is the beta distribution
with parameters \(2\) and \(10\). This distribution is skewed to the right, so
evaluating beta
at \(0.7\), the value \(0.999953\) is obtained.
A more precise result is obtained by evaluating beta
with
parameters \(10\) and \(2\) at \(0.3\). This yields \(4.72392\text{e-}5\).
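The normal example can be sketched with the Cdf.normal method discussed below (a usage illustration, assuming the standard package import):

```java
import com.imsl.stat.Cdf;

public class ComplementExample {
    public static void main(String[] args) {
        // Direct evaluation near 1.0 loses precision in the trailing digits.
        double direct = Cdf.normal(4.0);        // approximately 0.999968
        // The complement 1 - Phi(4.0) is better computed via symmetry as
        // Phi(-4.0), which retains full precision.
        double complement = Cdf.normal(-4.0);   // approximately 3.16712e-5
        System.out.println(direct + "  " + complement);
    }
}
```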
Many of the algorithms used by the classes in this section are discussed by Abramowitz and Stegun (1964). The algorithms make use of various expansions and recursive relationships and often use different methods in different regions.
Cumulative distribution functions are defined for all real arguments. However, if the input to one of the distribution functions in this section is outside the range of the random variable, an error is issued.
Definitions and discussions of the basic terms can be found in Johnson and Kotz (1969, 1970a, 1970b). These are also good references for the specific distributions.
Many of the methods for probability distributions are written for standard
forms of statistical distributions. Hence, the number of parameters for any
given distribution may be fewer than the number often associated with the
distribution. Also, the methods relating to the normal distribution,
Cdf.normal(double)
and Cdf.inverseNormal(double)
,
are for a normal distribution with mean equal to zero and variance equal to
one. For other means and variances, it is very easy for the user to
standardize the variables by subtracting the mean and dividing by the square
root of the variance.
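For example, a brief sketch of this standardization (the values of mu and sigma are illustrative):

```java
import com.imsl.stat.Cdf;

public class StandardizeExample {
    public static void main(String[] args) {
        double mu = 100.0, sigma = 15.0;   // illustrative mean and std. deviation
        double x = 130.0;
        // Standardize, then use the standard normal CDF: Prob(X <= 130).
        double p = Cdf.normal((x - mu) / sigma);
        // Invert for a quantile of the nonstandard distribution.
        double x95 = mu + sigma * Cdf.inverseNormal(0.95);
        System.out.println(p + "  " + x95);
    }
}
```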
This section describes classes and methods for the generation of random numbers that are useful for applications in simulation studies.
In the following discussions, the phrases random numbers, random deviates, deviates, and variates are used interchangeably. The phrase pseudorandom is sometimes used to emphasize that the numbers generated are really not random since they result from a deterministic process. The usefulness of pseudorandom numbers is derived from the similarity, in a statistical sense, of samples of the pseudorandom numbers to samples of observations from the specified distributions. In short, while the pseudorandom numbers are completely deterministic and repeatable, they simulate the realizations of independent and identically distributed random variables.
Class com.imsl.stat.Random extends java.util.Random. It adds
the ability to generate random numbers with a number of different non-uniform
distributions. The non-uniform random number generators use a uniform random
number generator. The uniform random number generator in
java.util.Random
can be used. Random
adds a linear
congruential generator. It is also possible to use another uniform generator
as long as it implements the Random.BaseGenerator
interface.
The MersenneTwister and MersenneTwister64 uniform random number generators generate random number series with very long periods. They both implement the Random.BaseGenerator interface and so can be used as the base uniform generator in com.imsl.stat.Random.
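A sketch of this wiring is shown below. It assumes a Random constructor that accepts a Random.BaseGenerator and a MersenneTwister seed constructor; verify both signatures against the class reference before use:

```java
import com.imsl.stat.MersenneTwister;
import com.imsl.stat.Random;

public class BaseGeneratorExample {
    public static void main(String[] args) {
        // Assumed constructors: MersenneTwister(int seed) and
        // Random(Random.BaseGenerator); check the class reference.
        MersenneTwister base = new MersenneTwister(123457);
        Random rng = new Random(base);
        // nextDouble() is inherited from java.util.Random, which
        // com.imsl.stat.Random extends.
        for (int i = 0; i < 3; i++) {
            System.out.println(rng.nextDouble());
        }
    }
}
```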
The class FaureSequence
generates the low-discrepancy
Faure sequence. This is also called a quasi-random generator. The
sequence is a series of points in n-dimensions, which are close to
being as equally spaced as possible. Low-discrepancy refers to the difference
from actually being as equally spaced as possible.
The FaureSequence
implements the RandomSequence
interface. This interface defines a sequence of points in
n-dimensions. It is used by the class
HyperRectangleQuadrature
to evaluate
n-dimensional integrals.
The classes LinearRegression
and
RegressorsForGLM
contain methods for linear regression and
the multivariate general linear model. The class NonlinearRegression
is available for nonlinear regression models. Methods
for estimating or fitting regression models, methods for computing summary
statistics and diagnostics, and methods for computing confidence intervals
for individual cases are provided. There are also methods for building a
model from a set of candidate variables, i.e., variable selection.
The simple linear regression model is
$$y_i=\beta_0+\beta_1 x_i+\varepsilon_i$$
for \(i=1,2,\ldots ,n\). The \(y_i\)'s represent the observed values of the response or dependent variable, the \(x_i\)'s are the settings of the independent (explanatory) variable, \(\beta_0 \) and \(\beta_1\) are the intercept and slope parameters (respectively) and the \(\varepsilon_i\)'s are independently distributed normal errors, each with mean 0 and variance \(\sigma^2\).
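As a plain-Java illustration of this model (not the LinearRegression API), the least-squares estimates \(\hat\beta_1=\sum(x_i-\bar x)(y_i-\bar y)/\sum(x_i-\bar x)^2\) and \(\hat\beta_0=\bar y-\hat\beta_1\bar x\) can be computed directly, along with the \(R^2\) statistic discussed later in this section:

```java
public class SimpleRegressionSketch {
    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2.1, 3.9, 6.2, 7.8, 10.1};
        int n = x.length;
        double xbar = 0, ybar = 0;
        for (int i = 0; i < n; i++) { xbar += x[i]; ybar += y[i]; }
        xbar /= n; ybar /= n;
        // Least-squares slope and intercept for y_i = b0 + b1*x_i + e_i.
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - xbar) * (y[i] - ybar);
            sxx += (x[i] - xbar) * (x[i] - xbar);
        }
        double b1 = sxy / sxx;
        double b0 = ybar - b1 * xbar;
        // Error and total sums of squares give R-squared = 100*(1 - SSE/SST).
        double sse = 0, sst = 0;
        for (int i = 0; i < n; i++) {
            double fit = b0 + b1 * x[i];
            sse += (y[i] - fit) * (y[i] - fit);
            sst += (y[i] - ybar) * (y[i] - ybar);
        }
        System.out.println("b0=" + b0 + " b1=" + b1
                + " R2=" + 100 * (1 - sse / sst) + "%");
    }
}
```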
The multiple linear regression model is
$$y_i=\beta_0+\beta_1 x_{i1}+\beta_2 x_{i2}+\ldots+ \beta_k x_{ik}+\varepsilon_i$$
for \(i=1,2,\ldots,n\). The \(y_i\)'s constitute the observed values of the response or dependent variable; the \(x_{i1} \)'s, \(x_{i2}\)'s,\(\ldots,\) \(x_{ik}\)'s are the settings of the k independent (explanatory) variables; \(\beta_0,\beta_1,\ldots,\beta_k\) are the regression coefficients; and the \(\varepsilon_i\)'s are independently distributed normal errors, each with mean 0 and variance \(\sigma^2 \).
The class LinearRegression fits both the simple and multiple linear regression models using a fast Givens transformation and includes an option for excluding the intercept \(\beta_0\). The responses are input in array y, and the independent variables are input in array x, where the individual cases correspond to the rows and the variables correspond to the columns.
After the model has been fitted using the LinearRegression class, member "get" methods such as LinearRegression.getCoefficientTTests() can be used to retrieve summary statistics. Predicted values, confidence intervals, and case statistics for the fitted model can be obtained from LinearRegression.getCaseStatistics(double[], double), which returns an instance of the inner class LinearRegression.CaseStatistics.
Several classes provide the option for excluding the intercept from a model. In most practical applications, the intercept should be included in the model. For classes that use the sums of squares and crossproducts matrix as input, the no-intercept case can be handled by using the raw sums of squares and crossproducts matrix as input in place of the corrected sums of squares and crossproducts. The raw sums of squares and crossproducts matrix can be computed as \(\left({x_1,x_2,\ldots,x_k,y}\right)^T\left({x_1,x_2,\ldots,x_k,y} \right)\).
Variables used in the general linear model are either continuous or classification variables. Typically, multiple regression models use continuous variables, whereas analysis of variance models use classification variables. Although the notation used to specify analysis of variance models and multiple regression models may look quite different, the models are essentially the same. The term general linear model emphasizes that a common notational scheme is used for specifying a model that may contain both continuous and classification variables.
A general linear model is specified by its effects (sources of variation). An effect is a single variable or a product of variables. In particular, an effect is composed of one of the following:

1. a single continuous variable
2. a single classification variable
3. several different classification variables
4. several continuous variables, some of which may be the same
5. continuous variables, some of which may be the same, and classification variables, which must be distinct

Effects of the first type are common in multiple regression models. Effects of the second type appear as main effects in analysis of variance models. Effects of the third type appear as interactions in analysis of variance models. Effects of the fourth type appear in polynomial models and response surface models as powers and crossproducts of some basic variables. Effects of the fifth type appear in one-way analysis of covariance models as regression coefficients that indicate lack of parallelism of a regression function across the groups.
The analysis of a general linear model occurs in two stages. The first stage calls class RegressorsForGLM to specify all regressors except the intercept. The second stage uses LinearRegression, at which point the model will be specified as either having or not having an intercept.
Suppose the data matrix has as its first four columns two continuous variables in Columns \(0\) and \(1\) and two classification variables in Columns \(2\) and \(3\). The data might appear as follows:
Column \(0\) | Column \(1\) | Column \(2\) | Column \(3\) |
---|---|---|---|
\(11.23\) | \(1.23\) | \(1.0\) | \(5.0\) |
\(12.12\) | \(2.34\) | \(1.0\) | \(4.0\) |
\(12.34\) | \(1.23\) | \(1.0\) | \(4.0\) |
\(4.34\) | \(2.21\) | \(1.0\) | \(5.0\) |
\(5.67\) | \(4.31\) | \(2.0\) | \(4.0\) |
\(4.12\) | \(5.34\) | \(2.0\) | \(1.0\) |
\(4.89\) | \(9.31\) | \(2.0\) | \(1.0\) |
\(9.12\) | \(3.71\) | \(2.0\) | \(1.0\) |
Each distinct value of a classification variable determines a level. The classification variable in Column \(2\) has two levels. The classification variable in Column \(3\) has three levels. (Integer values are recommended, but not required, for values of the classification variables. The values of the classification variables corresponding to the same level must be identical.) Some examples of regression functions and their specifications are as follows:
 | Intercept | Class Columns | Effects |
---|---|---|---|
\(\beta_0+\beta_1 x_1\) | true | \(\{\}\) | \(\{\{0\}\}\) |
\(\beta_0+\beta_1 x_1 + \beta_2 x_1^2\) | true | \(\{\}\) | \(\{\{0\}, \{0,0\}\}\) |
\(\mu+\alpha_i\) | true | \(\{2\}\) | \(\{\{2\}\}\) |
\(\mu+\alpha_i+\beta_j+\gamma_{ij}\) | true | \(\{2, 3\}\) | \(\{\{2\}, \{3\}, \{2, 3\}\}\) |
\(\mu_{ij}\) | false | \(\{2, 3\}\) | \(\{\{2, 3\}\}\) |
\(\beta_0+\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2\) | true | \(\{\}\) | \(\{\{0\}, \{1\}, \{0, 1\}\}\) |
Variation Src. | Deg. of Freedom | Sum of Squares | Mean Square | F | p-value |
---|---|---|---|---|---|
Regression | DFR=AOV[0] | SSR=AOV[3] | MSR=AOV[6] | AOV[8] | AOV[9] |
Error | DFE=AOV[1] | SSE=AOV[4] | \(s^2\)=AOV[7] | | |
Total | DFT=AOV[2] | SST=AOV[5] | | | |
If the model has an intercept (default), the total sum of squares is the sum of squares of the deviations of \(y_i\) from its (weighted) mean \(\bar {y}\) - the so-called corrected total sum of squares, denoted by the following:
$$\textrm{SST}=\sum\limits_{i=1}^n w_i(y_i-\bar y)^2 $$
If the model does not have an intercept (hasIntercept = false), the total sum of squares is the sum of squares of \(y_i\) - the so-called uncorrected total sum of squares, denoted by the following:
$$\textrm{SST}=\sum\limits_{i=1}^n w_i y_i^2 $$
The error sum of squares is given as follows:
$$\textrm{SSE}=\sum\limits_{i=1}^n w_i \left(y_i- \hat{y}_i\right)^2$$
where \(\hat{y}_i = x_i\hat{\beta}\) is the value estimated by the fitted regression model. The error degrees of freedom are defined by \(\textrm{DFE}=n-r\).
The estimate of \(\sigma^2\) is given by \(s^2= \textrm{SSE}/\textrm{DFE}\), which is the error mean square.
The computed F statistic for the null hypothesis, \(H_0: \beta_1=\beta_2=\ldots\beta_p=0\), versus the alternative that at least one coefficient is nonzero is given by \(F=\textrm{MSR}/s^2 \). The p-value associated with the test is the probability of an F larger than that computed under the assumption of the model and the null hypothesis. A small p-value (less than 0.05) is customarily used to indicate there is sufficient evidence from the data to reject the null hypothesis.
The remaining five elements in AOV frequently are displayed together with the actual analysis of variance table. The quantities R-squared (\(R^2\) = AOV[10]) and adjusted R-squared (\(R_a^2\) = AOV[11]) are expressed as a percentage and are defined as follows:
$$R^2=100\,\frac{\textrm{SSR}}{\textrm{SST}}$$
$$R_a^2=100\ \textrm{max}\left\{0,1-\frac{s^2}{\textrm{SST}/\textrm{DFT}}\right\}$$
The square root of \(s^2\), \(s\) = AOV[12], is frequently referred to as the estimated standard deviation of the model error.
The overall mean of the responses \(\bar y\) is output in AOV[13].
The coefficient of variation (\(CV\) = AOV[14]) is expressed as a percentage and defined by \(\textrm{CV}=100s/\bar y\).
LinearRegression.CoefficientTTests
is a nested class within the LinearRegression
and
StepwiseRegression
classes. The statistics (estimated
standard error,
t-statistic and p-value) associated with each coefficient can
be retrieved with their associated "get" methods. And finally, getR()
returns the variance-covariance matrix of the estimated coefficients.
Diagnostics for individual cases (observations) are computed by the
LinearRegression.CaseStatistics
class for linear regression.
Statistics computed include predicted values, confidence intervals, and diagnostics for detecting outliers and cases that greatly influence the fitted regression.
The diagnostics are computed under the model \(y=X\beta+\varepsilon\), where y is the \(n\times 1\) vector of responses, X is the \(n\times p\) matrix of regressors with rank(X) = r, \(\beta\) is the \(p\times 1 \) vector of regression coefficients, and \(\varepsilon \) is the \(n\times 1\) vector of errors whose elements are independently normally distributed with mean \(0\) and variance \(\sigma^2/w_i\).
Given the results of a weighted least-squares fit of this model (with the \(w_i\)'s as the weights), the following five diagnostics are computed: leverage, standardized residual, jackknife residual, Cook's distance, and DFFITS. The definition of these terms is given in the discussion that follows.
Let \(x_i\) be a column vector containing the elements of the i-th row of X. A case can be unusual either because of \(x_i\) or because of the response \(y_i\). The leverage \(h_i\) is a measure of uniqueness of the \(x_i\). The leverage is defined by
$$h_i=\left[x_i^T\left(X^T WX\right)^-x_i\right]w_i$$
where \(W=\textrm{diag}\left(w_1,w_2,\ldots,w_n\right)\) and \(\left(X^T WX\right)^-\) denotes a generalized inverse of \(X^T WX\). The average value of the \(h_i\)'s is \(r/n\). Regression classes declare \(x_i\) unusual if \(h_i>2r/n\). Hoaglin and Welsch (1978) call a data point highly influential (i.e., a leverage point) when this occurs.
Let \(e_i\) denote the residual
$$y_i-\hat y_i$$
for the i-th case. The estimated variance of \(e_i\) is \((1-h_i)s^2/w_i\), where \(s^2\) is the residual mean square from the fitted regression. The i-th standardized residual (also called the internally studentized residual) is by definition
$$r_i=e_i\sqrt{\frac{w_i}{s^2\left({1-h_i}\right)}} $$
and \(r_i\) follows an approximate standard normal distribution in large samples.
The i-th jackknife residual or deleted residual involves the difference between \(y_i\) and its predicted value, based on the data set in which the i-th case is deleted. This difference equals \(e_i/\left({1-h_i}\right)\). The jackknife residual is obtained by standardizing this difference. The residual mean square for the regression in which the i-th case is deleted is as follows:
$$s_i^2=\frac{{\left({n-r}\right)s^2-w_i e_i^2/ \left({1-h_i}\right)}}{{n-r-1}}$$
The jackknife residual is defined as
$$t_i=e_i\sqrt{\frac{w_i}{s_i^2\left({1-h_i} \right)}}$$
and \(t_i\) follows a \(t\)-distribution with \(n-r-1\) degrees of freedom.
Cook's distance for the i-th case is a measure of how much an individual case affects the estimated regression coefficients. It is given as follows:
$$D_i=\frac{w_i h_i e_i^2}{rs^2\left({1-h_i}\right) ^2}$$
Weisberg (1985) states that if \(D_i\) exceeds the 50-th percentile of the F(r, n - r) distribution, it should be considered large. (This value is about 1. This statistic does not have an F distribution.)
DFFITS, like Cook's distance, is also a measure of influence. For the i-th case, DFFITS is computed by the formula below.
$$\text{DFFITS}_i=e_i\sqrt{\frac{w_i h_i}{s_i^2 \left({1-h_i}\right)^2}}$$
Hoaglin and Welsch (1978) suggest that \(\text{DFFITS}_i\) greater than \(2\sqrt{r/n}\) is large.
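The five diagnostics can be collected in a short plain-Java sketch from the fitted quantities (variable names are illustrative; this is not the CaseStatistics implementation):

```java
public class CaseDiagnosticsSketch {
    // Computes the case diagnostics for one case from the fitted quantities:
    // residual e, leverage h, weight w, residual mean square s2, rank r, and
    // sample size n, using the formulas in the surrounding discussion.
    static void diagnostics(double e, double h, double w,
                            double s2, int r, int n) {
        // Standardized (internally studentized) residual.
        double ri = e * Math.sqrt(w / (s2 * (1 - h)));
        // Residual mean square with this case deleted.
        double si2 = ((n - r) * s2 - w * e * e / (1 - h)) / (n - r - 1.0);
        // Jackknife (deleted) residual, t-distributed with n - r - 1 df.
        double ti = e * Math.sqrt(w / (si2 * (1 - h)));
        // Cook's distance: influence on the estimated coefficients.
        double di = w * h * e * e / (r * s2 * (1 - h) * (1 - h));
        // DFFITS: influence on the fitted value; large if > 2*sqrt(r/n).
        double dffits = e * Math.sqrt(w * h / (si2 * (1 - h) * (1 - h)));
        System.out.printf("r_i=%.4f t_i=%.4f D_i=%.4f DFFITS_i=%.4f%n",
                ri, ti, di, dffits);
    }
}
```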
Transformations of the independent variables are sometimes useful in order to satisfy the regression model. The inclusion of squares and crossproducts of the variables
$$\left({x_1,x_2,x_1^2,x_2^2,x_1 x_2}\right) $$
is often needed. Logarithms of the independent variables are used also. (See Draper and Smith 1981, pp. 218-222; Box and Tidwell 1962; Atkinson 1985, pp. 177-180; Cook and Weisberg 1982, pp. 78-86.)
When the responses are described by a nonlinear function of the parameters, a transformation of the model equation often can be selected so that the transformed model is linear in the regression parameters. For example, by taking natural logarithms on both sides of the equation, the exponential model
$$y=e^{\beta_0+\beta_1 x_1}\varepsilon $$
can be transformed to a model that satisfies the linear regression model provided the \(\varepsilon_i\)'s have a log-normal distribution (Draper and Smith, pp. 222-225).
When the responses are nonnormal and their distribution is known, a transformation of the responses can often be selected so that the transformed responses closely satisfy the regression model assumptions. The square-root transformation for counts with a Poisson distribution and the arc-sine transformation for binomial proportions are common examples (Snedecor and Cochran 1967, pp. 325-330; Draper and Smith, pp. 237-239).
NaN (Not a Number) is the missing value code used by the regression
classes. Use field Double.NaN
to retrieve NaN. Any element of
the data matrix that is missing must be set to Double.NaN
. In
fitting regression models, any observation containing NaN
for
the independent, dependent, weight, or frequency variables is omitted from
the computation of the regression parameters.
The classes described in this section have primary application in the areas
of reliability and life testing, but they may find application in any
situation in which analysis of binomial events over time is of interest.
Kalbfleisch and Prentice (1980), Elandt-Johnson and Johnson (1980), Lee (1980), Gross and Clark (1975), Lawless (1982), Chiang (1968), and Tanner and Wong (1984) are references for the models and methods described
in this section. The class KaplanMeierEstimates
produces Kaplan-Meier
(product-limit) estimates of the survival distribution in a single
population. The class ProportionalHazards
computes the
parameter estimates in a proportional hazards model. The class
LifeTables
computes and (optionally) prints an
actuarial table based either upon a cohort followed over time or a
cross-section of a population.
The classes in this section allow users to filter and analyze time series data using autoregressive, ARIMA, GARCH and state-space models. Some are designed for automatic model selection, estimation and forecasting using algorithms based upon minimizing AIC (Akaike's Information Criterion).
Most time series classes in the library assume the data are stationary and
collected at equally spaced times with no missing observations in the series.
In order to prepare a series for analysis, any missing values must be
replaced with estimates. The class ARMAEstimateMissing
can be used to automatically estimate missing values using one of four user-selected methods, provided the largest time gap with missing values has no more than 4 missing values. An exception is the class AutoARIMA, which accepts a data series with missing values.
In addition to estimating missing values, if a series is nonstationary or
contains seasonal variation, its values should be filtered before fitting a
model and preparing forecasts. Class ARSeasonalFit
evaluates seasonal adjustments prior to modeling the series. Users specify a
range of adjustments.ARSeasonalFit
evaluates each adjustment to
identify the one that minimizes the AIC. If a user already knows the series
needs to be filtered using differences between consecutive values, the class
Difference
can be used to filter a series into an
equivalent series of differences.
There are several classes available for transforming a time series into its
autocorrelation matrix:
AutoCorrelation, CrossCorrelation, and MultiCrossCorrelation. Class AutoCorrelation
computes the autocorrelation and partial autocorrelation matrices
from a time series or its filtered values. Box and Jenkins (1976) describe
how to identify the structure of an ARIMA model using the autocorrelation and
partial autocorrelation matrices.
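A plain-Java sketch of the sample autocorrelation function (not the AutoCorrelation implementation), using the usual estimator \(\hat\rho_k = \hat\gamma_k/\hat\gamma_0\) with \(\hat\gamma_k = \frac{1}{n}\sum_{t=1}^{n-k}(z_t-\bar z)(z_{t+k}-\bar z)\):

```java
public class AcfSketch {
    // Sample autocorrelations rho_k = gamma_k / gamma_0 for k = 1..maxLag.
    static double[] acf(double[] z, int maxLag) {
        int n = z.length;
        double mean = 0.0;
        for (double v : z) mean += v;
        mean /= n;
        // gamma_k = (1/n) * sum over t of (z_t - mean)(z_{t+k} - mean)
        double[] gamma = new double[maxLag + 1];
        for (int k = 0; k <= maxLag; k++) {
            for (int t = 0; t + k < n; t++)
                gamma[k] += (z[t] - mean) * (z[t + k] - mean);
            gamma[k] /= n;
        }
        double[] rho = new double[maxLag];
        for (int k = 1; k <= maxLag; k++) rho[k - 1] = gamma[k] / gamma[0];
        return rho;
    }
}
```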
Classes CrossCorrelation
and MultiCrossCorrelation
are used to calculate the cross-correlation matrix for two univariate or
multivariate time series, respectively.
There are several classes useful for modeling stationary, equally spaced time series: ARMA, ARMAMaxLikelihood, GARCH, and KalmanFilter. Class ARMA can be used to fit autoregressive, moving-average, and ARIMA (autoregressive, integrated moving average) models. Rather than passing the original series to this class, users are required to construct this class using the autocorrelation matrix of the series. Class AutoCorrelation is useful for computing this matrix prior to constructing the ARMA class.
The other classes require users to construct the class using the original series or its filtered values. Class ARMAMaxLikelihood computes maximum likelihood estimates of the ARMA parameters and prepares forecasts from these values. Class ARAutoUnivariate selects the best autoregressive model for an equally spaced, stationary time series using AIC. Class GARCH estimates the parameters of a GARCH model, and the KalmanFilter class estimates the parameters for a state-space model using Kalman filtering and calculates one-step ahead forecasts.
Several classes contain methods for obtaining
forecasts from an ARIMA model. Class KalmanFilter
can be used to
obtain one-step ahead forecasts for a state-space model.
A small, yet comprehensive, class of stationary time-series models consists of the nonseasonal ARMA processes defined by
\(\phi \left( B \right)\left( {W_t - \mu } \right) = \theta \left( B \right)A_t , \,\,\,\,\,\,\,\, t \in Z\)
where \(Z = \{\dots, -2, -1, 0, 1, 2, \dots\}\) denotes the set of integers, B is the backward shift operator defined by \(B^k W_t = W_{t-k}\), \(\mu\) is the mean of \(W_t\), and the following equations are true:
\( \phi(B) = 1 - \phi_1 B - \phi_2B^2 - \dots - \phi_pB^p, p \geq 0\)
\(\theta(B) = 1 -\theta_1 B- \theta_2B^2 - \dots - \theta_qB^q, q \geq 0 \)
The model is of order (p, q) and is referred to as an ARMA (p, q) model.
An equivalent version of the ARMA (p, q) model is given by
\(\phi(B) W_t = \theta_0 + \theta(B)A_t, \,\,\,\,\,\, t \in Z\)
where \(\theta_0\) is an overall constant defined by the following:
\(\theta _0 = \mu \left( {1 - \sum\limits_{i = 1}^p {\phi _i } } \right) \)
See Box and Jenkins (1976, pp. 92-93) for a discussion of the meaning and usefulness of the overall constant.
If the "raw" data, \(\left\{ {Z_t } \right\}\), are homogeneous and
nonstationary, then differencing using the Difference class induces
stationarity, and the model is called ARIMA (AutoRegressive Integrated Moving
Average). Parameter estimation is performed on the stationary time series
\(W_t = \Delta^dZ_t\) , where \(\Delta^d = (1 - B)^d\) is the backward
difference operator with period 1 and order \(d, {\rm d} \gt 0\). For the
ARIMA model, see AutoARIMA
class for parameter
estimation and model selection.
There are two classes for estimating the parameters in an ARMA model and preparing forecasts: ARMA and ARMAMaxLikelihood. Class ARMA estimates model parameters using either the method of moments or least squares. The method of moments is selected by default or by setting property Method to METHOD_OF_MOMENTS. Method of moments estimates are good initial values for obtaining the least-squares estimates by setting property Method to LEAST_SQUARES. Other initial estimates provided by the user can be used. The least-squares procedure can be used to compute conditional or unconditional least-squares estimates of the parameters, depending on the choice of the backcasting length. The parameter estimates from either the method of moments or least-squares procedures can be used in the forecast method. The methods for preliminary parameter estimation, least-squares parameter estimation, and forecasting follow the approach of Box and Jenkins (1976, Programs 2-4, pp. 498-509).
Class ARMAMaxLikelihood
estimates model parameters
using maximum likelihood. Users can tailor the maximum likelihood algorithm
by setting tolerances and providing initial estimates for the parameters. By
default, ARMAMaxLikelihood
uses method of moment estimates to
initialize the parameters. However, this can be overridden with the
ARMAMaxLikelihood.setAR(double[])
method.
Exponential smoothing approximates the value of a time series at time t with a weighted average of previous values, with weights defined in such a way that they decay exponentially over time. The weights can be determined by a smoothing parameter \(\alpha\) and the relation
$$\hat y_t = \alpha y_{t-1} + (1 - \alpha)\,\hat y_{t-1}$$
$$\Rightarrow \hat y_t \approx \sum\limits_{j = 0}^{t - 1} \alpha (1 - \alpha)^{t-1-j}\, y_j = \sum\limits_{j = 0}^{t - 1} w_j\, y_j$$
The parameter \(\alpha\) is on the interval (0,1) and controls the rate of
decay. For values close to 1, the effect decays rapidly, whereas for values
close to 0, the influence of a past value persists for some time. Exponential
smoothing as a forecasting procedure is widely used and largely effective for
short term, mean level forecasting. With the addition of a term for linear
trend and terms or factors for seasonal patterns, exponential smoothing is an
intuitive procedure for many series that arise in business applications.
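A plain-Java sketch of the simple (mean level) case under the recursion above; the HoltWintersExponentialSmoothing class additionally handles trend and seasonal terms:

```java
public class ExponentialSmoothingSketch {
    // Simple exponential smoothing: yhat_t = alpha*y_{t-1} + (1-alpha)*yhat_{t-1}.
    static double[] smooth(double[] y, double alpha) {
        double[] yhat = new double[y.length];
        yhat[0] = y[0];                       // a common initialization choice
        for (int t = 1; t < y.length; t++) {
            yhat[t] = alpha * y[t - 1] + (1 - alpha) * yhat[t - 1];
        }
        return yhat;
    }

    public static void main(String[] args) {
        double[] series = {10, 12, 13, 12, 15, 16, 15, 17};
        System.out.println(java.util.Arrays.toString(smooth(series, 0.3)));
    }
}
```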
Class HoltWintersExponentialSmoothing
performs the
Holt-Winters method, otherwise known as triple exponential smoothing,
and allows for either an additive or a multiplicative seasonal effect.
Copyright © 2020 Rogue Wave Software. All rights reserved.