NaiveBayesClassifier Class
Trains a Naive Bayes Classifier
Inheritance Hierarchy
System.Object
  Imsl.DataMining.NaiveBayesClassifier

Namespace: Imsl.DataMining
Assembly: ImslCS (in ImslCS.dll) Version: 6.5.2.0
Syntax
public class NaiveBayesClassifier

The NaiveBayesClassifier type exposes the following members.

Constructors
  Name  Description
Public method NaiveBayesClassifier
Constructs a NaiveBayesClassifier.
Top
Methods
  Name  Description
Public method ClassError
Returns the classification probability error for the input pattern and known target classification.
Public method CreateContinuousAttribute(IProbabilityDistribution)
Creates a continuous attribute and the associated distribution function.
Public method CreateContinuousAttribute(IProbabilityDistribution[])
Creates a continuous attribute and the associated distribution functions for each target classification.
Public method CreateNominalAttribute
Creates a nominal attribute and specifies the number of categories.
Public method Equals
Determines whether the specified object is equal to the current object.
(Inherited from Object.)
Protected method Finalize
Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection.
(Inherited from Object.)
Public method GetClassCounts
Returns the number of patterns for each target classification.
Public method GetClassificationErrors
Returns the classification probability errors for each pattern in the training data.
Public method GetHashCode
Serves as a hash function for a particular type.
(Inherited from Object.)
Public method GetMeans
Returns a table of means for each continuous attribute in continuousData segmented by the target classes in classificationData.
Public method GetPredictedClass
Returns the predicted classification for each training pattern.
Public method GetProbabilities
Returns the predicted classification probabilities for each target class.
Public method GetStandardDeviations
Returns a table of standard deviations for each continuous attribute in continuousData segmented by the target classes in classificationData.
Public method GetTrainingErrors
Returns a table of classification errors of non-missing classifications for each target classification plus the overall total of classification errors.
Public method GetType
Gets the Type of the current instance.
(Inherited from Object.)
Public method IgnoreMissingValues
Specifies whether or not missing values will be ignored during the training process.
Protected method MemberwiseClone
Creates a shallow copy of the current Object.
(Inherited from Object.)
Public method PredictClass
Predicts the classification for the input pattern using the trained Naive Bayes classifier.
Public method Probabilities
Predicts the classification probabilities for the input pattern using the trained Naive Bayes classifier.
Public method ToString
Returns a string that represents the current object.
(Inherited from Object.)
Public method Train
Trains a Naive Bayes classifier for classifying data into one of nClasses target classifications.
Top
Properties
  Name  Description
Public property ContinuousSmoothingValue
Parameter for calculating smoothed estimates of conditional probabilities for continuous attributes.
Public property DiscreteSmoothingValue
Parameter for calculating smoothed estimates of conditional probabilities for discrete (nominal) attributes.
Public property ZeroCorrection
Specifies the replacement value to be used for conditional probabilities equal to zero.
Top
Remarks

NaiveBayesClassifier trains a Naive Bayes classifier for classifying data into one of nClasses target classes. Input attributes can be a combination of both nominal and continuous data. Ordinal data can be treated as either nominal attributes or continuous. If the distribution of the ordinal data is known or can be approximated using one of the continuous distributions, then associating them with continuous attributes allows a user to specify that distribution. Missing values are allowed.

Before training the classifier, the input attributes must be specified. For each nominal attribute, use the method CreateNominalAttribute to specify the number of categories in each of the nNominal attributes. Specify the input attributes in the same column order in which they will be supplied to the Train method. For example, if the first two columns of the nominal input data, nominalData, represent the first two nominal attributes and have two and three categories respectively, then the first call to CreateNominalAttribute would specify two categories and the second call would specify three categories.

Likewise, for each continuous attribute, the method CreateContinuousAttribute can be used to specify an IProbabilityDistribution other than the default NormalDistribution. A second overload of CreateContinuousAttribute allows specification of a different distribution for each target class (see Example 3). Create each continuous attribute in the same column order in which they will be supplied to the Train method. If CreateContinuousAttribute is not invoked for all nContinuous attributes, the NormalDistribution probability distribution is used for the remainder. For example, if five continuous attributes have been specified in the constructor but only three calls to CreateContinuousAttribute have been made, the last two attributes, or columns of continuousData in the Train method, will use the NormalDistribution probability distribution.

Nominal only, continuous only, and a combination of both nominal and continuous input attributes are allowed. The Train method allows either nominal or continuous input arrays to be null.

Let C be the classification attribute with target categories 0, 1, \ldots, \mbox{nClasses}-1, and let X = \{x_1, x_2, \ldots, x_k\} be a vector-valued array of k = \mbox{nNominal} + \mbox{nContinuous} input attributes, where nNominal is the number of nominal attributes and nContinuous is the number of continuous attributes. See methods CreateNominalAttribute to specify the number of categories for each nominal attribute and CreateContinuousAttribute to specify the distribution for each continuous attribute. The classification problem simplifies to estimating the conditional probability P(C|X) from a set of training patterns. Bayes' rule states that this probability can be expressed as the ratio:

P(C = c|X = \{x_1, x_2, \ldots, x_k\}) = 
            \frac{P(C=c)P(X=\{x_1, x_2, \ldots, x_k\}|C=c)}{P(X=\{x_1, x_2, \ldots, x_k\})}
where c is equal to one of the target classes 0, 1, \ldots, \mbox{nClasses}-1. In practice, the denominator of this expression is constant across all target classes since it is only a function of the given values of X. As a result, the Naive Bayes algorithm does not expend computational time estimating P(X=\{x_1, x_2, \ldots, x_k\}) for every pattern. Instead, a Naive Bayes classifier calculates the numerator P(C=c)P(X=\{x_1, x_2, \ldots, x_k\}|C=c) for each target class and then classifies X to the target class with the largest value, i.e.,
X \leftarrow \mathop{\arg\max}_{c = 0, 1, \ldots, \mbox{nClasses}-1} P(C = c)P(X|C = c)

The classifier simplifies this calculation by assuming conditional independence. That is, it assumes that:

P(X = \{x_1, x_2, \ldots, x_k\}|C=c) = \prod_{j=1}^{k} P(x_j|C=c)
This is equivalent to assuming that the values of the input attributes, given C, are independent of one another, i.e.,
P(x_i|x_j,C=c)=P(x_i|C=c),\,\,\, \mbox{for all}\,\,\,i \neq j
In real-world data this assumption rarely holds, yet in many cases this approach results in surprisingly low classification error rates. Since the estimate of P(C=c|X=\{x_1,x_2,\ldots,x_k\}) from a Naive Bayes classifier is generally an approximation, classifying patterns based upon the Naive Bayes algorithm can have acceptably low classification error rates.
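Putting the decision rule and the conditional-independence assumption together, the classification step can be sketched as follows (Python used purely for illustration; this is not the library's implementation, and the data structures are assumptions):

```python
import math

def predict_class(priors, cond_probs, pattern):
    """Pick the class c maximizing P(C=c) * prod_j P(x_j | C=c).

    priors:     list of P(C=c) for c = 0..nClasses-1
    cond_probs: cond_probs[c][j] maps a value of attribute j to P(x_j | C=c)
    pattern:    list of attribute values x_1..x_k
    """
    best_class, best_score = None, -math.inf
    for c, prior in enumerate(priors):
        score = prior
        for j, x in enumerate(pattern):
            # conditional independence: multiply the per-attribute conditionals
            score *= cond_probs[c][j][x]
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Note that the denominator P(X) is never computed, matching the observation above that it is constant across classes.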

For nominal attributes, this implementation of the Naive Bayes classifier estimates conditional probabilities using a smoothed estimate:

P(x_j|C=c)= \frac{ \# N \{ x_j \, \cap\, C=c \} + \lambda }{ \# N \{ C=c \} + \lambda J_j} \mbox{,}
where \#N\{Z\} is the number of training patterns with attribute Z, \lambda is the smoothing parameter, and J_j is the number of categories associated with the j-th attribute.
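This smoothed estimate translates directly into code; the following Python sketch mirrors the formula above (the parameter names are illustrative, not the library API):

```python
def smoothed_cond_prob(n_joint, n_class, n_categories, lam=1.0):
    """Smoothed estimate of P(x_j | C=c) for a nominal attribute.

    n_joint      -- #N{x_j and C=c}: patterns with this value and this class
    n_class      -- #N{C=c}: patterns with this class
    n_categories -- number of categories of the j-th attribute
    lam          -- smoothing parameter (lam=0 means no smoothing)
    """
    return (n_joint + lam) / (n_class + lam * n_categories)
```

Without smoothing (lam=0), a category never observed with a class yields a probability of zero, which zeroes the entire product in the Naive Bayes formula; smoothing avoids this.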

The probability P(C=c) is also estimated using a smoothed estimate:

P(C=c)= \frac{\# N\{C=c\} + \lambda }{\mbox{nPatterns} + \lambda (\mbox{nClasses})} \,\,\, \mbox{.}

These estimates correspond to the maximum a posteriori (MAP) estimates for a Dirichlet prior assuming equal priors. The smoothing parameter can be any non-negative value. Setting \lambda=0 corresponds to no smoothing. The default smoothing used in this algorithm, \lambda=1, is commonly referred to as Laplace smoothing. This can be specified using the property DiscreteSmoothingValue.
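The class prior estimate works the same way; a minimal Python sketch of the formula above (names are illustrative):

```python
def smoothed_prior(n_class, n_patterns, n_classes, lam=1.0):
    """Smoothed estimate of P(C=c):
    (#N{C=c} + lam) / (nPatterns + lam * nClasses)."""
    return (n_class + lam) / (n_patterns + lam * n_classes)
```

Because the same \lambda is added for every class, the smoothed priors still sum to one across the nClasses target classes.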

For continuous attributes, the same conditional probability P(x_j|C=c) in the Naive Bayes formula is replaced with the conditional probability density function f(x_j|C=c). By default, the density function for continuous attributes is the normal (Gaussian) probability density function (see NormalDistribution):

f(x_j|C=c) = \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{{\left(x_j - \mu\right)}^2}{2{\sigma}^2}}
where \mu and \sigma are the conditional mean and standard deviation, i.e., the mean and standard deviation of x_j when C = c. For convenience, methods GetMeans and GetStandardDeviations are provided to calculate the conditional means and standard deviations of the training patterns.
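For reference, the default density is just the normal pdf evaluated with the conditional mean and standard deviation; a Python sketch (not the NormalDistribution class itself):

```python
import math

def normal_pdf(x, mu, sigma):
    """f(x_j | C=c) under the default normal model, where mu and sigma are
    the mean and standard deviation of x_j over patterns with C = c."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
```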

In addition to the default normal pdf, users can select any continuous distribution to model the continuous attribute by providing an implementation of the Imsl.Stat.IProbabilityDistribution interface. See NormalDistribution, LogNormalDistribution, GammaDistribution, and PoissonDistribution for classes that implement the IProbabilityDistribution interface.

Smoothing conditional probability calculations for continuous attributes is controlled by the properties ContinuousSmoothingValue and ZeroCorrection. By default, conditional probability calculations for continuous attributes are unadjusted for calculations near zero. The value specified in the ContinuousSmoothingValue property will be added to each continuous probability calculation. This is similar to the effect of setting the property DiscreteSmoothingValue for the corresponding discrete calculations.

The value specified in the ZeroCorrection property is used when (f(x|C=c) + \lambda)=0, where \lambda is the smoothing parameter setting. If this condition occurs, the conditional probability is replaced with the property value set in ZeroCorrection.
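The interaction between ContinuousSmoothingValue and ZeroCorrection described above can be sketched as follows (Python; the function name and default values are illustrative assumptions, not the library's):

```python
def adjusted_density(f_x, lam=0.0, zero_correction=1e-30):
    """Apply continuous smoothing, then zero correction, to a density value.

    lam mirrors ContinuousSmoothingValue (0.0 = unadjusted, the default);
    zero_correction mirrors the ZeroCorrection replacement value, used
    only when the smoothed density (f(x|C=c) + lam) is exactly zero.
    """
    p = f_x + lam
    return zero_correction if p == 0.0 else p
```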

Methods GetClassificationErrors, GetPredictedClass, GetProbabilities, and GetTrainingErrors provide information on how well the trained NaiveBayesClassifier predicts the known target classifications of the training patterns.

Methods Probabilities and PredictClass estimate classification probabilities and predict classification of the input pattern using the trained Naive Bayes Classifier. The predicted classification returned by PredictClass is the class with the largest estimated classification probability. Method ClassError predicts the classification from the trained Naive Bayes classifier and compares the predicted classifications with the known target classification provided. This allows verification of the classifier with a set of patterns other than the training patterns.

See Also