Support Vector Machines – An Overview
Support Vector Machines (SVM) are a class of learning algorithms for classification, regression, and distribution estimation motivated by results from statistical learning theory (Vapnik, 1995). A classification task usually involves separating the data into training and testing sets. Each pattern, or instance, in the training set contains one “target value” (i.e., one of the class values) and several “attributes” (i.e., the features or observed variables). The goal of SVM is to produce a model, based on the training data, that predicts the target values of the test data.
Given instance-label pairs \((x_i,y_i)\), \(i=1,\ldots,l\), where \(l\) is the number of training patterns, \(x_i\in\mathbb{R}^n\), and \(y_i\in\{1,-1\}\), the support vector machine (SVM) (Boser et al., 1992; Cortes and Vapnik, 1995) solves the following primal optimization problem:

\[\min_{w,b,\xi}\;\frac{1}{2}w^T w + C\sum_{i=1}^{l}\xi_i\]

subject to

\[y_i\left(w^T\varphi(x_i)+b\right)\ge 1-\xi_i,\qquad \xi_i\ge 0,\quad i=1,\ldots,l.\]
Here \(w\) and \(b\) are the weight vector and bias, and \(C>0\) is the penalty parameter on the error terms \(\xi_i\). The function \(\varphi\) maps the training vectors \(x_i\) into a higher (possibly infinite) dimensional space; the mapped vectors \(\varphi(x_i)\) are called the features. SVM finds a linear separating hyperplane of maximal margin in this higher dimensional space, so rather than applying SVM to the original input attributes \(x_i\), the features \(\varphi(x_i)\) are passed to the learning algorithm. The kernel function \(K\) is the inner product \(K(x_i,x_j)\equiv\varphi(x_i)^T \varphi(x_j)\). Even when \(\varphi(x)\) itself is very expensive to compute (perhaps because it is an extremely high-dimensional vector), \(K(x_i,x_j)\) is often very inexpensive to compute. Evaluating \(K(x_i,x_j)\) therefore lets an SVM learn efficiently in the high-dimensional feature space without ever explicitly forming the vectors \(\varphi(x)\). Four popular kernels for classification and regression are:
- linear: \(K(x_i,x_j)=x_i^T x_j\)
- polynomial: \(K(x_i,x_j)=(\gamma x_i^T x_j+r)^d\), \(\gamma>0\)
- radial basis function (RBF): \(K(x_i,x_j)=\exp(-\gamma\|x_i-x_j\|^2)\), \(\gamma>0\)
- sigmoid: \(K(x_i,x_j)=\tanh(\gamma x_i^T x_j+r)\)
where \(\gamma\), \(r\), and \(d\) are kernel parameters.
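To make these kernel definitions concrete, the functions below are a minimal NumPy sketch of the four kernels. The function names and default parameter values are illustrative only, not part of this library's API.

```python
import numpy as np

def linear_kernel(xi, xj):
    # K(x_i, x_j) = x_i^T x_j
    return xi @ xj

def polynomial_kernel(xi, xj, gamma=1.0, r=0.0, d=3):
    # K(x_i, x_j) = (gamma * x_i^T x_j + r)^d, with gamma > 0
    return (gamma * (xi @ xj) + r) ** d

def rbf_kernel(xi, xj, gamma=1.0):
    # K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2), with gamma > 0
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def sigmoid_kernel(xi, xj, gamma=1.0, r=0.0):
    # K(x_i, x_j) = tanh(gamma * x_i^T x_j + r)
    return np.tanh(gamma * (xi @ xj) + r)
```

Each function takes two attribute vectors and returns a scalar; a full kernel (Gram) matrix is obtained by evaluating the chosen function over all pairs of training vectors.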
SVM classification algorithms determine an optimal large-margin linear decision boundary for the given training data. The classification formulation handles both two-class and multi-class problems; for multi-class problems, multiple binary classifiers are combined. If class information is not provided for the training data, the distribution estimation algorithm one-class SVM is used to estimate the support of a high-dimensional distribution. The support vector methodology can also be applied to the regression problem by optimizing generalization bounds for regression, which rely on a loss function that ignores errors within a certain distance of the true value. The following algorithms are supported (see the sketch after this list).
- SVC (Support Vector Classification): the standard SVM algorithm for classifying two-class or multi-class data.
- One-class SVM: assumes the data are available from only one class, i.e., drawn from some unknown underlying probability distribution P, and estimates the support of that distribution.
- SVR (Support Vector Regression): applies the support vector methodology to the regression problem.
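As a point of reference only, the three problem types map onto analogous estimators in the open-source scikit-learn library; the sketch below uses that library as a stand-in and is not this library's API.

```python
# Illustrative only: scikit-learn analogues of the three problem types.
import numpy as np
from sklearn.svm import SVC, OneClassSVM, SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)       # two-class labels
y_reg = 2.0 * X[:, 0] + 0.1 * rng.normal(size=40)   # continuous targets

SVC(kernel="rbf").fit(X, y_class)   # classification: labels required
OneClassSVM(kernel="rbf").fit(X)    # distribution estimation: no labels
SVR(kernel="rbf").fit(X, y_reg)     # regression: continuous targets
```

The essential distinction is what each fit step consumes: class labels for SVC, attributes alone for one-class SVM, and continuous targets for SVR.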
A typical use involves these steps (an end-to-end sketch follows the list):
- Scale the data. Typically, the data is linearly scaled to the range [-1, 1] or [0, 1]. The same scaling parameters must be used on both the training data and the test data. You may find scaleFilter useful for this step.
- Apply the trainer to the scaled training data set using one of the available kernel types to obtain a model. The RBF kernel is a good kernel type to start with.
- Use cross-validation to find the best model parameters.
- Use the resulting model with the best model parameters to predict the target values of the scaled test data set.
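The sketch below walks through these steps end to end. It uses the open-source scikit-learn library as a stand-in for the trainer (it is not this library's API), and the synthetic data, parameter grids, and helper names are illustrative assumptions.

```python
# Illustrative end-to-end workflow; scikit-learn stands in for the trainer.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = (X[:, 0] ** 2 + X[:, 1] > 1).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: linearly scale each attribute to [-1, 1] using the *training*
# minima/maxima, then apply the same scaling parameters to the test data.
lo, hi = X_train.min(axis=0), X_train.max(axis=0)
def scale(A):
    return 2.0 * (A - lo) / (hi - lo) - 1.0
X_train_s, X_test_s = scale(X_train), scale(X_test)

# Steps 2-3: start with the RBF kernel and cross-validate over (C, gamma).
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
                    cv=5)
grid.fit(X_train_s, y_train)

# Step 4: predict on the scaled test set with the best parameters found.
print(grid.best_params_, grid.score(X_test_s, y_test))
```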
For SVC and SVR, the classifier can also calculate probability estimates. The function supportVectorTrainer trains a classifier from a set of training patterns containing values of both the input attributes and the target classes, and stores the trained classifier model in an Imsls_d_svm_model data structure.
Classifications of new, unknown patterns can then be predicted by passing the trained classifier model, Imsls_d_svm_model, to supportVectorClassification. When the model is no longer needed, the memory allocated to it can be released with svmClassifierFree.
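A minimal sketch of that train/predict/free lifecycle is shown below. The function names come from this section, but the import paths, argument lists, and sample data are assumptions made for illustration; consult each function's reference page for the actual interface.

```python
# Sketch only: the import paths and argument lists below are assumptions
# for illustration, not the documented signatures.
from pyimsl.stat.supportVectorTrainer import supportVectorTrainer
from pyimsl.stat.supportVectorClassification import supportVectorClassification
from pyimsl.stat.svmClassifierFree import svmClassifierFree

# Scaled training patterns: known class values and attribute vectors.
y_train = [0.0, 1.0, 0.0, 1.0]
x_train = [[0.0, 0.2], [0.1, 0.9], [0.8, 0.1], [0.9, 0.8]]

svm_model = supportVectorTrainer(2, y_train, x_train)   # two classes
predicted = supportVectorClassification(svm_model, [[0.5, 0.5]])
svmClassifierFree(svm_model)   # release the trained model's memory
```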