## Binary separation and training support vector machines. (English) Zbl 1238.68123

Summary: We introduce basic ideas of binary separation by a linear hyperplane, which is a technique exploited in the support vector machine (SVM) concept. This is a decision-making tool for pattern recognition and related problems. We describe a fundamental standard problem (SP) and show how this is used in most existing research to develop a dual-based algorithm for its solution. This algorithm is shown to be deficient in certain aspects, and we develop a new primal-based SQP-like algorithm, which has some interesting features. Most practical SVM problems are not adequately handled by a linear hyperplane. We describe the nonlinear SVM technique, which enables a nonlinear separating surface to be computed, and we propose a new primal algorithm based on the use of low-rank Cholesky factors.
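As a minimal illustration of binary separation by a linear hyperplane (not the authors' SQP-like primal algorithm), the sketch below uses the classical perceptron rule to find *some* separating hyperplane $$w^\top x + b = 0$$ for linearly separable toy data; the decision rule is sign$$(w^\top x + b)$$. In the nonlinear SVM setting described above, inner products $$x_i^\top x_j$$ would be replaced by kernel evaluations. The data and function names here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy linearly separable data; labels y are in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-2.0, -2.0], [-3.0, -1.0], [-1.5, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])

def perceptron(X, y, epochs=100):
    """Find *a* separating hyperplane w.x + b = 0 (not the max-margin one)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        updated = False
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # misclassified: nudge the hyperplane
                w += yi * xi
                b += yi
                updated = True
        if not updated:                 # full pass with no errors: separated
            break
    return w, b

w, b = perceptron(X, y)
pred = np.sign(X @ w + b)               # decision rule: sign(w.x + b)
```

The perceptron converges on separable data but makes no margin guarantee; the SVM standard problem instead maximizes the margin, which is what the dual-based and primal algorithms discussed in the paper compute.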
It may be, however, that exact separation is not desirable due to the presence of uncertain or mislabelled data. Dealing with this situation is the main challenge in developing suitable algorithms. Existing dual-based algorithms use the idea of $$L_1$$ penalties, which has merit. We suggest how penalties can be incorporated into a primal-based algorithm. Another aspect of practical SVM problems is often the huge size of the data set, which poses severe challenges both for software package development and for control of ill-conditioning. We illustrate some of these issues with numerical experiments on a range of problems.
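The effect of $$L_1$$ (hinge) penalties on mislabelled data can be sketched with a primal subgradient method for $$\min_{w,b}\ \tfrac12\|w\|^2 + C\sum_i \max(0,\, 1 - y_i(w^\top x_i + b))$$. This is a generic illustration of the penalized formulation, not the paper's proposed primal algorithm; the data generation and parameter values are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated Gaussian clusters, with one deliberately flipped label
# to mimic mislabelled data for which exact separation is undesirable.
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
y[0] = -1  # mislabelled point

def soft_margin_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Subgradient descent on 0.5*||w||^2 + C * sum of hinge losses."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # points violating the margin contribute a subgradient
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = soft_margin_svm(X, y)
acc = np.mean(np.sign(X @ w + b) == y)
```

The hinge penalty lets the optimizer tolerate the flipped label rather than distort the hyperplane to separate it exactly; only the mislabelled point ends up on the wrong side of its (incorrect) label.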

### MSC:

- 68T05 Learning and adaptive systems in artificial intelligence
- 68T10 Pattern recognition, speech recognition
- 65Y99 Computer aspects of numerical algorithms

### Software:

ElemStatLearn; MNIST; SSVM; GPDT; CONV_QP