×

Proximal gradient method for huberized support vector machine. (English) Zbl 1425.68358

Summary: The support vector machine (SVM) has been used in a wide variety of classification problems. The original SVM uses the hinge loss function, which is non-differentiable and makes the problem difficult to solve in particular for regularized SVMs, such as with \(\ell _1\)-regularization. This paper considers the Huberized SVM (HSVM), which uses a differentiable approximation of the hinge loss function. We first explore the use of the proximal gradient (PG) method to solving binary-class HSVM (B-HSVM) and then generalize it to multi-class HSVM (M-HSVM). Under strong convexity assumptions, we show that our algorithm converges linearly. In addition, we give a finite convergence result about the support of the solution, based on which we further accelerate the algorithm by a two-stage method. We present extensive numerical experiments on both synthetic and real datasets which demonstrate the superiority of our methods over some state-of-the-art methods for both binary- and multi-class SVMs.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

SeDuMi; CVX; LIBLINEAR
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci 2(1):183-202 · Zbl 1175.94009 · doi:10.1137/080716542
[2] Bolte J, Daniilidis A, Lewis A (2007) The lojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J Optim 17(4):1205-1223 · Zbl 1129.26012 · doi:10.1137/050644641
[3] Bottou L, Cortes C, Denker JS, Drucker H, Guyon I, Jackel LD, LeCun Y, Muller UA, Sackinger E, Simard P et al (1994) Comparison of classifier methods: a case study in handwritten digit recognition. In: Proceedings of the 12th IAPR International Conference on Pattern Recognition, vol 2, pp 77-82. IEEE
[4] Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge · Zbl 1058.90049 · doi:10.1017/CBO9780511804441
[5] Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M, Haussler D (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Nat Acad Sci 97(1):262 · doi:10.1073/pnas.97.1.262
[6] Chen Zhen-Yu, Fan Zhi-Ping, Sun Minghe (2012) A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data. Eur J Oper Res 223(2):461-472 · Zbl 1292.68131 · doi:10.1016/j.ejor.2012.06.040
[7] Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273-297 · Zbl 0831.68098
[8] Czarnecki WM, Tabor J (2014) Two ellipsoid support vector machines. Exp Syst Appl 41:8211-8224 · doi:10.1016/j.eswa.2014.07.015
[9] Demšar Janez (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1-30 · Zbl 1222.68184
[10] Dudoit S, Fridlyand J, Speed TP (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 97(457):77-87 · Zbl 1073.62576 · doi:10.1198/016214502753479248
[11] Fan Rong-En, Chang Kai-Wei, Hsieh Cho-Jui, Wang Xiang-Rui, Lin Chih-Jen (2008) Liblinear: a library for large linear classification. J Mach Learn Res 9:1871-1874 · Zbl 1225.68175
[12] Friedman Milton (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675-701 · JFM 63.1098.02 · doi:10.1080/01621459.1937.10503522
[13] Friedman Milton (1940) A comparison of alternative tests of significance for the problem of m rankings. Ann Math Stat 11(1):86-92 · JFM 66.1305.08 · doi:10.1214/aoms/1177731944
[14] Galar M, Fernandez A, Barrenechea E, Herrera F (2015) Drcw-ovo: Distance-based relative competence weighting combination for one-vs-one strategy in multiclass problems. Pattern Recognit 48:28-42 · doi:10.1016/j.patcog.2014.07.023
[15] Galar Mikel, Fernández Alberto, Barrenechea Edurne, Bustince Humberto, Herrera Francisco (2011) An overview of ensemble methods for binary classifiers in multi-class problems: Experimental study on one-vs-one and one-vs-all schemes. Pattern Recognit 44(8):1761-1776 · doi:10.1016/j.patcog.2011.01.017
[16] Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531-537 · doi:10.1126/science.286.5439.531
[17] Hale ET, Yin W, Zhang Y (2008) Fixed-point continuation for \[l_1\] l1-minimization: methodology and convergence. SIAM J Optim 19(3):1107-1130 · Zbl 1180.65076 · doi:10.1137/070698920
[18] Heisele B, Ho P, Poggio T (2001) Face recognition with support vector machines: Global versus component-based approach. In: Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, vol 2, pp 688-694. IEEE
[19] Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 65-70 · Zbl 0402.62058
[20] Hsu CW, Lin CJ (2002) A comparison of methods for multiclass support vector machines. Neural Netw IEEE Trans 13(2):415-425 · doi:10.1109/72.991427
[21] CVX Research, Inc. CVX: Matlab software for disciplined convex programming, version 2.0 beta (2012). http://cvxr.com/cvx, Sept 2012
[22] Keerthi SS, DeCoste D (2006) A modified finite newton method for fast solution of large scale linear svms. J Mach Learn Res 6(1):341 · Zbl 1222.68231
[23] Khan J, Wei JS, Ringnér M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C et al (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7(6):673-679 · doi:10.1038/89044
[24] Kim KI, Jung K, Park SH, Kim HJ (2002) Support vector machines for texture classification. Pattern Anal Mach Intell IEEE Trans 24(11):1542-1550 · doi:10.1109/TPAMI.2002.1046177
[25] Knerr S, Personnaz L, Dreyfus G (1990) Single-layer learning revisited: a stepwise procedure for building and training a neural network. In: Neurocomputing, pp 41-50. Springer
[26] Krawcczyk B, Wozniak M, Cyganek B (2014) Clustering-based ensembles for one-class classification. Inf Sci 264:182-195 · Zbl 1335.68205 · doi:10.1016/j.ins.2013.12.019
[27] Kurdyka K (1998) On gradients of functions definable in o-minimal structures. In: Annales de l’institut Fourier, vol 48, pp 769-784. Chartres: L’Institut, 1950 · Zbl 0934.32009
[28] Lacoste-Julien S, Jaggi M, Schmidt M, Pletscher P (2012) Block-coordinate frank-wolfe optimization for structural svms. arXiv preprint arXiv:1207.4747
[29] Lee Y, Lin Y, Wahba G (2004) Multicategory support vector machines. J Am Stat Assoc 99(465):67-81 · Zbl 1089.62511 · doi:10.1198/016214504000000098
[30] Li J, Jia Y (2010) Huberized multiclass support vector machine for microarray classification. Acta Autom Sin 36(3):399-405 · Zbl 1240.68189
[31] Li Qi, Salman Raied, Test Erik, Strack Robert, Kecman Vojislav (2013) Parallel multitask cross validation for support vector machine using gpu. J Parallel Distrib Comput 73(3):293-302 · doi:10.1016/j.jpdc.2012.02.011
[32] Łojasiewicz S (1993) Sur la géométrie semi-et sous-analytique. Ann Inst Fourier (Grenoble) 43(5):1575-1595 · Zbl 0803.32002 · doi:10.5802/aif.1384
[33] Nesterov Y (2004) Introductory lectures on convex optimization vol 87. A basic course · Zbl 1086.90045
[34] Nesterov Y (2007) Gradient methods for minimizing composite objective function. CORE Discussion Papers
[35] Ouyang H, He N, Tran L, Gray A (2013) Stochastic alternating direction method of multipliers. In: Proceedings of the 30th International Conference on Machine Learning, pp 80-88
[36] Platt JC, Cristianini N, Shawe-Taylor J (2000) Large margin dags for multiclass classification. Adv Neural Inf Process Syst 12(3):547-553
[37] Qi Zhiquan, Tian Yingjie, Shi Yong (2013) Structural twin support vector machine for classification. Knowl-Based Syst 43:74-81 · Zbl 1248.68441 · doi:10.1016/j.knosys.2013.01.008
[38] Schmidt M, Roux NL, Bach F (2011) Convergence rates of inexact proximal-gradient methods for convex optimization. Arxiv preprint arXiv:1109.2415
[39] Sturm JF (1999) Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim Methods Softw 11(1-4):625-653 · Zbl 0973.90526 · doi:10.1080/10556789908805766
[40] Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58(1):267-288 · Zbl 0850.62538
[41] Tsyurmasto P, Zabarankin M, Uryasev S (2014) Value-at-risk support vector machine: stability to outliers. J Comb Optim 1-15 · Zbl 1304.90181
[42] Wang L, Shen X (2007) On \[{L}_1\] L1-norm multiclass support vector machines. J Am Stat Assoc 102(478):583-594 · Zbl 1172.62317 · doi:10.1198/016214506000001383
[43] Wang L, Zhu J, Zou H (2006) The doubly regularized support vector machine. Stat Sin 16(2):589 · Zbl 1126.68070
[44] Wang L, Zhu J, Zou H (2008) Hybrid huberized support vector machines for microarray classification and gene selection. Bioinformatics 24(3):412-419 · doi:10.1093/bioinformatics/btm579
[45] Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull pp 80-83
[46] Xu Y, Yin W (2013) A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J Imaging Sci 6(3):1758-1789 · Zbl 1280.49042 · doi:10.1137/120887795
[47] Xu Y, Yin W (2014) A globally convergent algorithm for nonconvex optimization based on block coordinate update. Arxiv preprint arXiv:1410.1386 · Zbl 1378.65126
[48] Yang Yi, Zou Hui (2013) An efficient algorithm for computing the hhsvm and its generalizations. J Comput Graph Stat 22(2):396-415 · doi:10.1080/10618600.2012.680324
[49] Ye GB, Chen Y, Xie X (2011) Efficient variable selection in support vector machines via the alternating direction method of multipliers. In: Proceedings of the International conference on artificial intelligence and statistics
[50] Zhang H, Liu Y, Wu Y, Zhu J (2008) Variable selection for the multicategory svm via adaptive sup-norm regularization. Electron J Stat 2:149-167 · Zbl 1135.62056 · doi:10.1214/08-EJS122
[51] Zhang Yang, Meratnia Nirvana, Havinga Paul JM (2013) Distributed online outlier detection in wireless sensor networks using ellipsoidal support vector machine. Ad Hoc Netw 11(3):1062-1074 · doi:10.1016/j.adhoc.2012.11.001
[52] Zou J, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B 67(1):301-320 · Zbl 1069.62054 · doi:10.1111/j.1467-9868.2005.00503.x
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.