×

Robust multicategory support matrix machines. (English) Zbl 1415.90080

Summary: We consider the classification problem when the input features are represented as matrices rather than vectors. To preserve the intrinsic structures for classification, a successful method is the support matrix machine (SMM) in [L. Luo et al., “Support matrix machines”, in: Proceedings of the 32nd international conference on machine learning. PMLR 37, 938–947 (2015)], which optimizes an objective function with a hinge loss plus a so-called spectral elastic net penalty. However, the issues of extending SMM to multicategory classification still remain. Moreover, in practice, it is common to see the training data contaminated by outlying observations, which can affect the robustness of existing matrix classification methods. In this paper, we address these issues by introducing a robust angle-based classifier, which boils down binary and multicategory problems to a unified framework. Benefitting from the use of truncated hinge loss functions, the proposed classifier achieves certain robustness to outliers. The underlying optimization model becomes nonconvex, but admits a natural DC (difference of two convex functions) representation. We develop a new and efficient algorithm by incorporating the DC algorithm and primal-dual first-order methods together. The proposed DC algorithm adaptively chooses the accuracy of the subproblem at each iteration while guaranteeing the overall convergence of the algorithm. The use of primal-dual methods removes a natural complexity of the linear operator in the subproblems and enables us to use the proximal operator of the objective functions, and matrix-vector operations. This advantage allows us to solve large-scale problems efficiently. Theoretical and numerical results indicate that for problems with potential outliers, our method can be highly competitive among existing methods.

MSC:

90C25 Convex programming
90C26 Nonconvex programming, global optimization
62H30 Classification and discrimination; cluster analysis (statistical aspects)
PDFBibTeX XMLCite
Full Text: DOI Link

References:

[1] Altun, K., Barshan, B.: Human activity recognition using inertial/magnetic sensor units. In: International Workshop on Human Behavior Understanding, pp. 38-51. Springer (2010)
[2] An, L.T.H., Tao, P.D.: Solving a class of linearly constrained indefinite quadratic problems by DC algorithms. J. Glob. Optim. 11(3), 253-285 (1997) · Zbl 0905.90131
[3] Bauschke, H., Combettes, P.L.: Convex Analysis and Monotone Operators Theory in Hilbert Spaces, 2nd edn. Springer, Berlin (2017) · Zbl 1359.26003
[4] Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144-152. ACM (1992)
[5] Boyd, S.: Alternating direction method of multipliers. In: Talk at NIPS Workshop on Optimization and Machine Learning (2011)
[6] Cai, D., He, X., Wen, J.-R., Han, J., Ma, W.-Y.: Support tensor machines for text categorization. Technical Report (2006)
[7] Cai, J.-F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956-1982 (2010) · Zbl 1201.90155
[8] Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120-145 (2011) · Zbl 1255.68217
[9] Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal-dual algorithm. Math. Program. 159(1-2), 253-287 (2016) · Zbl 1350.49035
[10] Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273-297 (1995) · Zbl 0831.68098
[11] Davis, D., Yin, W.: Faster convergence rates of relaxed Peaceman-Rachford and ADMM under regularity assumptions. Math. Oper. Res. 42, 783-805 (2014) · Zbl 1379.65035
[12] Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer Series in Statistics, 2nd edn. Springer, New York (2001) · Zbl 0973.62007
[13] He, B.S., Yuan, X.M.: On the \[{O}(1/n)O\](1/n) convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700-709 (2012) · Zbl 1245.90084
[14] Hou, C., Nie, F., Zhang, C., Yi, D., Wu, Y.: Multiple rank multi-linear SVM for matrix data classification. Pattern Recognit. 47(1), 454-469 (2014) · Zbl 1326.68245
[15] Huber, P.J., Ronchetti, E.: Robust Statistics. Wiley Series in Probability and Statistics, 2nd edn. Wiley, Hoboken (2009) · Zbl 1276.62022
[16] Latala, R.: Some estimates of norms of random matrices. Proc. Am. Math. Soc. 133(5), 1273-1282 (2005) · Zbl 1067.15022
[17] Lee, Y., Lin, Y., Wahba, G.: Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data. J. Am. Stat. Assoc. 99(465), 67-81 (2004) · Zbl 1089.62511
[18] Liu, Y.: Fisher consistency of multicategory support vector machines. In: Artificial Intelligence and Statistics, pp. 291-298 (2007)
[19] Luo, L., Xie, Y., Zhang, Z., Li, W.-J.: Support matrix machines. In: Proceedings of the 32nd International Conference on Machine Learning, Lille, France, no. 1, pp. 938-947 (2015)
[20] Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learning. Adaptive Computation and Machine Learning Series. MIT Press, Cambridge (2012) · Zbl 1318.68003
[21] Monteiro, R.D.C., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23(1), 475-507 (2013) · Zbl 1267.90181
[22] Negahban, S., Wainwright, M.J.: Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Ann. Stat. 39(2), 1069-1097 (2011) · Zbl 1216.62090
[23] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston (2004) · Zbl 1086.90045
[24] Pirsiavash, H., Ramanan, D., Fowlkes, C.: Bilinear classifiers for visual recognition. In: Advances in Neural Information Processing Systems, pp. 1482-1490 (2009)
[25] Rockafellar, R.T.: Convex Analysis, volume 28 of Princeton Mathematics Series. Princeton University Press, Princeton (1970)
[26] Sun, H., Craig, B., Zhang, L.: Angle-based multicategory distance-weighted SVM. J. Mach. Learn. Res. 18(85), 1-21 (2017) · Zbl 1440.62251
[27] Tao, D., Li, X., Wu, X., Hu, W., Maybank, S.J.: Supervised tensor learning. Knowl. Inf. Syst. 13(1), 1-42 (2007)
[28] Tran-Dinh, Q.: Proximal alternating penalty algorithms for constrained convex optimization. Comput. Optim. Appl. 72(1), 1-43 (2019) · Zbl 1418.90199
[29] Wright, S.J.: Coordinate descent algorithms. Math. Program. 151(1), 3-34 (2015) · Zbl 1317.49038
[30] Wu, Y., Liu, Y.: Robust truncated hinge loss support vector machines. J. Am. Stat. Assoc. 102(479), 974-983 (2007) · Zbl 1469.62293
[31] Yang, J., Zhang, D., Frangi, A.F., Yang, J.-Y.: Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 26(1), 131-137 (2004)
[32] Zhang, C., Liu, Y.: Multicategory angle-based large-margin classification. Biometrika 101(3), 625-640 (2014) · Zbl 1335.62110
[33] Zhang, C., Liu, Y., Wang, J., Zhu, H.: Reinforced angle-based multicategory support vector machines. J. Comput. Gr. Stat. 25(3), 806-825 (2016)
[34] Zhang, C., Pham, M., Fu, S., Liu, Y.: Robust multicategory support vector machines using difference convex algorithm. Math. Program. 169(1), 277-305 (2018) · Zbl 1397.90319
[35] Zhao, J., Yu, G., Liu, Y., et al.: Assessing robustness of classification using an angular breakdown point. Ann. Stat. 46(6B), 3362-3389 (2018) · Zbl 1408.62121
[36] Zhou, H., Li, L.: Regularized matrix regression. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 76(2), 463-483 (2014) · Zbl 07555458
[37] Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 67(2), 301-320 (2005) · Zbl 1069.62054
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.