On support vector machines under a multiple-cost scenario. (English) Zbl 1474.62211

Summary: Support vector machine (SVM) is a powerful tool in binary classification, known to attain excellent misclassification rates. On the other hand, many realworld classification problems, such as those found in medical diagnosis, churn or fraud prediction, involve misclassification costs which may be different in the different classes. However, it may be hard for the user to provide precise values for such misclassification costs, whereas it may be much easier to identify acceptable misclassification rates values. In this paper we propose a novel SVM model in which misclassification costs are considered by incorporating performance constraints in the problem formulation. Specifically, our aim is to seek the hyperplane with maximal margin yielding misclassification rates below given threshold values. Such maximal margin hyperplane is obtained by solving a quadratic convex problem with linear constraints and integer variables. The reported numerical experience shows that our model gives the user control on the misclassification rates in one class (possibly at the expense of an increase in misclassification rates for the other class) and is feasible in terms of running times.


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P20 Applications of statistics to economics
90C11 Mixed integer programming
90C30 Nonlinear programming
Full Text: DOI


[1] Alcalá-Fdez, J.; Sanchez, L.; Garcia, S.; Jesus, MJ; Ventura, S.; Garrell, JM; Otero, J.; Romero, C.; Bacardit, J.; Rivas, VM; etal., Keel: a software tool to assess evolutionary algorithms for data mining problems, Soft Comput, 13, 307-318, (2009)
[2] Allwein, EL; Schapire, RE; Singer, Y., Reducing multiclass to binary: a unifying approach for margin classifiers, J Mach Learn Res, 1, 113-141, (2001) · Zbl 1013.68175
[3] Benítez-Peña, S.; Blanquero, R.; Carrizosa, E.; Ramírez-Cobo, P., Cost-sensitive feature selection for support vector machines, Comput Oper Res, (2018) · Zbl 1458.68158
[4] Bertsimas, D.; King, A.; Mazumder, R.; etal., Best subset selection via a modern optimization lens, Ann Stat, 44, 813-852, (2016) · Zbl 1335.62115
[5] Bertsimas D, Weismantel R (2005) Optimization over integers. Dynamic Ideas, Belmont
[6] Bewick, V.; Cheek, L.; Ball, J., Statistics review 13: receiver operating characteristic curves, Crit Care, 8, 508-512, (2004)
[7] Bonami, P.; Biegler, LT; Conn, AR; Cornujols, G.; Grossmann, IE; Laird, CD; Lee, J.; Lodi, A.; Margot, F.; Sawaya, N.; Wchter, A., An algorithmic framework for convex mixed integer nonlinear programs, Discrete Optim, 5, 186-204, (2008) · Zbl 1151.90028
[8] Bradford JP, Kunz C, Kohavi R, Brunk C, Brodley CE (1998) Pruning decision trees with misclassification costs. In: Proceedings of the 10th European conference on machine learning, ECML’98. Springer, pp. 131-136
[9] Burer, S.; Letchford, AN, Non-convex mixed-integer nonlinear programming: a survey, Surv Oper Res Manag Sci, 17, 97-106, (2012)
[10] Burges, CJ, A tutorial on support vector machines for pattern recognition, Data Min Knowl Discov, 2, 121-167, (1998)
[11] Camm, JD; Raturi, AS; Tsubakitani, S., Cutting big M down to size, Interfaces, 20, 61-66, (1990)
[12] Carrizosa, E.; Martin-Barragan, B.; Romero Morales, D., Multi-group support vector machines with measurement costs: a biobjective approach, Discrete Appl Math, 156, 950-966, (2008) · Zbl 1152.90536
[13] Carrizosa, E.; Romero Morales, D., Supervised classification and mathematical optimization, Comput Oper Res, 40, 150-165, (2013) · Zbl 1349.68135
[14] Cortes, C.; Vapnik, V., Support-vector networks, Mach Learn, 20, 273-297, (1995) · Zbl 0831.68098
[15] Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, New York · Zbl 0994.68074
[16] Datta, S.; Das, S., Near-Bayesian support vector machines for imbalanced data classification with equal or unequal misclassification costs, Neural Netw, 70, 39-52, (2015) · Zbl 1394.68280
[17] Freitas A, Costa-Pereira A, Brazdil P (2007) Cost-sensitive decision trees applied to medical data. In: Data warehousing and knowledge discovery: 9th international conference, DaWaK 2007, Regensburg Germany, September 3-7, 2007. Proceedings. Springer, Berlin, pp 303-312
[18] Guo, J., Simultaneous variable selection and class fusion for high-dimensional linear discriminant analysis, Biostatistics, 11, 599-608, (2010)
[19] Gurobi Optimization I (2016) Gurobi optimizer reference manual. http://www.gurobi.com
[20] Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer series in statistics. Springer, New York · Zbl 0973.62007
[21] He H, Ma Y (2013) Imbalanced learning: foundations, algorithms, and applications. Wiley, Hoboken · Zbl 1272.68022
[22] Hoeffding, W., Probability inequalities for sums of bounded random variables, J Am Stat Assoc, 58, 13-30, (1963) · Zbl 0127.10602
[23] Horn D, Demircioğlu A, Bischl B, Glasmachers T, Weihs C (2016) A comparative study on large scale kernelized support vector machines. Adv Data Anal Classif. https://doi.org/10.1007/s11634-016-0265-7
[24] Hsu CW, Chang CC, Lin CJ et al (2003) A practical guide to support vector classification. Tech. rep., Department of Computer Science, National Taiwan University
[25] Kohavi R et al (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI, vol 14. Stanford, CA, pp 1137-1145
[26] Lichman M (2013) UCI machine learning repository. https://archive.ics.uci.edu/ml/index.php
[27] Lin, Y.; Lee, Y.; Wahba, G., Support vector machines for classification in nonstandard situations, Mach Learn, 46, 191-202, (2002) · Zbl 0998.68103
[28] Lyu, T.; Lock, EF; Eberly, LE, Discriminating sample groups with multi-way data, Biostatistics, 18, 434-450, (2017)
[29] Maldonado, S.; Prez, J.; Bravo, C., Cost-based feature selection for support vector machines: an application in credit scoring, Eur J Oper Res, 261, 656-665, (2017) · Zbl 1403.90526
[30] Mansouri, K.; Ringsted, T.; Ballabio, D.; Todeschini, R.; Consonni, V., Quantitative structureactivity relationship models for ready biodegradability of chemicals, J Chem Inf Model, 53, 867-878, (2013)
[31] Mercer, J., Functions of positive and negative type, and their connection with the theory of integral equations, Philos Trans R Soc Lond Ser A, 209, 415-446, (1909) · JFM 40.0408.02
[32] Prati, RC; Batista, GE; Silva, DF, Class imbalance revisited: a new experimental setup to assess the performance of treatment methods, Knowl Inf Syst, 45, 247-270, (2015)
[33] Sánchez, BN; Wu, M.; Song, PXK; Wang, W., Study design in high-dimensional classification analysis, Biostatistics, 17, 722, (2016)
[34] Silva, APD, Optimization approaches to supervised classification, Eur J Oper Res, 261, 772-788, (2017) · Zbl 1403.62114
[35] Smola, AJ; Schölkopf, B., A tutorial on support vector regression, Stat Comput, 14, 199-222, (2004)
[36] Tang, Y.; Zhang, YQ; Chawla, NV; Krasser, S., SVMs modeling for highly imbalanced classification, IEEE Trans Syst Man Cybern B (Cybern), 39, 281-288, (2009)
[37] Van Rossum G, Drake FL (2011) An Introduction to Python. Network Theory Ltd, United Kingdom
[38] Vapnik VN (1995) The nature of statistical learning theory. Springer, New York · Zbl 0833.62008
[39] Vapnik VN (1998) Statistical learning theory, vol 1, 1st edn. Wiley, New York · Zbl 0935.62007
[40] Yao, Y.; Lee, Y., Another look at linear programming for feature selection via methods of regularization, Stat Comput, 24, 885-905, (2014) · Zbl 1322.90047
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.