High dimensional data classification and feature selection using support vector machines. (English) Zbl 1381.62170

Summary: In many big-data systems, large amounts of information are recorded and stored for analytics purposes. Often however, this vast amount of information does not offer additional benefits for optimal decision making, but may rather be complicating and too costly for collection, storage, and processing. For instance, tumor classification using high-throughput microarray data is challenging due to the presence of a large number of noisy features that do not contribute to the reduction of classification errors. For such problems, the general aim is to find a limited number of genes that highly differentiate among the classes. Thus in this paper, we address a specific class of machine learning, namely the problem of feature selection within support vector machine classification that deals with finding an accurate binary classifier that uses a minimal number of features. We introduce a new approach based on iteratively adjusting a bound on the l1-norm of the classifier vector in order to force the number of selected features to converge towards the desired maximum limit. We analyze two real-life classification problems with high dimensional features. The first case is the medical diagnosis of tumors based on microarray data where we present a generic approach for cancer classification based on gene expression. The second case deals with sentiment classification of on-line reviews from Amazon, Yelp, and IMDb. The results show that the proposed classification and feature selection approach is simple, computationally tractable, and achieves low error rates which are key for the construction of advanced decision-support systems.


62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence
90C20 Quadratic programming
90C90 Applications of mathematical programming


Full Text: DOI


[1] Aytug, H., Feature selection for support vector machines using generalized benders decomposition, European Journal of Operational Research, 244, 1, 210-218, (2015) · Zbl 1347.62105
[2] Baesens, B.; Setiono, R.; Mues, C.; Vanthienen, J., Using neural network rule extraction and decision tables for credit-risk evaluation, Management Science, 49, 3, 312-329, (2003) · Zbl 1232.91684
[3] Bertolazzi, P.; Felici, G.; Festa, P.; Fiscon, G.; Weitschek, E., Integer programming models for feature selection: new extensions and a randomized solution algorithm, European Journal of Operational Research, 250, 2, 389-399, (2016) · Zbl 1346.90605
[4] Bradley, P.; Mangasarian, O., Massive data discrimination via linear support vector machines, Optimization Methods and Software, 13, 1, 1-10, (2000) · Zbl 0986.90085
[5] Chan, A. B.; Vasconcelos, N.; Lanckriet, G. R., Direct convex relaxations of sparse SVM, Proceedings of the twenty-fourth international conference on machine learning, 145-153, (2007)
[6] Cortes, C.; Vapnik, V., Support-vector networks, Machine Learning, 20, 3, 273-297, (1995) · Zbl 0831.68098
[7] Cui, G.; Wong, M. L.; Lui, H.-K., Machine learning for direct marketing response models: Bayesian networks with evolutionary programming, Management Science, 52, 4, 597-612, (2006) · Zbl 1232.90232
[8] Das, S. R.; Chen, M. Y., Yahoo! for amazon: sentiment extraction from small talk on the web, Management Science, 53, 9, 1375-1388, (2007)
[9] Downs, T.; Gates, K. E.; Masters, A., Exact simplification of support vector solutions, The Journal of Machine Learning Research, 2, 293-297, (2002) · Zbl 1037.68111
[10] Dunbar, M.; Murray, J. M.; Cysique, L. A.; Brew, B. J.; Jeyakumar, V., Simultaneous classification and feature selection via convex quadratic programming with application to HIV-associated neurocognitive disorder assessment, European Journal of Operational Research, 206, 2, 470-478, (2010) · Zbl 1188.90235
[11] Ferris, M. C.; Munson, T. S., Interior-point methods for massive support vector machines, SIAM Journal on Optimization, 13, 3, 783-804, (2002) · Zbl 1039.90092
[12] Friedman, L. (2015). IBM’s Watson computer can now do in a matter of minutes what it takes cancer doctors weeks to perform. http://www.businessinsider.com/r-ibms-watson-to-guide-cancer-therapies-at-14-centers-2015-5. Accessed: August 15, 2016.
[13] Fung, G. M.; Mangasarian, O. L., A feature selection Newton method for support vector machine classification, Computational Optimization and Applications, 28, 2, 185-202, (2004) · Zbl 1056.90103
[14] Furlow, B. (2016). IBM Watson collaboration aims to improve oncology decision support tools. https://www.cancernetwork.com/mbcc-2016/ibm-watson-collaboration-aims-improve-oncology-decision-support-tools. Accessed August 15, 2016.
[15] Golub, T. R.; Slonim, D. K.; Tamayo, P.; Huard, C.; Gaasenbeek, M.; Mesirov, J. P., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science, 286, 5439, 531-537, (1999)
[16] Guyon, I.; Elisseeff, A., An introduction to variable and feature selection, Journal of machine learning research, 3, Mar, 1157-1182, (2003) · Zbl 1102.68556
[17] Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V., Gene selection for cancer classification using support vector machines, Machine Learning, 46, 1-3, 389-422, (2002) · Zbl 0998.68111
[18] Huang, K.; Yang, H.; King, I.; Lyu, M.; Chan, L., The minimum error minimax probability machine, The Journal of Machine Learning Research, 5, 1253-1286, (2004) · Zbl 1222.62071
[19] Keerthi, S. S.; Chapelle, O.; DeCoste, D., Building support vector machines with reduced classifier complexity, The Journal of Machine Learning Research, 7, 1493-1515, (2006) · Zbl 1222.68230
[20] Lanckriet, G.; El-Ghaoui, L.; Bhattacharyya, C.; Jordan, M., A robust minimax approach to classification, The Journal of Machine Learning Research, 3, 555-582, (2003) · Zbl 1084.68657
[21] Lichman, M. (2013). UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed August 15, 2016.
[22] Maldonado, S.; Pérez, J.; Weber, R.; Labbé, M., Feature selection for support vector machines via mixed integer linear programming, Information Sciences, 279, 163-175, (2014) · Zbl 1354.68226
[23] Maldonado, S.; Weber, R., A wrapper method for feature selection using support vector machines, Information Sciences, 179, 13, 2208-2217, (2009)
[24] Maldonado, S.; Weber, R.; Basak, J., Simultaneous feature selection and classification using kernel-penalized support vector machines, Information Sciences, 181, 1, 115-128, (2011)
[25] Maldonado, S.; Weber, R.; Famili, F., Feature selection for high-dimensional class-imbalanced data sets using support vector machines, Information Sciences, 286, 228-246, (2014)
[26] Netzer, O.; Feldman, R.; Goldenberg, J.; Fresko, M., Mine your own business: market-structure surveillance through text mining, Marketing Science, 31, 3, 521-543, (2012)
[27] Neumann, J.; Schnörr, C.; Steidl, G., Combined SVM-based feature selection and classification, Machine Learning, 61, 1-3, 129-150, (2005) · Zbl 1137.90643
[28] Rinaldi, F.; Sciandrone, M., Feature selection combining linear support vector machines and concave optimization, Optimization Methods & Software, 25, 1, 117-128, (2010) · Zbl 1190.90145
[29] Steadman, I. (2013). IBM’s Watson is better at diagnosing cancer than human doctors. https://www.wired.co.uk/article/ibm-watson-medical-doctor. Accessed 15 August 2016.
[30] Suykens, J. A.; Vandewalle, J., Least squares support vector machine classifiers, Neural Processing Letters, 9, 3, 293-300, (1999)
[31] Ustun, B.; Rudin, C., Supersparse linear integer models for optimized medical scoring systems, Machine Learning, 102, 3, 349-391, (2016) · Zbl 1406.62144
[32] Vapnik, V., The nature of statistical learning theory, (1995), Springer-Verlag · Zbl 0833.62008
[33] Vapnik, V., Statistical learning theory, 1, (1998), Wiley · Zbl 0935.62007
[34] Wang, L.; Zhu, J.; Zou, H., The doubly regularized support vector machine, Statistica Sinica, 16, 2, 589-615, (2006) · Zbl 1126.68070
[35] Weston, J.; Elisseeff, A.; Schölkopf, B.; Tipping, M., Use of the zero norm with linear models and kernel methods, The Journal of Machine Learning Research, 3, 1439-1461, (2003) · Zbl 1102.68605
[36] Wu, M.; Schölkopf, B.; Bakır, G., A direct method for building sparse kernel learning algorithms, The Journal of Machine Learning Research, 7, 603-624, (2006) · Zbl 1222.68335
[37] Yegulalp, S. (2015). IBM’s Watson mines twitter for sentiments. http://www.infoworld.com/article/2897602/big-data/ibms-watson-now-mines-twitter-for-sentiments-good-bad-and-ugly.html. Accessed August 15, 2016.
[38] Zhu, J.; Rosset, S.; Hastie, T.; Tibshirani, R., 1-norm support vector machines, Advances in Neural Information Processing Systems, 16, 1, 49-56, (2004)
[39] Zou, H.; Hastie, T., Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 2, 301-320, (2005) · Zbl 1069.62054
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.