About the non-convex optimization problem induced by non-positive semidefinite kernel learning. (English) Zbl 1284.90058

Summary: During the last years, kernel based methods proved to be very successful for many real-world learning problems. One of the main reasons for this success is the efficiency on large data sets which is a result of the fact that kernel methods like support vector machines (SVM) are based on a convex optimization problem. Solving a new learning problem can now often be reduced to the choice of an appropriate kernel function and kernel parameters. However, it can be shown that even the most powerful kernel methods can still fail on quite simple data sets in cases where the inherent feature space induced by the used kernel function is not sufficient. In these cases, an explicit feature space transformation or detection of latent variables proved to be more successful. Since such an explicit feature construction is often not feasible for large data sets, the ultimate goal for efficient kernel learning would be the adaptive creation of new and appropriate kernel functions. It can, however, not be guaranteed that such a kernel function still leads to a convex optimization problem for Support Vector Machines. Therefore, we have to enhance the optimization core of the learning method itself before we can use it with arbitrary, i.e., non-positive semidefinite, kernel functions. This article motivates the usage of appropriate feature spaces and discusses the possible consequences leading to non-convex optimization problems. We will show that these new non-convex optimization SVM are at least as accurate as their quadratic programming counterparts on eight real-world benchmark data sets in terms of the generalization performance. They always outperform traditional approaches in terms of the original optimization problem. Additionally, the proposed algorithm is more generic than existing traditional solutions since it will also work for non-positive semidefinite or indefinite kernel functions.


90C26 Nonconvex programming, global optimization
68T05 Learning and adaptive systems in artificial intelligence
74P05 Compliance or weight optimization in solid mechanics
Full Text: DOI


[1] Alpay D (2001) The schur algorithm, reproducing kernel spaces and system theory. In: SMF/AMS texts and monographs, vol 5. SMF, France · Zbl 1059.93001
[2] Beyer H-G, Schwefel H-P (2002) Evolution strategies: a comprehensive introduction. J Natural Comput 1(1): 2–52 · Zbl 1014.68134
[3] Burges C (1998) A tutorial on support vector machines for pattern recognition. Data Mining Knowledge Discovery 2(2): 121–167 · Zbl 05470543
[4] Camps-Valls G, Martin-Guerrero J, Rojo-Alvarez J, Soria-Olivas E (2004) Fuzzy sigmoid kernel for support vector classifiers. Neurocomputing 62: 501–506 · Zbl 02223913
[5] Chang C-C, Lin C-J (2001) LIBSVM: a library for support vector machines
[6] Frphlich H, Chapelle O, Schölkopf B (2004) Feature selection for support vector machines using genetic algorithms. Int J Artif Intell Tools 13(4): 791–800 · Zbl 05421286
[7] Haasdonk B (2005) Feature space interpretation of svms with indefinite kernels. IEEE Trans Pattern Anal Mach Intell 27(4): 482–492 · Zbl 05111097
[8] Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning: data mining, inference, and prediction Springer Series in Statistics. Springer, Heidelberg · Zbl 0973.62007
[9] Joachims T (1999) Making large-scale SVM learning practical. In: Schölkopf B, Burges C, Smola A (eds). Advances in Kernel Methods–support vector learning, chapter 11. MIT Press, Cambridge
[10] Kimeldorf GS, Wahba G (1971) Some results on Tchebycheffian spline functions. J Math Anal Appl 33: 82–95 · Zbl 0201.39702
[11] Lin H-T, Lin C-J (2003a) A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods
[12] Lin H-T, Lin C-J (2003b) A study on sigmoid kernels for svm and the training of non-PSD kernels by SMO-type methods
[13] Mary X (2003) Hilbertian subspaces, subdualities and applications. PhD thesis, Institut National des Sciences Appliquees Rouen
[14] Mercer J (1909) Functions of positive and negative type and their connection with the theory of integral equations. Philos Trans Roy Soc A 209: 415–446 · JFM 40.0408.02
[15] Mierswa I (2006) Evolutionary learning with kernels: a generic solution for large margin problems. In: Proceedings of the genetic and evolutionary computation conference (GECCO 2006)
[16] Mierswa I, Wurst M, Klinkenberg R, Scholz M, Euler T (2006) YALE: rapid prototyping for complex data mining tasks. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2006)
[17] Newman D, Hettich S, Blake C, Merz C (1998) UCI repository of machine learning databases. http://www.ics.uci.edu/\(\sim\)mlearn/MLRepository.html
[18] Ong C, Mary X, Canu S, Smola AJ (2004a) Learning with non-positive kernels. In: Proceedings of the 21st international conference on machine learning (ICML), pp 639–646
[19] Ong C, Mary X, Canu S, Smola AJ (2004b) Learning with non-positive kernels. In: Proceedings of the 21st international conference on machine learning (ICML 2004), pp 639–646
[20] Platt J (1999) Advances in large margin classifiers. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. MIT Press, Cambridge
[21] Rasmussen CE, Quinonero-Candela J (2005) Healing the relevance vector machine through augmentation. In: Proceedings of the 22nd international conference on machine learning (ICML 2005). ACM Press, New York, pp 689–696
[22] Rechenberg I (1973) Evolutionsstrategie: optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog, Stuttgart
[23] Ritthoff O, Klinkenberg R, Fischer S, Mierswa I (2002) A hybrid approach to feature selection and generation using an evolutionary algorithm. In: Proceedings of the 2002 U.K. workshop on computational intelligence (UKCI-02). University of Birmingham, pp 147–154
[24] Rüping S (2000) mySVM Manual. Universität Dortmund, Lehrstuhl Informatik VIII. http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM
[25] Schölkopf B, Smola AJ (2002) Learning with kernels–support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge
[26] Smola AJ, Ovari ZL, Williamson RC (2000) Regularization with dot-product kernels. In: Proceedings of the neural information processing systems (NIPS), pp 308–314
[27] StatLib (2002) Statlib–datasets archive. http://lib.stat.cmu.edu/datasets
[28] Storch T (2005) On the impact of objective function transformations on evolutionary and black-box algorithms. In: Proceedings of the genetic and evolutionary computation conference (GECCO), pp 833–840
[29] Tipping ME (2001) Sparse bayesian learning and the relevance vector machine. J Mach Learn Res 1: 211–244 · Zbl 0997.68109
[30] Vapnik V (1998) Statistical learning theory. Wiley, New York · Zbl 0935.62007
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.