
Ideal regularization for learning kernels from labels. (English) Zbl 1325.68194

Summary: In this paper, we propose a new form of regularization that utilizes the label information of a data set for learning kernels. The proposed regularization, referred to as ideal regularization, is a linear function of the kernel matrix to be learned, which allows us to develop efficient algorithms for exploiting labels. We consider three applications of ideal regularization. First, we use it to incorporate labels into a standard kernel, making the resulting kernel better suited to the learning task. Second, we employ it to learn a data-dependent kernel matrix from an initial kernel matrix that encodes prior similarity information, geometric structure, and the labels of the data. Third, we incorporate ideal regularization into several state-of-the-art kernel learning problems; with this regularization, those problems can be reformulated as simpler ones that admit more efficient solvers. Empirical results show that ideal regularization exploits labels effectively and efficiently.
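The key object in this line of work is the "ideal kernel" built from labels, with T_ij = 1 when x_i and x_j share a label and 0 otherwise (in the spirit of kernel-target alignment), and the regularizer described above is linear in the kernel matrix K. The Python sketch below illustrates the first application, folding labels into a standard kernel; the additive blend K + λT, the function names, and the parameter λ are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ideal_kernel(y):
    """Ideal kernel from labels: T[i, j] = 1 iff y[i] == y[j] (assumed form)."""
    y = np.asarray(y).reshape(-1, 1)
    return (y == y.T).astype(float)

def label_enhanced_kernel(K, y, lam=0.5):
    """Blend a base kernel with the ideal kernel.

    The closed-form update K + lam * T is a hypothetical illustration of
    incorporating label information through a term linear in the kernel.
    """
    T = ideal_kernel(y)
    return K + lam * T

# Usage on a toy RBF base kernel
X = np.random.randn(6, 3)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
K = np.exp(-sq)                                      # base RBF kernel
y = np.array([0, 0, 1, 1, 2, 2])                     # class labels
K_new = label_enhanced_kernel(K, y, lam=0.5)
print(np.round(K_new, 2))
```

Because T adds mass only to entries whose points share a label, the blended kernel increases within-class similarity while leaving between-class similarity unchanged, which is one plausible reading of how a label-driven linear term can make a kernel "more appropriate for learning tasks."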

MSC:

68T05 Learning and adaptive systems in artificial intelligence
