
Multikernel regression with sparsity constraint. (English) Zbl 1479.46026

Summary: In this paper, we provide a Banach-space formulation of supervised learning with generalized total-variation (gTV) regularization. We identify the class of kernel functions that are admissible in this framework. Then, we propose a variation of supervised learning over a continuous-domain hybrid search space with gTV regularization. We show that the solution admits a multikernel expansion with adaptive positions. In this representation, the number of active kernels is upper-bounded by the number of data points, while the gTV regularization imposes an \(\ell_1\) penalty on the kernel coefficients. Finally, we illustrate the outcome of our theory numerically.
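The structure of the solution described above can be illustrated with a small discretized sketch. Assuming Gaussian kernels of a few candidate widths centered at the data points (a finite-dictionary stand-in for the continuous-domain adaptive positions), the \(\ell_1\) penalty on the kernel coefficients reduces to a LASSO problem, which we solve here with plain ISTA as a generic proximal-gradient solver; the kernel choice, widths, and solver are illustrative assumptions, not the algorithm of the paper.

```python
import numpy as np

def gaussian_kernel(x, c, width):
    """Gaussian (RBF) kernel matrix between sample points x and centers c."""
    return np.exp(-((x[:, None] - c[None, :]) ** 2) / (2.0 * width ** 2))

def multikernel_lasso(x, y, widths, lam=0.05, n_iter=2000):
    """Fit f(x) = sum_{k,m} a_{k,m} K_{w_k}(x, x_m) with an l1 penalty on a,
    using ISTA (proximal gradient descent) on 0.5*||G a - y||^2 + lam*||a||_1."""
    # Dictionary: one block of kernel atoms per candidate width,
    # each block centered at the data points.
    G = np.hstack([gaussian_kernel(x, x, w) for w in widths])
    a = np.zeros(G.shape[1])
    step = 1.0 / np.linalg.norm(G, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = G.T @ (G @ a - y)                              # gradient step
        a = a - step * grad
        a = np.sign(a) * np.maximum(np.abs(a) - step * lam, 0.0)  # soft-thresholding
    return a, G
```

The soft-thresholding step sets coefficients exactly to zero, so the fitted expansion is sparse in the kernel atoms, mirroring the bound on the number of active kernels in the continuous-domain result.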

MSC:

46E22 Hilbert spaces with reproducing kernels (= (proper) functional Hilbert spaces, including de Branges-Rovnyak and other structured spaces)
46E27 Spaces of measures
47A52 Linear operators and ill-posed problems, regularization
62G08 Nonparametric regression and quantile regression
68T05 Learning and adaptive systems in artificial intelligence
