Online minimum error entropy algorithm with unbounded sampling. (English) Zbl 1410.68327

Summary: The minimum error entropy (MEE) criterion is an important optimization principle in information theoretic learning (ITL) and has been widely used and studied in various practical scenarios. In this paper, we introduce an online MEE algorithm for dealing with big data sets, associated with reproducing kernel Hilbert spaces (RKHS) and unbounded sampling processes. An explicit convergence rate is derived under a regularity condition on the regression function and polynomially decaying step sizes. Besides its low computational complexity, we also show that the learning ability of online MEE is superior to that reported in previous work in the literature. Our main techniques rely on integral operators on RKHS and probability inequalities for random variables with values in a Hilbert space.
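
For orientation (this display is not taken from the paper): MEE learning minimizes an empirical estimate, obtained by Parzen windowing [24], of Rényi's quadratic entropy of the error variable \(E = Y - f(X)\). A minimal sketch of a pairwise online update in the RKHS \(\mathcal{H}_K\), assuming a windowing function \(G\) with scaling parameter \(h > 0\) and writing \(K_x = K(x, \cdot)\) and \(e_t = y_t - f_t(x_t)\), is
\[ f_{t+1} = f_t + \frac{\eta_t}{h}\, G'\!\left(\frac{(e_t - e_i)^2}{h}\right) (e_t - e_i)\, \big(K_{x_t} - K_{x_i}\big), \qquad \eta_t = \eta_1 t^{-\theta}, \quad \theta \in (0,1), \]
where \((x_i, y_i)\) is an earlier sample paired with the current one and \(\eta_t\) is the polynomially decaying step size mentioned above. Signs, normalizations and the exact pairing scheme vary across the MEE literature, so this sketch only indicates the structure of such algorithms, not the paper's precise iteration.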

MSC:

68T05 Learning and adaptive systems in artificial intelligence
41A46 Approximation by arbitrary nonlinear expressions; widths and entropy
46E22 Hilbert spaces with reproducing kernels (= (proper) functional Hilbert spaces, including de Branges-Rovnyak and other structured spaces)
62J02 General nonlinear regression
68W27 Online algorithms; streaming algorithms

References:

[1] Aronszajn, N., Theory of reproducing kernels, Trans. Amer. Math. Soc. 68(3) (1950) 337-404. · Zbl 0037.20701
[2] Bennett, G., Probability inequalities for the sum of independent random variables, J. Amer. Statist. Assoc. 57(297) (1962) 33-45. · Zbl 0104.11905
[3] Caponnetto, A. and De Vito, E., Optimal learning rates for regularized least-squares algorithms, Found. Comput. Math. 7(3) (2007) 331-368. · Zbl 1129.68058
[4] Cesa-Bianchi, N., Conconi, A. and Gentile, C., On the generalization ability of online learning algorithms, IEEE Trans. Inform. Theory 50(9) (2004) 2050-2057. · Zbl 1295.68182
[5] Chen, X. M. and Lei, Y. W., Refined bounds for online pairwise learning algorithms, Neurocomput. 275 (2017) 2656-2665.
[6] Chen, B. and Principe, J. C., Stochastic gradient algorithm under \((h, \phi )\)-entropy criterion, Circuits, Syst. Signal Process. 26(6) (2007) 941-960. · Zbl 05320945
[7] Chen, B., Zhu, Y. and Hu, J., Mean-square convergence analysis of ADALINE training with minimum error entropy criterion, IEEE Trans. Neural Netw. 21(7) (2010) 1168-1179.
[8] Erdogmus, D., Hild, K. II and Principe, J. C., Blind source separation using Rényi's \(\alpha \)-marginal entropies, Neurocomput. 49(1-4) (2002) 25-38. · Zbl 1047.68097
[9] Erdogmus, D. and Principe, J. C., Comparison of entropy and mean square error criteria in adaptive system training using higher order statistics, in Proc. 2nd Int. Workshop on Independent Component Analysis and Blind Signal Separation (Springer-Verlag, Berlin, 2000), pp. 75-90.
[10] Erdogmus, D. and Principe, J. C., Convergence properties and data efficiency of the minimum error entropy criterion in ADALINE training, IEEE Trans. Signal Process. 51 (2003) 1966-1978.
[11] Fan, J., Hu, T., Wu, Q. and Zhou, D. X., Consistency analysis of an empirical minimum error entropy algorithm, Appl. Comput. Harmon. Anal. 41(1) (2016) 164-189. · Zbl 1382.94034
[12] Y. Feng, J. Fan and J. A. Suykens, A statistical learning approach to modal regression, to appear in J. Mach. Learn. Res.; https://arxiv.org/abs/1702.05960. · Zbl 1497.68418
[13] Fine, S. and Scheinberg, K., Efficient SVM training using low-rank kernel representations, J. Mach. Learn. Res. 2(2) (2002) 243-264. · Zbl 1037.68112
[14] Gokcay, E. and Principe, J. C., Information theoretic clustering, IEEE Trans. Pattern Anal. Mach. Intell. 24(2) (2002) 158-171.
[15] Greengard, L. and Strain, J., The fast Gauss transform, SIAM J. Sci. Statist. Comput. 12(1) (1991) 79-94. · Zbl 0721.65089
[16] Hu, T., Fan, J., Wu, Q. and Zhou, D. X., Learning theory approach to a minimum error entropy criterion, J. Mach. Learn. Res. 14(1) (2013) 377-397. · Zbl 1320.62096
[17] Hu, T., Fan, J., Wu, Q. and Zhou, D. X., Regularization schemes for minimum error entropy principle, Anal. Appl. 13(4) (2015) 437-455. · Zbl 1329.68216
[18] T. Hu, Q. Wu and D. X. Zhou, Kernel gradient descent algorithm for information theoretic learning, submitted to IEEE Trans. Inform. Theory.
[19] Hu, T., Wu, Q. and Zhou, D. X., Convergence of gradient descent method for minimum error entropy principle in linear regression, IEEE Trans. Signal Process. 64(24) (2016) 6571-6579. · Zbl 1414.94263
[20] Kar, P., Sriperumbudur, B. K., Jain, P. and Karnick, H. C., On the generalization ability of online learning algorithms for pairwise loss functions, in Proc. 30th Int. Conf. Mach. Learn. (ICML), Vol. 28, Atlanta, GA, USA (2013), pp. 441-449.
[21] Lin, J. H. and Zhou, D. X., Online learning algorithms can converge comparably fast as batch learning, IEEE Trans. Neural Netw. Learn. Syst. (2017) 1-12; https://doi.org/10.1109/TNNLS.2017.2677970
[22] Liu, W., Park, I. and Principe, J., An information theoretic approach of designing sparse kernel adaptive filters, IEEE Trans. Neural Netw. 20(12) (2009) 1950-1961.
[23] Lv, S. G. and Feng, Y. L., Integral operator approach to learning theory with unbounded sampling, Complex Anal. Oper. Theory 6(3) (2012) 533-548. · Zbl 1285.68143
[24] Parzen, E., On the estimation of a probability density function and the mode, Ann. Math. Statist. 33 (1962) 1065-1076. · Zbl 0116.11302
[25] Pinelis, I., Optimum bounds for the distributions of martingales in Banach spaces, Ann. Probab. 22(4) (1994) 1679-1706. · Zbl 0836.60015
[26] Principe, J. C., Information Theoretic Learning: Renyi’s Entropy and Kernel Perspectives (Springer-Verlag, New York, 2010). · Zbl 1206.94003
[27] Shen, P. and Li, C., Minimum total error entropy method for parameter estimation, IEEE Trans. Signal Process. 63(15) (2015) 4079-4090. · Zbl 1394.94812
[28] Silva, L. M., Marques de Sá, J. and Alexandre, L. A., Neural network classification using Shannon’s entropy, in Proc. European Symp. on Artificial Neural Networks, Bruges, Belgium (2005), pp. 217-222.
[29] Silva, L. M., Marques de Sá, J. and Alexandre, L. A., The MEE principle in data classification: A perceptron-based analysis, Neural Comput. 22(10) (2010) 2698-2728. · Zbl 1208.68182
[30] Smale, S. and Zhou, D. X., Learning theory estimates via integral operators and their approximations, Constr. Approx. 26 (2007) 153-172. · Zbl 1127.68088
[31] Wang, C. and Zhou, D. X., Optimal learning rates for least-squares regularized regression with unbounded sampling, J. Complexity 27 (2011) 55-67. · Zbl 1217.65024
[32] Williams, C. and Seeger, M., Using the Nyström method to speed up kernel machines, Adv. Neural Inform. Process. Syst. 13 (2001) 682-688.
[33] Ying, Y. and Pontil, M., Online gradient descent learning algorithms, Found. Comput. Math. 8(5) (2008) 561-596. · Zbl 1175.68211
[34] Ying, Y. and Zhou, D. X., Unregularized online learning algorithms with general loss functions, Appl. Comput. Harmon. Anal. 42(2) (2015) 224-244. · Zbl 1382.68204
[35] Ying, Y. and Zhou, D. X., Online pairwise learning algorithms, Neural Comput. 28(4) (2016) 743-777. · Zbl 1472.68221
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.