
Information theoretic learning with adaptive kernels. (English) Zbl 1203.94001
Summary: This paper presents an online algorithm for adapting the kernel width, a free parameter in information-theoretic cost functions based on Rényi's entropy. The kernel governs the interactions between error samples and thereby shapes the performance surface over which the system parameters adapt. Because the error of an adaptive system is non-stationary during training, a fixed kernel width may degrade the adaptation dynamics and can even compromise the location of the global optimum in parameter space. The proposed online algorithm is derived from first principles and minimizes the Kullback-Leibler divergence between the estimated error density and the true error density. We characterize the performance of this approach with simulations of linear and nonlinear system training under the minimum error entropy (MEE) criterion combined with the proposed adaptive kernel algorithm. We conclude that adapting the kernel width improves the convergence rate of the parameters and decouples the convergence rate from the misadjustment of the filter weights.
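The summary does not give the update equations, so the following Python sketch is illustrative only and is not the authors' algorithm. It assumes the standard Parzen-window estimator of Rényi's quadratic entropy with a Gaussian kernel, H2 = -log V with V = (1/N^2) sum_i sum_j G_{sigma*sqrt(2)}(e_i - e_j), which MEE training drives down. For the kernel width it relies on one identity: KL(p || p_hat_sigma) = -H(p) - E_p[log p_hat_sigma(e)], and H(p) does not depend on sigma, so minimizing the divergence amounts to maximizing the expected log-density. The sketch estimates that expectation with a leave-one-out Parzen log-likelihood over a sliding window of recent errors and takes one clipped gradient step in log(sigma) per sample. The window length, step size `eta`, clipping bound `max_step`, floor `sigma_min`, and all function names are hypothetical choices.

```python
import math
import random
from collections import deque


def gauss(u, sigma):
    """Unit-area Gaussian kernel G_sigma(u)."""
    return math.exp(-u * u / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def information_potential(errors, sigma):
    """Parzen estimate V = (1/N^2) sum_i sum_j G_{sigma*sqrt(2)}(e_i - e_j).
    The quadratic Renyi entropy estimate is H2 = -log V, so minimizing error entropy maximizes V."""
    n = len(errors)
    s = sigma * math.sqrt(2.0)
    return sum(gauss(ei - ej, s) for ei in errors for ej in errors) / (n * n)


def loo_objective(errors, sigma):
    """Return (J, dJ/dlog(sigma)) for the leave-one-out Parzen log-likelihood
    J(sigma) = (1/N) sum_i log((1/(N-1)) sum_{j != i} G_sigma(e_i - e_j)).
    Up to a sigma-independent constant, -J estimates KL(p || p_hat_sigma)."""
    errors = list(errors)
    n = len(errors)
    two_var = 2.0 * sigma * sigma
    j_val = grad = 0.0
    for i, ei in enumerate(errors):
        d2 = [(ei - ej) ** 2 for j, ej in enumerate(errors) if j != i]
        m = min(d2)
        # Kernel weights rescaled so the largest is 1 (log-sum-exp shift avoids underflow).
        w = [math.exp(-(d - m) / two_var) for d in d2]
        s = sum(w)
        j_val += math.log(s) - m / two_var - math.log((n - 1) * math.sqrt(2.0 * math.pi) * sigma)
        # d/dlog(sigma) of log sum_j G_sigma(u_j) is the weighted mean of (u_j^2 / sigma^2 - 1).
        grad += sum(wk * (d / (sigma * sigma) - 1.0) for wk, d in zip(w, d2)) / s
    return j_val / n, grad / n


def adapt_sigma_online(stream, sigma0=1.0, window=30, eta=0.05, max_step=0.5, sigma_min=1e-4):
    """Hypothetical online rule: one clipped gradient-ascent step on J per new error,
    taken in log(sigma) over a sliding window of the most recent errors."""
    buf = deque(maxlen=window)
    log_sigma = math.log(sigma0)
    history = []
    for e in stream:
        buf.append(e)
        if len(buf) >= 3:  # need a few samples before the leave-one-out estimate is meaningful
            _, g = loo_objective(buf, math.exp(log_sigma))
            step = max(-max_step, min(max_step, eta * g))  # clip to guard against outliers
            log_sigma = max(log_sigma + step, math.log(sigma_min))
        history.append(math.exp(log_sigma))
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    # Non-stationary errors: large early in training, ten times smaller later.
    stream = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(0.0, 0.1) for _ in range(300)]
    hist = adapt_sigma_online(stream)
    print(f"sigma after phase 1: {hist[299]:.3f}, after phase 2: {hist[-1]:.3f}")
    print(f"H2 estimate, last window: {-math.log(information_potential(stream[-30:], hist[-1])):.3f}")
```

Run on an error stream whose spread drops tenfold halfway through, as it would once a filter starts to converge, the width estimate shrinks along with the errors, whereas a fixed width tuned to the early errors would remain too wide. The log(sigma) parameterization keeps the width positive and makes the step scale-free; it is a convenience of this sketch rather than something stated in the summary.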

94-04 Software, source code, etc. for problems pertaining to information and communication theory
94A15 Information theory (general)
68T05 Learning and adaptive systems in artificial intelligence
94A17 Measures of information, entropy