
Extreme entropy machines: robust information theoretic classification. (English) Zbl 1425.68343

Summary: Most existing classification methods aim to minimize empirical risk (some simple point-based error measured with a loss function) with added regularization. We propose to approach the classification problem by applying entropy measures as the model objective function. We focus on quadratic Rényi's entropy and the related Cauchy-Schwarz divergence, which leads to the construction of extreme entropy machines (EEM). The main contribution of this paper is a model based on information theoretic concepts which, on the one hand, offers a new, entropic perspective on known linear classifiers and, on the other, leads to the construction of a very robust method competitive with state-of-the-art non-information-theoretic ones (including Support Vector Machines and Extreme Learning Machines). Evaluation on numerous problems, spanning from small, simple ones from the UCI repository to large (hundreds of thousands of samples), extremely unbalanced (up to \(100:1\) class ratios) datasets, shows the wide applicability of the EEM to real-life problems. Furthermore, it scales better than all considered competing methods.
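The two information theoretic quantities named above have simple closed-form plug-in estimators under Gaussian Parzen windowing [18, 13]: the quadratic Rényi entropy is \(H_2(p) = -\log \int p(x)^2\,dx\), and the Cauchy-Schwarz divergence is \(D_{CS}(p,q) = -\log\big(\int pq \,/\, \sqrt{\int p^2 \int q^2}\big)\). The following is a minimal NumPy sketch of these estimators only, not of the EEM model itself; the function names and the fixed bandwidth are illustrative choices, not from the paper.

```python
import numpy as np

def gauss_cross(X, Y, sigma):
    """Plug-in estimate of the cross information potential ∫ p q dx,
    where p, q are Gaussian Parzen estimates (bandwidth sigma) built
    from samples X, Y. Uses the closed-form Gaussian convolution:
    ∫ G_σ(x - x_i) G_σ(x - y_j) dx = G_{σ√2}(x_i - y_j)."""
    d = X.shape[1]
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    var = 2.0 * sigma ** 2                      # variance of the convolved kernel
    norm = (2.0 * np.pi * var) ** (d / 2.0)     # Gaussian normalization constant
    return np.exp(-sq_dists / (2.0 * var)).mean() / norm

def renyi_quadratic_entropy(X, sigma):
    """H2(p) = -log ∫ p(x)^2 dx for a Gaussian Parzen estimate of p."""
    return -np.log(gauss_cross(X, X, sigma))

def cauchy_schwarz_divergence(X, Y, sigma):
    """D_CS(p, q) = -log( ∫pq / sqrt(∫p² ∫q²) ); equals 0 iff p = q."""
    cross = gauss_cross(X, Y, sigma)
    selves = gauss_cross(X, X, sigma) * gauss_cross(Y, Y, sigma)
    return -np.log(cross / np.sqrt(selves))
```

Identical samples yield a divergence of exactly zero, and well-separated classes yield large values, which is what makes \(D_{CS}\) usable as a discriminative objective between class-conditional densities.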

MSC:

68T05 Learning and adaptive systems in artificial intelligence
94A17 Measures of information, entropy
Full Text: DOI arXiv

References:

[1] Anthony M (2003) Learning multivalued multithreshold functions. CDMA Research Report No. LSE-CDMA-2003-03, London School of Economics
[2] Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed 30 June 2015
[3] Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27 · doi:10.1145/1961189.1961199
[4] Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273-297 · Zbl 0831.68098
[5] Cover TM, Thomas JA (2012) Elements of information theory. Wiley, New York · Zbl 0762.94001
[6] Czarnecki WM, Tabor J (2014) Cluster based RBF kernel for support vector machines. ArXiv e-prints. http://arxiv.org/abs/1408.2869. Accessed 30 June 2015
[7] Czarnecki WM, Tabor J (2014) Multithreshold Entropy Linear Classifier: Theory and applications. Expert Syst Appl 42(13):5591-5606 · doi:10.1016/j.eswa.2015.03.007
[8] Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodological) 39(1):1-38 · Zbl 0364.62022
[9] Drineas P, Mahoney MW (2005) On the Nyström method for approximating a Gram matrix for improved kernel-based learning. J Mach Learn Res 6:2153-2175 · Zbl 1222.68186
[10] Durrant RJ, Kaban A (2013) Sharp generalization error bounds for randomly-projected classifiers. Proceedings of International Conference on Machine Learning (ICML), pp 693-701
[11] Huang GB, Zhu QY, Siew CK (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In: Proceedings of the 2004 IEEE international joint conference on neural networks, vol 2. IEEE, pp 985-990
[12] Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489-501 · doi:10.1016/j.neucom.2005.12.126
[13] Jenssen R, Principe JC, Erdogmus D, Eltoft T (2006) The Cauchy-Schwarz divergence and Parzen windowing: connections to graph theory and Mercer kernels. J Frankl Inst 343(6):614-629 · Zbl 1105.93001 · doi:10.1016/j.jfranklin.2006.03.018
[14] Jones E, Oliphant T, Peterson P (2001) SciPy: open source scientific tools for Python. http://www.scipy.org/. Accessed 30 June 2015
[15] Kulkarni SR, Lugosi G, Venkatesh SS (1998) Learning pattern classification-a survey. IEEE Trans Inf Theory 44(6):2178-2206 · Zbl 0935.68093 · doi:10.1109/18.720536
[16] Ledoit O, Wolf M (2004) A well-conditioned estimator for large-dimensional covariance matrices. J Multivar Anal 88(2):365-411 · Zbl 1032.62050 · doi:10.1016/S0047-259X(03)00096-4
[17] Mahalanobis PC (1936) On the generalized distance in statistics. Proc Natl Inst Sci (Calcutta) 2:49-55 · Zbl 0015.03302
[18] Parzen E (1962) On estimation of a probability density function and mode. Ann Math Stat 33:1065-1076 · Zbl 0116.11302 · doi:10.1214/aoms/1177704472
[19] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825-2830 · Zbl 1280.68189
[20] Poggio T, Girosi F (1989) A theory of networks for approximation and learning. In: Tech. rep, DTIC document · Zbl 1226.92005
[21] Principe JC (2000) Information theoretic learning. Springer, Berlin · Zbl 0965.68135
[22] Silverman BW (1986) Density estimation for statistics and data analysis, vol 26. CRC Press, Boca Raton · Zbl 0617.62042 · doi:10.1007/978-1-4899-3324-9
[23] Suykens JA, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293-300 · Zbl 0958.93042 · doi:10.1023/A:1018628609742
[24] Suykens JA, De Brabanter J, Lukas L, Vandewalle J (2002) Weighted least squares support vector machines: robustness and sparse approximation. Neurocomputing 48(1):85-105 · Zbl 1006.68799 · doi:10.1016/S0925-2312(01)00644-0
[25] Tabor J, Spurek P (2014) Cross-entropy clustering. Pattern Recogn 47(9):3046-3059 · Zbl 1342.68279 · doi:10.1016/j.patcog.2014.03.006
[26] Titterington DM, Smith AF, Makov UE et al (1985) Statistical analysis of finite mixture distributions, vol 7. Wiley, New York · Zbl 0646.62013
[27] Van Der Walt S, Colbert SC, Varoquaux G (2011) The NumPy array: a structure for efficient numerical computation. Comput Sci Eng 13(2):22-30 · doi:10.1109/MCSE.2011.37
[28] Van Gestel T, Suykens JA, Baesens B, Viaene S, Vanthienen J, Dedene G, De Moor B, Vandewalle J (2004) Benchmarking least squares support vector machine classifiers. Mach Learn 54(1):5-32 · Zbl 1078.68737 · doi:10.1023/B:MACH.0000008082.80494.e0
[29] Zhang T, Zhou ZH (2014) Large margin distribution machine. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 313-322
[30] Zong W, Huang GB, Chen Y (2013) Weighted extreme learning machine for imbalance learning. Neurocomputing 101:229-242 · doi:10.1016/j.neucom.2012.08.010
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented/enhanced by data from zbMATH Open. This list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.