
Statistical behavior and consistency of classification methods based on convex risk minimization. (English) Zbl 1105.62323

Summary: We study how closely the optimal Bayes error rate can be approached by a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification error function. The measure of closeness is characterized by the loss function used in the estimation. We show that such a classification scheme can generally be regarded as a (non-maximum-likelihood) conditional in-class probability estimator, and we use this analysis to compare various convex loss functions that have appeared in the literature. Furthermore, the theoretical insight allows us to design good loss functions with desirable properties. Another aspect of our analysis is to demonstrate the consistency of certain classification methods based on convex risk minimization. This study sheds light on the good performance of some recently proposed linear classification methods, including boosting and support vector machines. It also shows their limitations and suggests possible improvements.
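As a concrete illustration of the conditional-probability interpretation (a standard worked example, not spelled out in the summary; the logistic loss is used here only as an assumed choice of convex surrogate), consider binary labels \(y \in \{\pm 1\}\) with \(\eta(x) = P(Y=1 \mid X=x)\) and the surrogate \(\phi(v) = \ln(1+e^{-v})\). The \(\phi\)-risk of a real-valued classifier \(f\) is
\[
R_\phi(f) = \mathbb{E}\,\phi\bigl(Y f(X)\bigr)
= \mathbb{E}_X\Bigl[\eta(X)\,\phi\bigl(f(X)\bigr) + \bigl(1-\eta(X)\bigr)\,\phi\bigl(-f(X)\bigr)\Bigr].
\]
Minimizing the bracketed expression pointwise over the value \(v = f(x)\) for each fixed \(x\) gives
\[
f^*(x) = \arg\min_{v}\Bigl[\eta(x)\ln\bigl(1+e^{-v}\bigr) + \bigl(1-\eta(x)\bigr)\ln\bigl(1+e^{v}\bigr)\Bigr]
= \ln\frac{\eta(x)}{1-\eta(x)},
\]
so the surrogate minimizer is a monotone transform of the conditional in-class probability: \(\eta(x)\) can be recovered as \(1/(1+e^{-f^*(x)})\), and \(\operatorname{sign}(f^*(x))\) coincides with the Bayes rule \(\operatorname{sign}(2\eta(x)-1)\).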

MSC:

62F15 Bayesian inference
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

AdaBoost.MH
