
Least ambiguous set-valued classifiers with bounded error levels. (English) Zbl 1478.62175

Summary: In most classification tasks, some observations are ambiguous and therefore difficult to label correctly. Set-valued classifiers output sets of plausible labels rather than a single label, thereby treating the labeling of ambiguous instances more appropriately and informatively. We introduce a framework for multiclass set-valued classification in which the classifiers guarantee user-defined levels of coverage or confidence (the probability that the true label is contained in the output set) while minimizing ambiguity (the expected size of the output). We first derive oracle classifiers under the assumption that the true distribution is known, and show that they are obtained from level sets of the functions defining the conditional probability of each class. We then develop estimators with good asymptotic and finite-sample properties that build on existing single-label classifiers. The optimal classifier can sometimes output the empty set; we provide two remedies for this issue, suited to different practical needs.
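
The plug-in construction sketched in the summary lends itself to a short illustration. The following is a minimal sketch, not the authors' implementation: it estimates conditional class probabilities with a generic base classifier (logistic regression on the UCI digits data, both illustrative choices), calibrates an inclusion threshold on held-out data so that the output set contains the true label with probability roughly 1 - alpha, and falls back to the top-probability label when the set would otherwise be empty (one simple remedy, not necessarily either of the paper's two).

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Split the data: one part trains the base class-probability estimator,
# the other calibrates the inclusion threshold.
X, y = load_digits(return_X_y=True)
X_train, X_cal, y_train, y_cal = train_test_split(
    X, y, test_size=0.5, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

alpha = 0.05  # target miscoverage: sets should miss the true label <= ~5% of the time
proba_cal = model.predict_proba(X_cal)
# Estimated probability assigned to each calibration point's true label.
cols = np.searchsorted(model.classes_, y_cal)
true_label_probs = proba_cal[np.arange(len(y_cal)), cols]
# Threshold at (roughly) the alpha-quantile of those scores, so about
# 1 - alpha of calibration points would have their true label included.
t = np.quantile(true_label_probs, alpha)

def predict_set(x):
    """Return all labels whose estimated conditional probability reaches t.
    Falling back to the single most probable label when the set would be
    empty is one illustrative fix, not necessarily either of the paper's."""
    p = model.predict_proba(x.reshape(1, -1))[0]
    keep = model.classes_[p >= t]
    return keep if keep.size else model.classes_[[p.argmax()]]

print(predict_set(X_cal[0]), "vs true label", y_cal[0])
```

The thresholding rule is exactly a level set of the estimated conditional class probabilities, mirroring the oracle classifier: clear-cut instances tend to get singleton sets, while ambiguous ones get larger sets, which is the ambiguity/coverage trade-off the summary describes.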

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

UCI-ml; ML-KNN

References:

[1] Audibert, J.-Y.; Tsybakov, A. B., Fast Learning Rates for Plug-in Classifiers, The Annals of Statistics, 35, 608-633 (2007) · Zbl 1118.62041
[2] Bartlett, P. L.; Wegkamp, M. H., Classification with a Reject Option Using a Hinge Loss, The Journal of Machine Learning Research, 9, 1823-1840 (2008) · Zbl 1225.62080
[3] Chow, C. K., On Optimum Recognition Error and Reject Tradeoff, IEEE Transactions on Information Theory, 16, 41-46 (1970) · Zbl 0185.47804
[4] Del Coz, J. J.; Diez, J.; Bahamonde, A., Learning Nondeterministic Classifiers, The Journal of Machine Learning Research, 10, 2273-2293 (2009) · Zbl 1235.68144
[5] Denis, C.; Hebiri, M., Consistency of Plug-in Confidence Sets for Classification in Semi-supervised Learning, arXiv:1507.07235 (2015)
[6] Devroye, L., The Uniform Convergence of Nearest Neighbor Regression Function Estimators and Their Application in Optimization, IEEE Transactions on Information Theory, 24, 142-151 (1978) · Zbl 0375.62083
[7] Grycko, E., Classification With Set-Valued Decision Functions, Information and Classification, Studies in Classification, Data Analysis and Knowledge Organization, 218-224 (1993), Heidelberg: Springer, Heidelberg
[8] Herbei, R.; Wegkamp, M. H., Classification With Reject Option, Canadian Journal of Statistics, 34, 709-721 (2006) · Zbl 1151.62302
[9] Le Cun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W.; Jackel, L. D., Handwritten Digit Recognition with a Back-Propagation Network, Advances in Neural Information Processing Systems 2. Proceedings of the 1989 Conference, 396-404 (1990)
[10] Lei, J., Classification With Confidence, Biometrika, 101, 755-769 (2014) · Zbl 1306.62143
[11] Lei, J.; Rinaldo, A.; Wasserman, L., A Conformal Prediction Approach to Explore Functional Data, Annals of Mathematics and Artificial Intelligence, 74, 29-43 (2014) · Zbl 1317.62039
[12] Lei, J.; Robins, J.; Wasserman, L., Distribution-Free Prediction Sets, Journal of the American Statistical Association, 108, 278-287 (2013) · Zbl 06158342
[13] Lei, J.; Wasserman, L., Distribution-Free Prediction Bands for Nonparametric Regression, Journal of the Royal Statistical Society, Series B, 76, 71-96 (2014) · Zbl 1411.62103
[14] Lichman, M., UCI Machine Learning Repository (2013)
[15] Papadopoulos, H.; Proedrou, K.; Vovk, V.; Gammerman, A., Inductive Confidence Machines for Regression, Machine Learning: ECML 2002, 345-356 (2002), Springer · Zbl 1014.68514
[16] Ramaswamy, H. G.; Tewari, A.; Agarwal, S., Consistent Algorithms for Multiclass Classification with a Reject Option, Electronic Journal of Statistics, 12, 530-554 (2018) · Zbl 1473.62229
[17] Shafer, G.; Vovk, V., A Tutorial on Conformal Prediction, Journal of Machine Learning Research, 9, 371-421 (2008) · Zbl 1225.68215
[18] Silverman, B. W., Density Estimation for Statistics and Data Analysis, 26 (1986), Boca Raton, FL: CRC Press, Boca Raton, FL · Zbl 0617.62042
[19] Stone, C., Optimal Global Rates of Convergence for Nonparametric Regression, The Annals of Statistics, 10, 1040-1053 (1982) · Zbl 0511.62048
[20] Tsoumakas, G.; Katakis, I., Multi-label Classification: An Overview, International Journal of Data Warehousing & Mining, 3, 1-13 (2007)
[21] Tsybakov, A. B., Introduction to Nonparametric Estimation (2009), New York: Springer, New York · Zbl 1176.62032
[22] van de Geer, S. A., High-Dimensional Generalized Linear Models and the Lasso, The Annals of Statistics, 36, 614-645 (2008) · Zbl 1138.62323
[23] Vovk, V., Conditional Validity of Inductive Conformal Predictors, Machine Learning, 92, 349-376 (2013) · Zbl 1273.68307
[24] Vovk, V.; Fedorova, V.; Nouretdinov, I.; Gammerman, A., Criteria of Efficiency for Conformal Prediction, Proceedings of COPA 2016 (Fifth Symposium on Conformal and Probabilistic Prediction with Applications), 23-39 (2016)
[25] Vovk, V.; Gammerman, A.; Shafer, G., Algorithmic Learning in a Random World (2005), New York: Springer, New York · Zbl 1105.68052
[26] Vovk, V.; Petej, I.; Fedorova, V., From Conformal to Probabilistic Prediction, COPA 2014 Proceedings, Artificial Intelligence Applications and Innovations, 221-230 (2014)
[27] Yuan, M.; Wegkamp, M., Classification Methods with Reject Option Based on Convex Risk Minimization, Journal of Machine Learning Research, 11, 111-130 (2010) · Zbl 1242.62066
[28] Zhang, M. L.; Zhou, Z. H., ML-KNN: A Lazy Learning Approach to Multi-label Learning, Pattern Recognition, 40, 2038-2048 (2007) · Zbl 1111.68629