×

Learning from partially supervised data using mixture models and belief functions. (English) Zbl 1181.68231

Summary: This paper addresses classification problems in which the class membership of training data are only partially known. Each learning sample is assumed to consist of a feature vector \(\mathbf x_i \in \mathcal X\) and an imprecise and/or uncertain “soft” label \(m_i\) defined as a Dempster-Shafer basic belief assignment over the set of classes. This framework thus generalizes many kinds of learning problems including supervised, unsupervised and semi-supervised learning. Here, it is assumed that the feature vectors are generated from a mixture model. Using the generalized Bayesian theorem, an extension of Bayes’ theorem in the belief function framework, we derive a criterion generalizing the likelihood function. A variant of the expectation maximization algorithm, dedicated to the optimization of this criterion is proposed, allowing us to compute estimates of model parameters. Experimental results demonstrate the ability of this approach to exploit partial information about class labels.

MSC:

68T10 Pattern recognition, speech recognition

Software:

UCI-ml
PDFBibTeX XMLCite
Full Text: DOI HAL

References:

[1] Hosmer, D. W., A comparison of iterative maximum likelihood estimates of the parameters of a mixture of two normal distributions under three different types of sample, Biometrics, 29, 761-770 (1973)
[2] McLachlan, G. J., Estimating the linear discriminant function from initial samples containing a small number of unclassified observations, J. Am. Stat. Assoc., 72, 358, 403-406 (1977) · Zbl 0369.62060
[3] (Chapelle, O.; Schölkopf, B.; Zien, A., Semi-Supervised Learning (2006), MIT Press: MIT Press Cambridge, MA)
[4] Bengio, Y.; Grandvalet, Y., Semi-supervised learning by entropy minimization, (Advances in Neural Information Processing Systems, vol. 17 (2005), MIT Press: MIT Press Cambridge, MA), 529-536
[5] Corduneanu, A.; Jaakkola, T., On information regularization, (Uncertainty in Artificial Intelligence (2003)), 151-158
[6] Joachims, T., Transductive inference for text classification using support vector machine, (16th International Conference on Machine Learning (1999), Morgan Kaufmann: Morgan Kaufmann Los Altos, CA), 202-209
[7] T.D. Bie, Semi-supervised learning based on kernel methods and graph cut algorithms, Ph.D. Thesis, K.U. Leuven (Leuven, Belgium), Faculty of Engineering, May 2005.; T.D. Bie, Semi-supervised learning based on kernel methods and graph cut algorithms, Ph.D. Thesis, K.U. Leuven (Leuven, Belgium), Faculty of Engineering, May 2005.
[8] Zhou, D.; Bousquet, O.; Lal, T.; Schölkopf, B., Learning with local and global consistency, (Advances in Neural Information Processing Systems, vol. 16 (2004)), 321-328
[9] X. Zhu, Z. Ghahramani, Learning from labelled and unlabelled data with label propagation, Technical Report, Carnegie Mellon University, 2002.; X. Zhu, Z. Ghahramani, Learning from labelled and unlabelled data with label propagation, Technical Report, Carnegie Mellon University, 2002.
[10] Denœux, T., A \(k\)-nearest neighbor classification rule based on Dempster-Shafer theory, IEEE Trans. Syst. Man Cybernet., 25, 5, 804-813 (1995)
[11] Ambroise, C.; Govaert, G., EM algorithm for partially known labels, (Proceedings of the 7th Conference of the International Federation of Classification Societies (IFCS-2000) (2000), Springer: Springer Namur, Belgium), 161-166 · Zbl 1029.62056
[12] Ambroise, C.; Denœux, T.; Govaert, G.; Smets, P., Learning from an imprecise teacher: probabilistic and evidential approaches, (Proceedings of ASMDA ’01 (2001), Compiègne: Compiègne France), 100-105
[13] Grandvallet, Y., Logistic regression for partial labels, (9th International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems (IPMU ’02), vol. III (2002)), 1935-1941
[14] Hüllermeier, E.; Beringer, J., Learning from ambiguously labeled examples, (Proceedings of the 6th International Symposium on Intelligent Data Analysis (IDA-05) (2005), Madrid: Madrid Spain), 168-179 · Zbl 1141.68567
[15] Lawrence, N. D.; Schölkopf, B., Estimating a kernel fisher discriminant in the presence of label noise, (Proceedings of the 18th International Conference on Machine Learning (2001), Morgan Kaufmann: Morgan Kaufmann San Francisco, CA), 306-313
[16] Amini, R.; Gallinari, P., Semi-supervised learning with an imperfect supervisor, Knowl. Inf. Syst., 8, 4, 385-413 (2005)
[17] Karmaker, A.; Kwek, S., A boosting approach to remove class label noise, (Proceedings of the 5th International Conference on Hybrid Intelligent Systems (2005)), 6-9 · Zbl 1116.68074
[18] Li, Y.; Lodeswyk, W.; De Ridder, D.; Reinders, M., Classification in the presence of class noise using a probabilistic kernel fisher method, Pattern Recognition, 40, 3349-3357 (2007) · Zbl 1123.68363
[19] Shafer, G., A Mathematical Theory of Evidence (1976), Princeton University Press: Princeton University Press Princeton, NJ · Zbl 0359.62002
[20] Smets, P.; Kennes, R., The transferable belief model, Artif. Intell., 66, 191-243 (1994) · Zbl 0807.68087
[21] Denœux, T.; Zouhal, L. M., Handling possibilistic labels in pattern classification using evidential reasoning, Fuzzy Sets Syst., 122, 3, 47-62 (2001) · Zbl 1063.68635
[22] Elouedi, Z.; Mellouli, K.; Smets, P., Belief decision trees: theoretical foundations, Int. J. Approximate Reasoning, 28, 91-124 (2001) · Zbl 0991.68088
[23] Trabelsi, S.; Elouedi, Z.; Mellouli, K., Pruning belief decision tree methods in averaging and conjunctive approaches, Int. J. Approximate Reasoning, 46, 3, 568-595 (2007) · Zbl 1185.68718
[24] Jenhani, I.; Ben Amor, N.; Elouedi, Z., Decision trees as possibilistic classifiers, Int. J. Approximate Reasoning, 48, 3, 784-807 (2008)
[25] Ben Yaghlane, A.; Denœux, T.; Mellouli, K., Elicitation of expert opinions for constructing belief functions, (Proceedings of the 11th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU ’06), vol. I (2006)), 403-411 · Zbl 1159.68576
[26] Vannoorenberghe, P.; Smets, P., Partially supervised learning by a credal EM approach, (Godo, L., Proceedings of the 8th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU ’05) (2005), Springer: Springer Barcelona, Spain), 956-967 · Zbl 1122.68508
[27] Vannoorenberghe, P., Estimation de modèles de mélanges finis par un algorithm EM crédibiliste, Trait. du Signal, 24, 2, 103-113 (2007)
[28] McLachlan, G. J.; Peel, D., Finite Mixture Models (2000), Wiley: Wiley New York · Zbl 0963.62061
[29] Nigam, K.; McCallum, A.; Thrun, S.; Mitchell, T., Text classification from labelled and unlabelled documents using EM, Mach. Learn., 39, 2-3, 103-134 (2000) · Zbl 0949.68162
[30] Jraidi, I.; Elouedi, Z., Belief classification approach based on generalized credal EM, (Mellouli, K., 9th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU ’07) (2007), Springer: Springer Hammamet, Tunisia), 524-535 · Zbl 1148.68436
[31] Dempster, A. P., Upper and lower probabilities induced by a multivalued mapping, Ann. Math. Stat., 38, 325-339 (1967) · Zbl 0168.17501
[32] Dubois, D.; Prade, H., On the unicity of Dempster’s rule of combination, Int. J. Intell. Syst., 1, 133-142 (1986) · Zbl 0641.68159
[33] Smets, P., Belief functions: the disjunctive rule of combination and the generalized Bayesian theorem, Int. J. Approximate Reasoning, 9, 1-35 (1993) · Zbl 0796.68177
[34] Ben Yaghlane, B.; Smets, P.; Mellouli, K., Belief function independence: I. The marginal case, Int. J. Approximate Reasoning, 29, 47-70 (2002) · Zbl 1015.68207
[35] Ben Yaghlane, B.; Smets, P.; Mellouli, K., Belief function independence: II. The conditional case, Int. J. Approximate Reasoning, 31, 31-75 (2002) · Zbl 1052.68126
[36] Smets, P., Belief functions on real numbers, Int. J. Approximate Reasoning, 40, 3, 181-223 (2005) · Zbl 1110.68149
[37] Caron, F.; Ristic, B.; Duflos, E.; Vanheeghe, P., Least committed basic belief density induced by a multivariate Gaussian: formulation with applications, Int. J. Approximate Reasoning, 48, 2, 419-436 (2008) · Zbl 1185.68703
[38] A. Aregui, T. Denœux, Novelty detection in the belief functions framework, in: Proceedings of IPMU ’06, vol. 1, Paris, 2006, pp. 412-419.; A. Aregui, T. Denœux, Novelty detection in the belief functions framework, in: Proceedings of IPMU ’06, vol. 1, Paris, 2006, pp. 412-419.
[39] P. Smets, Un modèle mathématico-statistique simulant le processus du diagnostic médical, Ph.D. Thesis, Université Libre de Bruxelles, Brussels, Belgium, 1978 (in French).; P. Smets, Un modèle mathématico-statistique simulant le processus du diagnostic médical, Ph.D. Thesis, Université Libre de Bruxelles, Brussels, Belgium, 1978 (in French).
[40] Delmotte, F.; Smets, P., Target identification based on the transferable belief model interpretation of Dempster-Shafer model, IEEE Trans. Syst. Man Cybernet. A, 34, 4, 457-471 (2004)
[41] Denœux, T.; Smets, P., Classification using belief functions: the relationship between the case-based and model-based approaches, IEEE Trans. Syst. Man Cybernet. B, 36, 6, 1395-1406 (2006)
[42] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. B, 39, 1-38 (1977) · Zbl 0364.62022
[43] McLachlan, G. J.; Krishnan, T., The EM Algorithm and Extensions (1997), Wiley: Wiley New York · Zbl 0882.62012
[44] Smets, P., Possibilistic inference from statistical data, (2nd World Conference on Mathematics at the Service of Man (1982), Universidad Politecnica de Las Palmas), 611-613
[45] Walley, P.; Moral, S., Upper probabilities based on the likelihood function, J. R. Stat. Soc. B, 161, 831-847 (1999) · Zbl 0940.62004
[46] Shenoy, P. P.; Giang, P. H., Decision making on the sole basis of statistical likelihood, Artif. Intell., 165, 2, 137-163 (2005) · Zbl 1132.91371
[47] Smets, P., Numerical representation of uncertainty, (Gabbay, D. M.; Smets, P., Handbook of Defeasible Reasoning and Uncertainty Management Systems, vol. 3 (1998), Kluwer Academic Publishers: Kluwer Academic Publishers Dordrecht), 265-309 · Zbl 0944.91014
[48] Monney, P.-A., A Mathematical Theory of Arguments for Statistical Evidence, Contributions to Statistics (2003), Physica-Verlag: Physica-Verlag Heidelberg
[49] Cobb, B. R.; Shenoy, P. P., On the plausibility transformation method for translating belief function models to probability models, Int. J. Approximate Reasoning, 41, 3, 314-330 (2006) · Zbl 1093.68112
[50] Banfield, J.; Raftery, A., Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 803-821 (1993) · Zbl 0794.62034
[51] Celeux, G.; Govaert, G., Gaussian parsimonious clustering models, Pattern Recognition, 28, 5, 781-793 (1995)
[52] Bouveyron, C.; Girard, S.; Schmid, C., High-dimensional data clustering, Comput. Stat. Data Anal., 52, 502-519 (2007) · Zbl 1452.62433
[53] Bouveyron, C.; Girard, S.; Schmid, C., High dimensional discriminant analysis, Commun. Stat. Theory Meth., 36, 2596-2607 (2007) · Zbl 1128.62072
[54] Fraley, C.; Raftery, A., Bayesian regularization for normal mixture estimation and model-based clustering, J. Classification, 24, 2, 155-181 (2007), http://dx.doi.org/10.1007/s00357-007-0004-5 · Zbl 1159.62302
[55] Ueda, N.; Nakano, R., Deterministic annealing variant of the EM algorithm, (Advances in Neural Information Processing Systems, vol. 7 (1995)), 545-552
[56] Klir, G. J.; Wierman, M. J., Uncertainty-Based Information. Elements of Generalized Information Theory (1998), Springer: Springer New-York · Zbl 0902.68061
[57] Ramer, A., Uniqueness of information measure in the theory of evidence, Fuzzy Sets Syst., 24, 183-196 (1987) · Zbl 0638.94027
[58] O’Neill, T., Normal discrimination with unclassified observations, J. Am. Stat. Assoc., 73, 364, 821-826 (1978) · Zbl 0409.62047
[59] P.M. Murphy, D.W. Aha, UCI repository of machine learning databases (machine-readable data repository), University of California, Department of Information and Computer Science, Irvine, CA, 1994.; P.M. Murphy, D.W. Aha, UCI repository of machine learning databases (machine-readable data repository), University of California, Department of Information and Computer Science, Irvine, CA, 1994.
[60] Côme, E.; Oukhellou, L.; Aknin, P., Diagnosis of complex system by combined use of RKCCA and graphical model, (CCKM’06 Workshop on Current Challenge in Kernel Methods. CCKM’06 Workshop on Current Challenge in Kernel Methods, Brussels (2006)) · Zbl 1181.68231
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.