Decontamination of mutual contamination models. (English) Zbl 1484.62077

Summary: Many machine learning problems can be characterized by mutual contamination models. In these problems, one observes several random samples, each drawn from a different convex combination of a set of unknown base distributions, and the goal is to infer these base distributions. This paper considers the general setting where the base distributions are defined on arbitrary probability spaces. We examine three popular machine learning problems that arise in this general setting: multiclass classification with label noise, demixing of mixed membership models, and classification with partial labels. In each case, we give sufficient conditions for identifiability and present algorithms for the infinite and finite sample settings, with associated performance guarantees.
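The data-generating process described above can be sketched in a few lines. This is a hypothetical illustration only (the base distributions, mixing matrix, and Gaussian choice below are assumptions, not from the paper): each observed sample is drawn from a row of a mixing matrix applied to the base distributions, and the decontamination task is to recover those base distributions from the contaminated samples alone.

```python
import numpy as np

# Hypothetical mutual contamination model with two 1-D Gaussian base
# distributions P_1 = N(-2, 1) and P_2 = N(2, 1). Observed sample i is
# drawn from the contaminated distribution
#     P~_i = sum_j pi[i, j] * P_j,
# where each row of the mixing matrix pi lies on the probability simplex.

rng = np.random.default_rng(0)

base_means = np.array([-2.0, 2.0])   # means of the (unknown) base distributions
pi = np.array([[0.8, 0.2],           # mixing (contamination) matrix;
               [0.3, 0.7]])          # each row sums to 1

def sample_contaminated(row, n):
    """Draw n points from the convex combination sum_j row[j] * N(mu_j, 1)."""
    components = rng.choice(len(base_means), size=n, p=row)
    return rng.normal(base_means[components], 1.0)

samples = [sample_contaminated(row, 5000) for row in pi]
# Decontamination: infer the base distributions (and, implicitly, pi)
# given only the contaminated samples in `samples`.
```

Note that each contaminated sample mean is the corresponding convex combination of the base means, which is the kind of moment relation the identifiability conditions in the paper exploit.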

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G05 Nonparametric estimation

Software:

UCI-ml
Full Text: arXiv Link
