
Classification from triplet comparison data. (English) Zbl 1468.68150

Summary: Learning from triplet comparison data has been studied extensively in the context of metric learning, where the goal is to learn a distance metric between instances, and ordinal embedding, where the goal is to embed the given instances in a Euclidean space so that the comparison order is preserved as much as possible. Compared with fully labeled data, triplet comparison data can be collected in a more accurate and annotator-friendly way. Although learning from triplet comparison data has been considered in many applications, a fundamental question has remained open: can a classifier be learned from triplet comparison data alone, without any class labels? In this letter, we give a positive answer by proposing an unbiased estimator of the classification risk under the empirical risk minimization framework. Because the proposed method is based on empirical risk minimization, it inherits the framework's flexibility: any surrogate loss function and any model, including neural networks, can be applied directly. Furthermore, we theoretically establish an estimation error bound for the proposed empirical risk minimizer. Finally, we provide experimental results showing that our method works well in practice and outperforms various baseline methods.
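The summary leaves the estimator's exact form to the paper, but the empirical risk minimization recipe it describes can be sketched. The following minimal Python sketch is hypothetical, not the paper's actual method: it trains a linear scorer with the logistic surrogate loss on triplets (x_a, x_b, x_c), where x_b was judged closer to x_a than x_c, and lets each instance enter the risk with a weight. The weights w_pos and w_neg are illustrative placeholders standing in for the unbiasedness coefficients derived in the paper, and the toy data generation is purely for demonstration.

import numpy as np

rng = np.random.default_rng(0)

def logistic_loss(z):
    # surrogate for the zero-one loss: l(z) = log(1 + exp(-z))
    return np.logaddexp(0.0, -z)

def logistic_grad(z):
    # derivative of logistic_loss with respect to z: -1 / (1 + exp(z))
    return -1.0 / (1.0 + np.exp(z))

def risk_and_grad(theta, Xa, Xb, Xc, w_pos=1.0, w_neg=1.0):
    # Hypothetical corrected empirical risk over triplets (x_a, x_b, x_c).
    # Anchor and closer instances are treated as pseudo-positive, the
    # farther instance as pseudo-negative; w_pos and w_neg are placeholders
    # for the coefficients that would make the estimator unbiased.
    za, zb, zc = Xa @ theta, Xb @ theta, Xc @ theta
    n = len(za)
    risk = (w_pos * (logistic_loss(za).mean() + logistic_loss(zb).mean())
            + w_neg * logistic_loss(-zc).mean())
    grad = (w_pos * (Xa.T @ logistic_grad(za) + Xb.T @ logistic_grad(zb)) / n
            - w_neg * Xc.T @ logistic_grad(-zc) / n)
    return risk, grad

# toy triplet data: anchor/closer instances near +mu, farther instances near -mu
d, n = 5, 200
mu = np.ones(d)
Xa = rng.normal(mu, 1.0, (n, d))
Xb = rng.normal(mu, 1.0, (n, d))
Xc = rng.normal(-mu, 1.0, (n, d))

theta = np.zeros(d)
for _ in range(500):          # plain gradient descent on the empirical risk
    risk, grad = risk_and_grad(theta, Xa, Xb, Xc)
    theta -= 0.1 * grad
print(f"final empirical risk: {risk:.4f}")

As the summary notes, nothing in this recipe is tied to the linear model or the logistic loss used above; any surrogate loss and any differentiable model, including a neural network, could be substituted for the scorer.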

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62G05 Nonparametric estimation
62H30 Classification and discrimination; cluster analysis (statistical aspects)
