Extended box clustering for classification problems.(English)Zbl 1392.68429

Summary: In this work we address a technique, based on elementary convex sets called box hulls, for effectively grouping finite point sets into non-convex objects, called box clusters. The proposed clustering approach is based on homogeneity conditions, not according to some distance measure, and it is situated inside the theoretical framework of Supervised clustering. This approach extends the so-called (convex) box clustering, originally developed in the context of the logical analysis of data, to non-convex geometry. We briefly discuss the topological properties of these clusters and introduce a family of hypergraphs, called incompatibility hypergraphs; the main aim for these hypergraphs is their role in clustering algorithms, even if they have strong theoretical properties as shown in other works in literature. We also discuss of supervised classification problems and generalized Voronoi diagrams are considered to define a classifier based on box clusters. Finally, computational experiments on real world data are used to show the efficacy of our methods both in terms of clustering and accuracy.

MSC:

 68U05 Computer graphics; computational geometry (digital and algorithmic aspects) 05C65 Hypergraphs 62H30 Classification and discrimination; cluster analysis (statistical aspects)

JBool
Full Text:

References:

 [1] ACHARYA, B, Domination in hypergraphs, AKCE Journal of Graphs Combinatorics, 4, 117-126, (2007) · Zbl 1136.05047 [2] AURENHAMMER, F, Voronoi diagrams - A survey of a fundamental geometric data structure, ACM Computing Surveys, 23, 345-405, (1991) [3] AWASTHI, P., and ZADEH, R.B. (2010), “Supervised Clustering”, in Advances in Neural Information Processing Systems 23, eds. J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, Curran Associates, Inc., pp 91-99. · Zbl 0342.68030 [4] BACHE, K., and LICHMAN, M. (2013), “UCIMachine Learning Repository”, http://www.ics.uci.edu/$$~$$mlearn/MLRepository.html [5] BÁRÁNY, I; LEHEL, J, Covering with Euclidean boxes, European Journal of Combinatorics, 8, 113-119, (1987) · Zbl 0623.05041 [6] BOISSONNAT, JD; WORMSER, C; YVINEC, M; Boissonnat, JD (ed.); Teillaud, M (ed.), Curved Voronoi diagrams, 67-116, (2006), Berlin Heidelberg · Zbl 1116.65021 [7] BOROS, E. (2010), “Incompatibility Graphs”, in Workshop in Graph Theory and Combinatorics, University of Illinois at Chicago. · Zbl 1320.62156 [8] BOROS, E; HAMMER, P; IBARAKI, T; KOGAN, A; MAYORAZ, E; MUCHNIK, I, An implementation of logical analysis of data, IEEE Transactions on Knowledge and Data Engineering, 12, 292-306, (2000) [9] BOROS, E., GURVICH, V., and LIU, Y. (2005), “Comparison of Convex Hulls and Box Hulls”, Ars Combinatoria, 77. · Zbl 1157.52304 [10] BOROS, E., RICCA, F., and SPINELLI, V. (2011), “Incompatibility Graphs in Data Mining”, in Proceedings of the 10th Cologne-Twente Workshop on Graphs and Combinatorial Optimization, pp. 4-7. · Zbl 1173.65014 [11] CARATHÉODORY, C, Über den variabilitätsbereich der fourier’schen konstanten von positiven harmonischen funktionen, Rendiconti del Circolo Matematico di Palermo, 32, 193-217, (1911) · JFM 42.0429.01 [12] CHIARANDINI, M; STÜTZLE, T; Festa, P (ed.), An analysis of heuristics for vertex colouring, No. 6049, 67-116, (2010), Berlin Heidelberg [13] CRAMA, Y., and HAMMER, P.L. (2011), Boolean Functions - Theory, Algorithms, and Applications, Encyclopedia of Mathematics and Its Applications, Vol 142, Cambridge: Cambridge University Press. · Zbl 1237.06001 [14] CRAMA, Y; HAMMER, P; IBARAKI, T, Cause-effect relationships and partially defined Boolean functions, Annals of Operations Research, 16, 299-325, (1988) · Zbl 0709.03533 [15] DINUR, I; REGEV, O; SMYTH, C, The hardness of 3-uniform hypergraph coloring, Combinatorica, 25, 519-535, (2005) · Zbl 1106.68080 [16] DOTSON, R; NAGLE, B, Hereditary properties of hypergraphs, Journal of Combinatorial Theory Series B, 99, 460-473, (2009) · Zbl 1219.05104 [17] DUCH, W, Similarity-based methods: A general framework for classification, approximation and association, Control and Cybernetics, 29, 937-968, (2000) · Zbl 0999.68198 [18] ECKSTEIN, J; HAMMER, P; LIU, Y; NEDIAK, M; SIMEONE, B, The maximum box problem and its application to data analysis, Computational Optimization and Application, 23, 285-298, (2002) · Zbl 1028.90039 [19] EICK, C.F., ZEIDAT, N., and ZHAO, Z. (2004), “Supervised Clustering - Algorithms and Benefits”, in Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence, pp. 774-776. [20] FAYOLLE, P; PASKO, A; SCHMITT, B; MIRENKOV, N, Constructive heterogeneous object modeling using signed approximate real distance functions, Journal of Computing and Information Science in Engineering, 6, 221-229, (2005) [21] FELICI, G., SIMEONE, B., and SPINELLI, V. (2010),“Classification Techniques and Error Control in Logic Mining”, in Data Mining, Annals of Information Systems, Vol 8, eds. R. Stahlbock, S.F. Crone, and S. Lessmann, Springer, pp. 99-119. · Zbl 1219.05104 [22] GOLUBITSKY, O; MAZALOV, V; WATT, S, An algorithm to compute the distance from a point to a simplex, ACM Communications in Computer Algebra, 46, 57, (2012) [23] GOLUMBIC, M.C. (1980), Algorithmic Graph Theory and Perfect Graphs. Computer Science and Applied Mathematics, New York: Academic Press. · Zbl 0541.05054 [24] HELLY, E. (1923), “Über Mengen Konvexer Körper mit Gemeinschaftlichen Punkten”, Jahresbericht der Deutschen Mathematiker-Vereinigung, pp. 175-176. · JFM 49.0534.02 [25] KANEKO, A., and KANO, M. (2003), “Discrete Geometry on Red and Blue Points in the Plane - A Survey”, in Discrete and Computational Geometry, Springer, pp. 551-570. · Zbl 1079.52505 [26] KANTARDZIC, M, data mining: concepts, models, methods and algorithms, (2002), Inc [27] KLEIN, E; LANGETEPE, E; NILFOROUSHAN, Z, Abstract Voronoi diagrams revisited, Computational Geometry, 42, 885-902, (2009) · Zbl 1173.65014 [28] KRIVELEVICH, M; SUDAKOV, B, Approximate coloring of uniform hypergraphs, Journal of Algorithms, 49, 2-12, (2003) · Zbl 1064.68071 [29] KULIS, B, Metric learning: A survey, Foundations and Trends in Machine Learning, 5, 287-364, (2013) · Zbl 1278.68014 [30] LEIGHTON, F, A graph coloring algorithm for large scheduling problems, Journal of Research of the National Bureau of Standards, 84, 489-506, (1979) · Zbl 0437.68021 [31] MACHINE LEARNING GROUP (2013), “WEKA - Data Mining Software in Java”, University of Waikato, New Zealand, http://www.cs.waikato.ac.nz/ml/weka. [32] PREPARATA, F; HONG, S, Convex hulls of finite sets of points in two and three dimensions, Communications of the ACM, 20, 87-93, (1977) · Zbl 0342.68030 [33] RADON, J, Mengen konvexer Körper, die einen gemeinsamen punkt enthalten, Mathematische Annalen, 83, 113-115, (1921) · JFM 48.0834.04 [34] SOGAARD, A. (2013) ,Semi-Supervised Learning and Domain Adaptation in Natural Language Processing, San Rafael: Morgan and Claypool. [35] SPINELLI, V, Pruning boxes in a box-based classification method, Advances in Data Analysis and Classification, 10, 285-304, (2016) · Zbl 1414.62274 [36] SPINELLI, V, Supervised box clustering, Advances in Data Analysis and Classification, 11, 179-204, (2017) · Zbl 1414.62275 [37] WITTEN, I., FRANK, E., and HALL, M. (2011), Data Mining: Practical Machine Learning Tools and Techniques (3rd ed.), Morgan Kaufmann. · Zbl 1116.65021 [38] ZHAO, M; EDAKUNNI, N; POCOCK, A; BROWN, G, Beyond fanos inequality: bounds on the optimal F-score, ber, and cost-sensitive risk and their implications, Journal of Machine Learning Research, 14, 1033-1090, (2013) · Zbl 1320.62156 [39] ZHOU, D., HUANG, J., and SCHÖLKOPF, B. (2006), “Learning with Hypergraphs: Clustering, Classification, and Embedding”, in Advances in Neural Information Processing Systems (NIPS) 19, MIT Press, pp. 1601-1608.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.