Supervised box clustering. (English) Zbl 1414.62275

Summary: In this work we address a technique for effectively clustering points in specific convex sets, called homogeneous boxes, having sides aligned with the coordinate axes (isothetic condition). The proposed clustering approach is based on homogeneity conditions, not according to some distance measure, and, even if it was originally developed in the context of the logical analysis of data, it is now placed inside the framework of Supervised clustering. First, we introduce the basic concepts in box geometry; then, we consider a generalized clustering algorithm based on a class of graphs, called incompatibility graphs. For supervised classification problems, we consider classifiers based on box sets, and compare the overall performances to the accuracy levels of competing methods for a wide range of real data sets. The results show that the proposed method performs comparably with other supervised learning methods in terms of accuracy.


62H30 Classification and discrimination; cluster analysis (statistical aspects)
90C27 Combinatorial optimization
90C59 Approximation methods and heuristics in mathematical programming


Full Text: DOI


[1] Awasthi P, Zadeh RB (2010) Supervised clustering. In: Lafferty JD, Williams CKI, Shawe-Taylor J, Zemel RS, Culotta A (eds) Advances in neural information processing systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, Vancouver, British Columbia, Canada, 6-9 December 2010. Curran Associates, Inc., pp 91-99. http://papers.nips.cc/paper/4115-supervised-clustering.pdf
[2] Bache K, Lichman M (2013) UCI machine learning repository. http://www.ics.uci.edu/ mlearn/MLRepository.html
[3] Bárány, I.; Lehel, J., Covering with Euclidean boxes, Eur J Comb, 8, 113-119, (1987) · Zbl 0623.05041
[4] Bereg, S.; Díaz-Bánez, JM; Pérez-Lantero, P.; Ventura, I., The maximum box problem for moving points in the plane, J Comb Optim, 22, 517-530, (2011) · Zbl 1236.90096
[5] Bertolazzi, P.; Felici, G.; Festa, P.; Lancia, G., Logic classification and feature selection for biomedical data, Comput Math Appl, 55, 889-899, (2008) · Zbl 1139.92013
[6] Boros E (2010) Incompatibility graphs. In: Proceedings of workshop in graph theory and combinatorics, University of Illinois at Chicago (UIC)
[7] Boros, E.; Hammer, P.; Ibaraki, T.; Kogan, A., Logical analysis of numerical data, Math Program, 79, 163-190, (1997) · Zbl 0887.90179
[8] Boros E, Gurvich V, Liu Y (2005) Comparison of convex hulls and box hulls. Ars Comb 77 · Zbl 1157.52304
[9] Boros E, Horiyama T, Ibaraki T, Makino K, Yagiura M (2000) Finding essential attributes in binary data. In: Leung K-S, Chan L-W, Meng H (eds) IDEAL, Springer, Lecture notes in computer science, vol. 1983, pp 133-138 · Zbl 1038.68092
[10] Boros, E.; Menkov, V., Exact and approximate discrete optimization algorithms for finding useful disjunctions of categorical predicates in data analysis, Discrete Appl Math, 144, 43-58, (2004) · Zbl 1077.68089
[11] Boros E, Spinelli V, Ricca F (2011) Incompatibility graphs and data mining. In: Proceedings of the 10th Cologne-Twente workshop on graphs and combinatorial optimization. Extended Abstracts, Frascati, Italy, June 14-16, 2011, pp 4-7
[12] Carathéodory, C., Über den Variabilitätsbereich der Fourier’schen Konstanten von positiven harmonischen Funktionen, Rendiconti del Circolo Matematico di Palermo, 32, 193-217, (1911) · JFM 42.0429.01
[13] Dobkin, DP; Gunopulos, D.; Maass, W., Computing the maximum bichromatic discrepancy, with applications to computer graphics and machine learning, J Comput Syst Sci, 52, 453-470, (1996) · Zbl 0858.68077
[14] Duchet P (1987) Convexity in combinatorial structures. In: Proceedings of the 14th Winter School on Abstract Analysis, Circolo Matematico di Palermo, pp 261-293 · Zbl 0644.52001
[15] Eckstein, J.; Hammer, P.; Liu, Y.; Nediak, M.; Simeone, B., The maximum box problem and its application to data analysis, Comput Optim Appl, 23, 285-298, (2002) · Zbl 1028.90039
[16] Eick CF, Zeidat N, Zhao Z (2004) Supervised clustering—algorithms and benefits. In: Proceedings of the 16th IEEE international conference on tools with artificial intelligence, IEEE Computer Society, ICTAI ’04, pp 774-776
[17] Felici, G.; Simeone, B.; Spinelli, V.; Stahlbock, R. (ed.); Crone, SF (ed.); Lessmann, S. (ed.), Classification techniques and error control in logic mining, No. 8, 99-119, (2010), New York
[18] Gyárfás, A.; Lehel, J., Hypergraph families with bounded edge cover or transversal number, Combinatorica, 3, 351-358, (1983) · Zbl 0534.05052
[19] Haldar, C.; Patnaik, L., On movable separability and isotheticity, Inf Sci, 62, 87-102, (1992) · Zbl 0744.52001
[20] Hammer PL (2006) Optimization models for logical analysis of data. In: Proceedings of the workshop on data mining and mathematical programming. Centre de Recherches mathématiques Montréal, Québec, Canada, October 10-13, 2006
[21] Hammer, PL; Liu, Y.; Simeone, B.; Szedmák, S., Saturated systems of homogeneous boxes and the logical analysis of numerical data, Discrete Appl Math, 144, 103-109, (2004) · Zbl 1078.62503
[22] Helly, E., Über Mengen konvexer Körper mit gemeinschaftlichen Punkte, Jahresbericht der Deutschen Mathematiker-Vereinigung, 32, 175-176, (1923) · JFM 49.0534.02
[23] Kaneko A, Kano M (2003) Discrete geometry on red and blue points in the plane—a survey. In: Aronov B, Basu S, Pach J, Sharir M (eds) Discrete and computational geometry, Springer, pp 551-570 · Zbl 1079.52505
[24] Kearns, M.; Schapire, RE; Sellie, LM, Toward efficient agnostic learning, Mach Learn, 17, 115-141, (1994) · Zbl 0938.68797
[25] Leighton, F., A graph coloring algorithm for large scheduling problems, J Res Natl Bureau Stand, 84, 489-503, (1979) · Zbl 0437.68021
[26] Liu Y, Nediak M (2003) Planar case of the maximum box and related problems. In: CCCG 2003, 15th Canadian conference on computational geometry, pp 14-18
[27] Maloof M (2003) Learning when data sets are imbalanced and when costs are unequal and unknown. In: ICML-2003 workshop on learning from imbalanced data sets
[28] Maravalle, M.; Ricca, F.; Simeone, B.; Spinelli, V., Carpal tunnel syndrome automatic classification: electromyography vs. ultrasound imaging, TOP, 23, 100-123, (2014) · Zbl 1312.92027
[29] Mitchell TM (1997) Machine learning, 1st edn. McGraw-Hill Inc., New York · Zbl 0913.68167
[30] Morris, W.; Soltan, V., The erdös-szekeres problem on points in convex position a survey, Bull Am Math Soc, 37, 437-458, (2000) · Zbl 0958.52018
[31] Mugan J, Truemper K (2008) Mathematical methods for knowledge discovery and data mining, IGI Global, chap Discretization of rational data, pp 1-23
[32] Noga, A.; Füredi, Z.; Katchalski, M., Separating pairs of points by standard boxes, Eur J Comb, 6, 205-210, (1985) · Zbl 0592.05002
[33] Preparata FP, Shamos MI (1985) Computational geometry: an introduction. Springer-Verlag New York Inc., New York · Zbl 0575.68059
[34] Radon, J., Mengen konvexer Körper, die einen gemeinsamen Punkt enthalten, Mathematische Annalen, 83, 113-115, (1921) · JFM 48.0834.04
[35] Serafini, P., Classifying negative and positive points by optimal box clustering, Discrete Appl Math, 165, 270-282, (2014) · Zbl 1358.68240
[36] Simeone B, Felici G, Spinelli V (2007) A graph coloring approach for box clustering techniques in logic mining. In: Book of abstract of Euro XXII—22nd European conference on operational research, Euro XXII, p 193
[37] Simeone B, Maravalle M, Ricca F, Spinelli V (2006) Logic mining of non-logic data: some extensions of box clustering. In: Proceedings of the Euro XXI, 21st European conference on operational research. Reykjavik, Iceland, July 2-5, 2006
[38] Simeone B, Spinelli V (2007) The optimization problem framework for box clustering approach in logic mining. In: Book of abstract of Euro XXII—22nd European conference on operational research, Euro XXII, p 193
[39] Weka (2013) Machine learning group—data mining software in java. University of Waikato. http://www.cs.waikato.ac.nz/ml/weka
[40] Witten I, Frank E, Hall M (2011) Data mining: practical machine learning tools and techniques, 3rd edn. Morgan Kaufmann, San Francisco
[41] Wu S, Flach P (2005) A scored auc metric for classifier evaluation and selection. In: Proceedings of the ICML 2005 workshop on ROC Analysis in Machine Learning, Bonn, Germany, 11 Aug, 2005
[42] Zadrozny B, Langford J, Abe N (2003) Cost-sensitive learning by cost-proportionate example weighting. In: Proceedings of the 3rd IEEE international conference on data mining, p 435
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.