×

MiningZinc: a declarative framework for constraint-based mining. (English) Zbl 1404.68106

Summary: We introduce MiningZinc, a declarative framework for constraint-based data mining. MiningZinc consists of two key components: a language component and an execution mechanism.
First, the MiningZinc language allows for high-level and natural modeling of mining problems, so that MiningZinc models are similar to the mathematical definitions used in the literature. It is inspired by the Zinc family of languages and systems and supports user-defined constraints and functions.
Secondly, the MiningZinc execution mechanism specifies how to compute solutions for the models. It is solver independent and supports both standard constraint solvers and specialized data mining systems. The high-level problem specification is first translated into a normalized constraint language (FlatZinc). Rewrite rules are then used to add redundant constraints or solve subproblems using specialized data mining algorithms or generic constraint programming solvers. Given a model, different execution strategies are automatically extracted that correspond to different sequences of algorithms to run. Optimized data mining algorithms, specialized processing routines and generic solvers can all be automatically combined.
Thus, the MiningZinc language allows one to model constraint-based itemset mining problems in a solver independent way, and its execution mechanism can automatically chain different algorithms and solvers. This leads to a unique combination of declarative modeling with high-performance solving.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
PDFBibTeX XMLCite
Full Text: DOI Link

References:

[1] De Raedt, L.; Nijssen, S.; O’Sullivan, B.; Van Hentenryck, P., Constraint programming meets machine learning and data mining (Dagstuhl seminar 11201), Dagstuhl Rep., 1, 5, 61-83 (2011)
[2] Han, J.; Kamber, M., Data Mining: Concepts and Techniques (2000), Morgan Kaufmann
[3] Marriott, K.; Nethercote, N.; Rafeh, R.; Stuckey, P. J.; Garcia De La Banda, M.; Wallace, M., The design of the Zinc modelling language, Constraints, 13, 3, 229-267 (2008) · Zbl 1146.68352
[4] Frisch, A.; Harvey, W.; Jefferson, C.; Hernández, B. M.; Miguel, I., Essence: a constraint language for specifying combinatorial problems, Constraints, 13, 3, 268-306 (2008) · Zbl 1147.68424
[5] Van Hentenryck, P., The OPL Optimization Programming Language (1999), MIT Press
[6] Nethercote, N.; Stuckey, P. J.; Becket, R.; Brand, S.; Duck, G. J.; Tack, G., MiniZinc: towards a standard CP modelling language, (CP. CP, LNCS, vol. 4741 (2007), Springer), 529-543
[7] Stuckey, P.; Tack, G., MiniZinc with functions, (Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems. Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, LNCS, vol. 7874 (2013), Springer: Springer Berlin, Heidelberg), 268-283 · Zbl 1382.68234
[8] Guns, T.; Nijssen, S.; De Raedt, L., Itemset mining: a constraint programming perspective, Artif. Intell., 175, 12-13, 1951-1983 (2011) · Zbl 1353.68233
[9] Guns, T.; Dries, A.; Tack, G.; Nijssen, S.; De Raedt, L., MiningZinc: a modeling language for constraint-based mining, (Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (2013), AAAI Press), 1365-1372
[10] Agrawal, R.; Imielinski, T.; Swami, A. N., Mining association rules between sets of items in large databases, (Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (1993), ACM Press), 207-216
[11] (Aggarwal, C. C.; Han, J., Frequent Pattern Mining (2014), Springer) · Zbl 1297.68010
[12] Mannila, H.; Toivonen, H., Levelwise search and borders of theories in knowledge discovery, Data Min. Knowl. Discov., 1, 3, 241-258 (1997)
[13] Bonchi, F.; Lucchese, C., Extending the state-of-the-art of constraint-based pattern discovery, Data Knowl. Eng., 60, 2, 377-399 (2007)
[14] Boulicaut, J.-F.; Jeudy, B., Constraint-based data mining, (Data Mining and Knowledge Discovery Handbook (2010), Springer), 339-354
[15] Rossi, F.; Beek, P.v.; Walsh, T., Handbook of Constraint Programming, Foundations of Artificial Intelligence (2006), Elsevier Science Inc.: Elsevier Science Inc. New York, NY, USA · Zbl 1175.90011
[16] Stuckey, P. J.; Becket, R.; Fischer, J., Philosophy of the MiniZinc challenge, Constraints, 15, 3, 307-316 (2010) · Zbl 1208.68207
[17] Vreeken, J.; Leeuwen, M.; Siebes, A., Krimp: mining itemsets that compress, Data Min. Knowl. Discov., 23, 1, 169-214 (2011) · Zbl 1235.68071
[18] Pasquier, N.; Bastide, Y.; Taouil, R.; Lakhal, L., Discovering frequent closed itemsets for association rules, (Database Theory. Database Theory, LNCS, vol. 1540 (1999), Springer), 398-416
[19] Chan, R.; Yang, Q.; Shen, Y.-D., Mining high utility itemsets, (ICDM (2003)), 19-26
[20] Chui, C. K.; Kao, B.; Hung, E., Mining frequent itemsets from uncertain data, (Advances in Knowledge Discovery and Data Mining, Proceedings of 11th Pacific-Asia Conference. Advances in Knowledge Discovery and Data Mining, Proceedings of 11th Pacific-Asia Conference, PAKDD 2007, Nanjing, China, May 22-25, 2007 (2007)), 47-58
[21] Novak, P. K.; Lavrac, N.; Webb, G. I., Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining, J. Mach. Learn. Res., 10, 377-403 (2009) · Zbl 1235.68178
[22] Fürnkranz, J.; Flach, P. A., ROC ‘n’ rule learning - towards a better understanding of covering algorithms, Mach. Learn., 58, 1, 39-77 (2005) · Zbl 1075.68071
[23] Nijssen, S.; Jimenez, A.; Guns, T., Constraint-based pattern mining in multi-relational databases, (Proceedings of the 11th IEEE International Conference on Data Mining Workshops (2011), IEEE), 1120-1127
[24] Guns, T.; Nijssen, S.; De Raedt, L., k-Pattern set mining under constraints, IEEE Trans. Knowl. Data Eng., 25, 2, 402-418 (2013)
[25] Khiari, M.; Boizumault, P.; Crémilleux, B., A generic approach for modeling and mining n-ary patterns, (ISMIS (2011)), 300-305
[26] Uno, T.; Kiyomi, M.; Arimura, H., LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets, (FIMI. FIMI, CEUR-WS.org. FIMI. FIMI, CEUR-WS.org, CEUR Workshop Proceedings, vol. 126 (2004))
[27] Elmasri, R.; Navathe, S., Fundamentals of Database Systems (2010), Addison-Wesley Publishing Company: Addison-Wesley Publishing Company USA
[28] Kotthoff, L., Algorithm selection for combinatorial search problems: a survey, AI Mag., 35, 3, 48-60 (2014)
[29] Schulte, C.; Tack, G.; Lagerkvist, M., Gecode, a generic constraint development environment (2013)
[30] Opturion, CPX (2014)
[31] van Omme, N.; Perron, L.; Furnon, V., or-tools user’s manual (2014), Google, Tech. rep.
[32] Borgelt, C., Frequent item set mining, WIREs Data Min. Knowl. Discov., 2, 6, 437-456 (2012)
[33] Liu, M.; Qu, J., Mining high utility itemsets without candidate generation, (Proceedings of the 21st ACM International Conference on Information and Knowledge Management (2012), ACM), 55-64
[34] Fournier-Viger, P.; Gomariz, A.; Soltani, A.; Gueniche, T., Spmf: open-source data mining platform (2013)
[35] Nijssen, S.; Guns, T.; De Raedt, L., Correlated itemset mining in ROC space: a constraint programming approach, (KDD (2009), ACM), 647-656
[36] Frank, A.; Asuncion, A., UCI machine learning repository (2010), available from
[37] Goethals, B.; Zaki, M. J., Advances in frequent itemset mining implementations: report on FIMI’03, (SIGKDD Explorations, vol. 6 (2004)), 109-117
[38] Mannila, H., Inductive databases and condensed representations for data mining, (ILPS (1997)), 21-30
[39] Meo, R.; Psaila, G.; Ceri, S., A new SQL-like operator for mining association rules, (VLDB (1996)), 122-133
[40] Imielinski, T.; Virmani, A., MSQL: a query language for database mining, Data Min. Knowl. Discov., 3, 373-408 (1999)
[41] Blockeel, H.; Calders, T.; Fromont, É.; Goethals, B.; Prado, A.; Robardet, C., An inductive database system based on virtual mining views, Data Min. Knowl. Discov., 24, 1, 247-287 (2012) · Zbl 1235.68064
[42] Järvisalo, M., Itemset mining as a challenge application for answer set enumeration, (Logic Programming and Nonmonotonic Reasoning. Logic Programming and Nonmonotonic Reasoning, LNCS, vol. 6645 (2011), Springer), 304-310
[43] Métivier, J.-P.; Boizumault, P.; Crémilleux, B.; Khiari, M.; Loudni, S., A constraint language for declarative pattern discovery, (ACM Symposium on Applied Computing (2012), ACM), 119-125
[44] Henriques, R.; Lynce, I.; Manquinho, V. M., On when and how to use SAT to mine frequent itemsets, CoRR
[45] Coquery, E.; Jabbour, S.; Sais, L.; Salhi, Y., A SAT-based approach for discovering frequent, closed and maximal patterns in a sequence, (European Conference on Artificial Intelligence. European Conference on Artificial Intelligence, ECAI, vol. 242 (2012)), 258-263 · Zbl 1327.68213
[46] Järvisalo, M., Itemset mining as a challenge application for answer set enumeration, (Logic Programming and Nonmonotonic Reasoning, Proceedings of 11th International Conference. Logic Programming and Nonmonotonic Reasoning, Proceedings of 11th International Conference, LPNMR 2011, Vancouver, Canada, May 16-19, 2011 (2011)), 304-310
[47] Jabbour, S.; Sais, L.; Salhi, Y., Boolean satisfiability for sequence mining, (Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (2013), ACM), 649-658
[48] Coquery, E.; Jabbour, S.; Sais, L., A constraint programming approach for enumerating motifs in a sequence, (IEEE 11th International Conference on Data Mining Workshops. IEEE 11th International Conference on Data Mining Workshops, ICDMW 2011 (2011), IEEE), 1091-1097
[49] Kemmar, A.; Ugarte, W.; Loudni, S.; Charnois, T.; Lebbah, Y.; Boizumault, P.; Cremilleux, B., Mining relevant sequence patterns with CP-based framework, (IEEE 25th International Conference on Tools with Artificial Intelligence. IEEE 25th International Conference on Tools with Artificial Intelligence, ICTAI 2013 (2014), IEEE Computer Society), 552-559
[50] Métivier, J.-P.; Loudni, S.; Charnois, T., A constraint programming approach for mining sequential patterns in a sequence database, (ECML/PKDD 2013 Workshop on Languages for Data Mining and Machine Learning (2013)), also available as
[51] Jabbour, S.; Sais, L.; Salhi, Y., The top-k frequent closed itemset mining using top-k SAT problem, (Machine Learning and Knowledge Discovery in Databases (2013), Springer), 403-418
[52] Ugarte, W.; Boizumault, P.; Loudni, S.; Crémilleux, B.; Lepailleur, A., Soft threshold constraints for pattern mining, (Eleventh International Conference on Integration of Artificial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming. Eleventh International Conference on Integration of Artificial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming, CPAIOR-14 (2014))
[53] Negrevergne, B.; Dries, A.; Guns, T.; Nijssen, S., Dominance programming for itemset mining, (13th IEEE International Conference on Data Mining (2013), IEEE Computer Society), 557-566
[54] Métivier, J.-P.; Boizumault, P.; Crémilleux, B.; Khiari, M.; Loudni, S., Constrained clustering using SAT, (Advances in Intelligent Data Analysis XI (2012), Springer), 207-218
[55] Duong, K.-C.; Vrain, C., A declarative framework for constrained clustering, (ECML/PKDD, Machine Learning and Knowledge Discovery in Databases (2013), Springer), 419-434
[56] Van Hentenryck, P.; Michel, L., Constraint-Based Local Search (2005), MIT Press
[57] Flener, P.; Pearson, J.; Ågren, M., Introducing ESRA, a relational language for modelling combinatorial problems, (LOPSTR. LOPSTR, LNCS, vol. 3018 (2003), Springer), 214-232 · Zbl 1099.68543
[58] Frisch, A.; Jefferson, C.; Hernández, B. M.; Miguel, I., The rules of constraint modelling, (IJCAI (2005), Professional Book Center), 109-116
[59] Frisch, A., International workshops on constraint modelling and reformulation
[61] Huang, J., Universal Booleanization of constraint models, (Stuckey, P., Principles and Practice of Constraint Programming. Principles and Practice of Constraint Programming, LNCS, vol. 5202 (2008), Springer: Springer Berlin, Heidelberg), 144-158 · Zbl 1149.68307
[62] Bofill, M.; Palahí, M.; Suy, J.; Villaret, M., fzn2smt (2011)
[63] Duck, G. J.; De Koninck, L.; Stuckey, P. J., Cadmium: an implementation of ACD term rewriting, (ICLP. ICLP, LNCS, vol. 5366 (2008), Springer), 531-545 · Zbl 1185.68137
[64] Frühwirth, T., Constraint Handling Rules (2009), Cambridge University Press · Zbl 1182.68039
[65] Ajili, F.; Wallace, M., Hybrid problem solving in eclipse, (Milano, M., Constraint and Integer Programming. Constraint and Integer Programming, Operations Research/Computer Science Interfaces Series, vol. 27 (2004), Springer), 169-206 · Zbl 1078.90563
[66] Focacci, F.; Lodi, A.; Milano, M., Exploiting relaxations in cp, (Constraint and Integer Programming. Constraint and Integer Programming, Operations Research/Computer Science Interfaces Series, vol. 27 (2004), Springer), 137-167 · Zbl 1078.90554
[67] Achterberg, T.; Berthold, T.; Koch, T.; Wolter, K., Constraint integer programming: a new approach to integrate cp and mip, (Perron, L.; Trick, M. A., Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems. Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, LNCS, vol. 5015 (2008), Springer: Springer Berlin, Heidelberg), 6-20 · Zbl 1142.68504
[68] Wallace, M., Hybrid algorithms in constraint programming, (Azevedo, F.; Barahona, P.; Fages, F.; Rossi, F., Recent Advances in Constraints. Recent Advances in Constraints, LNCS, vol. 4651 (2007), Springer: Springer Berlin, Heidelberg), 1-32 · Zbl 1176.68204
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.