×

zbMATH — the first resource for mathematics

Mining top-\(k\) motifs with a SAT-based framework. (English) Zbl 1404.68143
Summary: In this paper, we introduce a new problem, called Top-\(k\) SAT, that consists in enumerating the Top-\(k\) models of a propositional formula. A Top-\(k\) model is defined as a model with less than \(k\) models preferred to it with respect to a preference relation. We show that Top-\(k\) SAT generalizes two well-known problems: the Partial MAX-SAT problem and the problem of computing minimal models. Moreover, we propose a general algorithm for Top-\(k\) SAT. Then, we give an application of our declarative framework in data mining, namely, the problem of mining Top-\(k\) motifs in the transaction databases and in the sequences. In the case of mining sequence data, we introduce a new mining task by considering the sequences of itemsets. Thanks to the flexibility and to the declarative aspects of our SAT-based approach, an encoding of this task is obtained by a very slight modification of mining motifs in the sequences of items.
MSC:
68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)
68P15 Database theory
68T05 Learning and adaptive systems in artificial intelligence
Software:
CP-nets; MiniSat
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Agrawal, R.; Imielinski, T.; Swami, A. N., Mining association rules between sets of items in large databases, (ACM SIGMOD International Conference on Management of Data, (1993), ACM Press Baltimore), 207-216
[2] Tiwari, A.; Gupta, R.; Agrawal, D., A survey on frequent pattern mining: current status and challenging issues, Inf. Technol. J., 9, 1278-1293, (2010)
[3] Parida, L.; Rigoutsos, I.; Floratos, A.; Platt, D.; Gao, Y., Pattern discovery on character sets and real-valued data: linear bound on irredundant motifs and an efficient polynomial time algorithm, (ACM-SIAM Symposium on Discrete Algorithms, (2000)), 297-308 · Zbl 0956.68134
[4] Pisanti, N.; Crochemore, M.; Grossi, R.; Sagot, M. F., Bases of motifs for generating repeated patterns with wild cards, IEEE/ACM Trans. Comput. Biol. Bioinform., 2, 40-50, (2005)
[5] Arimura, H.; Uno, T., An efficient polynomial space and polynomial delay algorithm for enumeration of maximal motifs in a sequence, J. Comb. Optim., 13, 243-262, (2007) · Zbl 1123.68134
[6] Fu, A. W.-C.; Kwong, R. W.W.; Tang, J., Mining n-most interesting itemsets, (Proceedings of the 12th International Symposium on Methodologies for Intelligent Systems, ISMIS 2000, Lecture Notes in Computer Science, (2000), Springer), 59-67 · Zbl 0983.68669
[7] Han, J.; Wang, J.; Lu, Y.; Tzvetkov, P., Mining top-k frequent closed patterns without minimum support, (Proceedings of the 2002 IEEE International Conference on Data Mining, ICDM 2002, (2002), IEEE Computer Society), 211-218
[8] Ke, Y.; Cheng, J.; Yu, J. X., Top-k correlative graph mining, (Proceedings of the SIAM International Conference on Data Mining, SDM 2009, (2009)), 1038-1049
[9] Valari, E.; Kontaki, M.; Papadopoulos, A. N., Discovery of top-k dense subgraphs in dynamic graph collections, (Proceedings of the 24th International Conference on Scientific and Statistical Database Management, SSDBM 2012, (2012)), 213-230
[10] Lam, H. T.; Calders, T., Mining top-k frequent items in a data stream with flexible sliding windows, (Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2010, (2010)), 283-292
[11] Lam, H. T.; Calders, T.; Pham, N., Online discovery of top-k similar motifs in time series data, (Proceedings of the Eleventh SIAM International Conference on Data Mining, SDM 2011, (2011)), 1004-1015
[12] Shoham, Y., Reasoning about change: time and causation from the standpoint of artificial intelligence, (1988), MIT Press Cambridge, MA, USA
[13] Meseguer, P.; Rossi, F.; Schiex, T., Soft constraints, (Rossi, F.; van Beek, P.; Walsh, T., Handbook of Constraint Programming, (2006), Elsevier)
[14] Boutilier, C.; Brafman, R. I.; Domshlak, C.; Poole, D. L.; Hoos, H. H., CP-nets: a tool for representing and reasoning with conditional ceteris paribus preference statements, J. Artif. Intell. Res., 21, 135-191, (2004) · Zbl 1080.68685
[15] Walsh, T., Representing and reasoning with preferences, AI Mag., 28, 59-70, (2007)
[16] Brafman, R. I.; Domshlak, C., Preference handling - an introductory tutorial, AI Mag., 30, 58-86, (2009)
[17] Domshlak, C.; Hüllermeier, E.; Kaci, S.; Prade, H., Preferences in AI: an overview, Artif. Intell., 175, 1037-1052, (2011)
[18] Bistarelli, S.; Bonchi, F., Soft constraint based pattern mining, Data Knowl. Eng., 62, 118-137, (2007)
[19] de Amo, S.; Diallo, M. S.; Diop, C. T.; Giacometti, A.; Li, H. D.; Soulet, A., Mining contextual preference rules for building user profiles, (Proceedings of the 14th International Conference on Data Warehousing and Knowledge Discovery, DaWaK’12, (2012)), 229-242
[20] Ugarte Rojas, W.; Boizumault, P.; Loudni, S.; Crémilleux, B.; Lepailleur, A., Mining (soft-) skypatterns using dynamic CSP, (Integration of AI and OR Techniques in Constraint Programming, CPAIOR’14, (2014)), 71-87 · Zbl 1407.68416
[21] Négrevergne, B.; Dries, A.; Guns, T.; Nijssen, S., Dominance programming for itemset mining, (2013 IEEE 13th International Conference on Data Mining, Dallas, TX, USA, December 7-10, 2013, (2013)), 557-566
[22] Rosa, E. D.; Giunchiglia, E.; Maratea, M., Solving satisfiability problems with preferences, Constraints, 15, 485-515, (2010) · Zbl 1208.68199
[23] Castell, T.; Cayrol, C.; Cayrol, M.; Berre, D. L., Using the Davis and Putnam procedure for an efficient computation of preferred models, (ECAI, (1996)), 350-354
[24] Raedt, L. D.; Guns, T.; Nijssen, S., Constraint programming for itemset mining, (ACM SIGKDD, (2008)), 204-212
[25] Guns, T.; Nijssen, S.; Raedt, L. D., Itemset mining: a constraint programming perspective, Artif. Intell., 175, 1951-1983, (2011) · Zbl 1353.68233
[26] Jabbour, S.; Sais, L.; Salhi, Y., Boolean satisfiability for sequence mining, (CIKM, (2013)), 649-658
[27] Cambazard, H.; Hadzic, T.; O’Sullivan, B., Knowledge compilation for itemset mining, (ECAI’10, (2010)), 1109-1110
[28] Khiari, M.; Boizumault, P.; Crémilleux, B., Combining CSP and constraint-based mining for pattern discovery, (Computational Science and Its Applications, ICCSA 2010, (2010)), 432-447
[29] Metivier, J.-P.; Boizumault, P.; Crémilleux, B.; Khiari, M.; Loudni, S., A constraint-based language for declarative pattern discovery, (IEEE 11th International Conference on Data Mining Workshops, ICDMW, Vancouver, Canada, (2011)), 1112-1119
[30] Guns, T.; Dries, A.; Tack, G.; Nijssen, S.; De Raedt, L., Miningzinc: a modeling language for constraint-based mining, (Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI’13, (2013)), 1365-1372
[31] Guns, T.; Nijssen, S.; De Raedt, L., Itemset mining: a constraint programming perspective, Artif. Intell., 175, 1951-1983, (2011) · Zbl 1353.68233
[32] Wang, J.; Han, J.; Lu, Y.; Tzvetkov, P., TFP: an efficient algorithm for mining top-k frequent closed itemsets, IEEE Trans. Knowl. Data Eng., 17, 652-664, (2005)
[33] Agrawal, R.; Srikant, R., Mining sequential patterns, (Philip, A. L.P. C.; Yu, S., Proceedings of the Eleventh International Conference on Data Engineering, ICDE’1995, (1995), IEEE Computer Society), 3-14
[34] Jabbour, S.; Sais, L.; Salhi, Y., The top-k frequent closed itemset mining using top-k SAT problem, (ECML/PKDD (3), (2013)), 403-418
[35] Tseitin, G., On the complexity of derivations in the propositional calculus, (Structures in Constructives Mathematics and Mathematical Logic, Part II, (1968)), 115-125 · Zbl 0197.00102
[36] Fu, Z.; Malik, S., On solving the partial MAX-SAT problem, (Proceedings of the Ninth International Conference on Theory and Applications of Satisfiability Testing, SAT’06, (2006)), 252-265 · Zbl 1187.68540
[37] Warners, J. P., A linear-time transformation of linear inequalities into conjunctive normal form, Inf. Process. Lett., (1996)
[38] Sinz, C., Towards an optimal CNF encoding of Boolean cardinality constraints, (11th International Conference on Principles and Practice of Constraint Programming, CP 2005, (2005)), 827-831 · Zbl 1153.68488
[39] Eén, N.; Sörensson, N., Translating pseudo-Boolean constraints into SAT, J. Satisf. Boolean Model. Comput., 2, 1-26, (2006) · Zbl 1116.68083
[40] Asin, R.; Nieuwenhuis, R.; Oliveras, A.; Rodriguez-Carbonell, E., Cardinality networks: a theoretical and empirical study, Constraints, 16, 195-221, (2011) · Zbl 1217.68200
[41] Jabbour, S.; Saïs, L.; Salhi, Y., A pigeon-hole based encoding of cardinality constraints, (International Symposium on Artificial Intelligence and Mathematics, ISAIM 2014, Fort Lauderdale, FL, USA, January 6-8, 2014, (2014))
[42] Cadoli, M., On the complexity of model finding for nonmonotonic propositional logics, (4th Italian Conference on Theoretical Computer Science, (1992)), 125-139
[43] Morgado, A. R.; Marques-Silva, J. P., Good learning and implicit model enumeration, (International Conference on Tools with Artificial Intelligence, ICTAI’2005, (2005), IEEE), 131-136
[44] Egho, E.; Raïssi, C.; Calders, T.; Jay, N.; Napoli, A., On measuring similarity for sequences of itemsets, Data Min. Knowl. Discov., 29, 732-764, (2015) · Zbl 1405.68462
[45] Wu, Y.; Wang, L.; Ren, J.; Ding, W.; Wu, X., Mining sequential patterns with periodic wildcard gaps, Appl. Intell., 41, 99-116, (2014)
[46] Jabbour, S.; Sais, L.; Salhi, Y., On SAT models enumeration in itemset mining, (2015), CoRR
[47] Lane, T., Filtering techniques for rapid user classification, (AAAI-98/ICML-98 Joint Workshop on AI Approaches to Time-Series Analysis, (1998)), 58-63
[48] Uno, T.; Kiyomi, M.; Arimura, H., LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets, (Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, FIMI’04, Brighton, UK, November 1, 2004, (2004))
[49] Williams, R.; Gomes, C. P.; Selman, B., Backdoors to typical case complexity, (Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, (2003)), 1173-1178
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.