Logic classification and feature selection for biomedical data. (English) Zbl 1139.92013

Summary: We investigate logic classification and related feature selection algorithms for large biomedical data sets. When the data is in binary/logic form, the feature selection problem can be formulated as a Set Covering problem of very large dimensions, whose solution is computationally challenging. We propose an alternative approximated formulation for feature selection that results in an extension of Set Covering of compact size, and use the logic classifier Lsquare to test its performances on two well-known data sets. An ad hoc metaheuristic of the GRASP type is used to solve efficiently the feature selection problem. A simple and effective method to convert rational data into logic data by interval mapping is also described. The computational results obtained are promising and the use of logic models, that can be easily understood and integrated with other domain knowledge, is one of the major strengths of this approach.


92C40 Biochemistry, molecular biology
68T05 Learning and adaptive systems in artificial intelligence
92-08 Computational methods for problems pertaining to biology
92C50 Medical applications (general)


Full Text: DOI


[1] Tramontano, A., ()
[2] ()
[3] ()
[4] Felici, G.; Truemper, K., A minsat approach for learning in logic domains, INFORMS journal on computing, 13, 3, 1-17, (2001)
[5] Truemper, K., Design of logic-based intelligent systems, (2004), Wiley-Interscience New York · Zbl 1075.68087
[6] De Bontridder Koen, M.J.; Lageweg, B.J.; Lenstra, J.K.; Orlin, J.B.; Stougie, L., Branch-and-bound algorithms for the test cover problem, (), 737-742 · Zbl 1019.68807
[7] Liu, H.; Motoda, H., Feature selection for knowledge discovery and data mining, (2000), Kluwer Academic Publishers
[8] Langley, P., Selection of relevant features in machine learning, (), 140-144
[9] De Angelis, V.; Felici, G.; Mancinelli, G., Feature selection for data mining, (), 227-252
[10] Xing, E.P.; Jordan, M.I.; Karp, R.M., Feature selection for high-dimensional genomic microarray data, (), 601-608
[11] Ding, C.; Peng, H., Minimum redundancy feature selection from microarray gene expression data, ()
[12] Yu, L.; Liu, H., Redundancy based feature selection for microarray data, (), 737-742
[13] D.A. Peterson, M.H. Tahut, Model and feature selection in microarray classification, in: IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, 2004, pp. 56-60
[14] Chow, M.L.; Moler, E.J.; Mian, I.S., Identifying marker genes in transcription profiling data using a mixture of feature relevance experts, Physiological genomics, 5, 99-111, (2001)
[15] Cai, Y.; Doig, A.J., Prediction of saccharomyces cerevisae protein functional class from functional domain composition, Bioinformatics, 20, 8, 1292-1300, (2004)
[16] Shah, S.C.; Kusiak, A., Data mining and genetic algorithm based gene/snp selection, Artificial intelligence in medicine, 31, 3, 183-196, (2004)
[17] Garey, M.R.; Johnson, D.S., Computer and intractability: A guide to the theory of NP- completeness, (1979), Freeman San Francisco · Zbl 0411.68039
[18] P. Bonizzoni, G. Lancia, R. Rizzi, A practical approach to the healthy versus diseased minimal test collection problem when the tests are powerful, manuscript, 2005
[19] Feo, T.A.; Resende, M.G.C., A probabilistic heuristic for a computationally difficult set covering problem, Operations research letters, 8, 67-71, (1989) · Zbl 0675.90073
[20] Feo, T.A.; Resende, M.G.C., Greedy randomized adaptive search procedures, Journal of global optimization, 6, 109-133, (1995) · Zbl 0822.90110
[21] Resende, M.G.C.; Ribeiro, C.C., Greedy randomized adaptive search procedures, (), 219-249 · Zbl 1102.90384
[22] Festa, P.; Resende, M.G.C., Grasp: an annotated bibliography, (), 325-367 · Zbl 1017.90001
[23] Breiman; Friedman; Olshen; Stone, Classification & regression trees, (1984), Wadsworth Pacific Grove · Zbl 0541.62042
[24] Boros, E.; Ibaraki, T.; Makino, K., Logical analysis of binary data with missing bits, Artificial intelligence, 107, 219-263, (1999) · Zbl 0996.68067
[25] E. Boros, T. Ibaraki, A. Kogan, E. Mayoraz, I. Muchnik, An implementation of logical analysis of data, RUTCOR Research Report, Rutgers University, NJ, 1996, pp. 29-96
[26] Triantaphyllou, E.; Soyster, A.L., On the minimum number of logical clauses which can be inferred from examples, Computers and operations research, 23, 783-799, (1996) · Zbl 0865.68099
[27] Felici, G.; Truemper, K., (), 693-697
[28] C. Hatzis, D. Page, Kdd-2001 cup: The genomics challenge, http://www.cs.wisc.edu/dpage/kddcup2001/, 2001
[29] Golub, T.R.; Slonim, D.K.; Tamayo, P.; Huard, C.; Gaasenbeek, M.; Mesirov, J.P.; Coller, H.; Loh, M.L.; Downing, J.R.; Caligiuri, M.A.; Bloomeld, C.D.; Lander, E.S., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science, 286, 531-537, (1999)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.