×

Tight approximability results for test set problems in bioinformatics. (English) Zbl 1076.68113

Summary: We investigate the test set problem and its variations that appear in a variety of applications. In general, we are given a universe of objects to be “distinguished” by a family of “tests”, and we want to find the smallest sufficient collection of tests. In the simplest version, a test is a subset of the universe and two objects are distinguished by our collection if one test contains exactly one of them. Variations allow tests to be multi-valued functions or unions of “basic” tests, and different notions of the term distinguished. An important version of this problem that has applications in DNA sequence analysis has the universe consisting of strings over a small alphabet and tests that are detecting presence (or absence) of a substring. For most versions of the problem, including the latter, we establish matching lower and upper bounds on approximation ratio. When tests can be formed as unions of basic tests, we show that the problem is as hard as the graph coloring problem. We conclude by reporting preliminary computational results on the implementations of our algorithmic approaches for the minimum cost probe set problems on a data set used by Borneman et al.

MSC:

68W25 Approximation algorithms
68Q17 Computational difficulty of problems (lower bounds, completeness, difficulty of approximation, etc.)
92D20 Protein sequences, DNA sequences
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Y.S. Abu-Mostafa (Ed.), Complexity in Information Theory, Springer, Berlin, 1986.; Y.S. Abu-Mostafa (Ed.), Complexity in Information Theory, Springer, Berlin, 1986.
[2] De Bontridder, K. M.J.; Halldórsson, B. V.; Halldórsson, M. M.; Hurkens, C. A.J.; Lenstra, J. K.; Ravi, R.; Stougie, L., Approximation algorithms for the test cover problem, Math. Programming-B, 98, 1-3, 477-491 (2003) · Zbl 1160.90646
[3] Borneman, J.; Chrobak, M.; Vedova, G. D.; Figueroa, A.; Jiang, T., Probe selection algorithms with applications in the analysis of microbial communities, Bioinformatics, 17, Suppl. 1, S39-S48 (2001)
[4] Feige, U., A threshold for approximating set cover, J. ACM, 45, 634-652 (1998) · Zbl 1065.68573
[5] Feige, U.; Kilian, J., Zero knowledge and the chromatic number, J. Comput. System Sci., 57, 2, 187-199 (1998) · Zbl 0921.68089
[6] Garey, M. R.; Johnson, D. S., Computers and Intractability—A Guide to the Theory of NP-Completeness (1979), W. H. Freeman & Co.: W. H. Freeman & Co. New York · Zbl 0411.68039
[7] Gusfield, D., Algorithms on Strings, Trees and Sequences (1997), Cambridge University Press: Cambridge University Press Cambridge · Zbl 0934.68103
[8] B.V. Halldórsson, M. M. Halldórsson, R. Ravi, On the approximability of the minimum test collection problem, in: Proceedings of the Ninth Annual European Symposium on Algorithms, Lecture Notes in Computer Science, vol. 2161, 2001, pp. 158-169.; B.V. Halldórsson, M. M. Halldórsson, R. Ravi, On the approximability of the minimum test collection problem, in: Proceedings of the Ninth Annual European Symposium on Algorithms, Lecture Notes in Computer Science, vol. 2161, 2001, pp. 158-169.
[9] Johnson, D. S., Approximation algorithms for combinatorial problems, J. Comput. Systems Sci., 9, 256-278 (1974) · Zbl 0296.65036
[10] Karp, R. M.; Stoughton, R.; Yeung, K. Y., Algorithms for choosing differential gene expression experiments, (Proceedings of the Third Annual International Conference on Computational Molecular Biology (1999)), 208-217
[11] Moret, B. M.E.; Shapiro, H. D., On minimizing a set of tests, SIAM J. Sci. Statist. Comput., 6, 983-1003 (1985)
[12] Rash, S.; Gusfield, D., String barcoding: uncovering optimal virus signatures, (Proceedings of the Sixth Annual International Conference on Computational Molecular Biology (2002)), 254-261
[13] C.E. Shannon, Mathematical theory of communication, Bell Systems Tech. J. 27 (1948) 379-423, 623-658.; C.E. Shannon, Mathematical theory of communication, Bell Systems Tech. J. 27 (1948) 379-423, 623-658. · Zbl 1154.94303
[14] Slavik, P., A tight analysis of the greedy algorithm for set cover, (Proceedings of the 28th ACM Symposium on Theory of Computing (1996)), 435-439 · Zbl 0942.68773
[15] Srinivasan, A., Improved approximations of packing and covering problems, (Proceedings of the 27th Annual ACM Symposium on Theory of Computing (1995)), 268-276 · Zbl 0920.90105
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.