×

Fast algorithms for finding disjoint subsequences with extremal densities. (English) Zbl 1106.68396

Summary: We derive fast algorithms for the following problem: given a set of \(n\) points on the real line and two parameters \(s\) and \(p\), find \(s\) disjoint intervals of maximum total length that contain at most \(p\) of the given points. Our main contribution consists of algorithms whose time bounds improve upon a straightforward dynamic programming algorithm, in the relevant case that input size \(n\) is much bigger than parameters \(s\) and \(p\). These results are achieved by selecting a few candidate intervals that are provably sufficient for building an optimal solution via dynamic programming. As a byproduct of this idea we improve an algorithm for a similar subsequence problem of Y. H. Chen, H. I. Lu and C. Y. Tang [Lect. Notes Comput. Sci. 3515, 845–850 (2005; Zbl 1106.68397)]. The problems are motivated by the search for significant patterns in biological data. Finally, we propose several heuristics that further reduce the time complexity in typical instances. One of them leads to an apparently open subsequence sum problem of independent interest.

MSC:

68T10 Pattern recognition, speech recognition
90C39 Dynamic programming

Citations:

Zbl 1106.68397
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] G. Agnarsson, P. Damaschke, M.M. Halldórsson, Powers of geometric intersection graphs and dispersion algorithms, Discrete Applied Mathematics 132 (2003) (Special Issue, in: V. Lozin, D. de Werra (Eds.), Stability in Graphs and Related Topics, pp. 3-16; Preliminary version in: Proceedings of SWAT 2002, Lecture Notes in Computer Science, vol. 2368, Spring, Berlin, 2002, pp. 140-149).; G. Agnarsson, P. Damaschke, M.M. Halldórsson, Powers of geometric intersection graphs and dispersion algorithms, Discrete Applied Mathematics 132 (2003) (Special Issue, in: V. Lozin, D. de Werra (Eds.), Stability in Graphs and Related Topics, pp. 3-16; Preliminary version in: Proceedings of SWAT 2002, Lecture Notes in Computer Science, vol. 2368, Spring, Berlin, 2002, pp. 140-149).
[2] Y.H. Chen, H.I. Lu, C.Y. Tang, Disjoint segments with maximum density, in: International Workshop on Bioinformatics Research and Applications IWBRA 2005 (within ICCS 2005), Lecture Notes in Computer Science, vol. 3515, Springer, Berlin, 2005, pp. 845-850.; Y.H. Chen, H.I. Lu, C.Y. Tang, Disjoint segments with maximum density, in: International Workshop on Bioinformatics Research and Applications IWBRA 2005 (within ICCS 2005), Lecture Notes in Computer Science, vol. 3515, Springer, Berlin, 2005, pp. 845-850. · Zbl 1106.68397
[3] A. Aggarwal, S. Suri, Fast algorithms for computing the largest empty rectangle, in: Symposium on Computing Geometry 1987, 1987, pp. 278-290.; A. Aggarwal, S. Suri, Fast algorithms for computing the largest empty rectangle, in: Symposium on Computing Geometry 1987, 1987, pp. 278-290.
[4] Attalah, M. J.; Fredrickson, G. N., A note on finding a maximum empty rectangle, Discrete Appl. Math., 13, 87-91 (1986) · Zbl 0598.05018
[5] Chazelle, B.; Drysdale, L. R.S.; Lee, D. T., Computing the largest empty rectangle, SIAM J. Comput., 15, 550-555 (1986) · Zbl 0608.68059
[6] Edmonds, J.; Gryz, J.; Liang, D.; Miller, R. J., Mining for empty rectangles in large data sets, Theor. Comput. Sci., 296, 435-452 (2003) · Zbl 1045.68041
[7] B. Liu, L.P. Ku, W. Hsu, Discovering interesting holes in data, in: 15th IJCAI 1997, 1997, pp. 930-935.; B. Liu, L.P. Ku, W. Hsu, Discovering interesting holes in data, in: 15th IJCAI 1997, 1997, pp. 930-935.
[8] Orlowski, M., A new algorithm for the largest empty rectangle problem, Algorithmica, 5, 65-73 (1990) · Zbl 0689.68065
[9] Beger, R. D.; Bolton, P. H., Protein \(\phi\) and \(\psi\) dihedral restraints determined from multidimensional hypersurface correlations of backbone chemical shifts and their use in the determination of protein tertiary structures, J. Biomol. NMR, 10, 129-142 (1997)
[10] Cornilescu, G.; Delaglio, F.; Bax, A., Protein backbone angle restraints from searching a database for chemical shift and sequence homology, J. Biomol. NMR, 13, 289-302 (1999)
[11] Meiler, J., PROSHIFT: protein chemical shift prediction using artificial neural networks, J. Biomol. NMR, 26, 25-37 (2003)
[12] Neal, S.; Nip, A. M.; Zhang, H.; Wishart, D. S., Rapid and accurate calculation of protein \({}^1H, {}^{13}C, {}^{15}N\) chemical shifts, J. Biomol. NMR, 26, 215-240 (2003)
[13] Seavey, B. R.; Farr, E. A.; Westler, W. M.; Markley, J. L., A relational database for sequence-specific protein NMR data, J. Biomol. NMR, 1, 217-236 (1991)
[14] Wang, Y.; Jardetzky, O., Probability-based protein secondary structure identification using combined NMR chemical-shift data, Protein Sci., 11, 852-861 (2002)
[15] Xu, X. P.; Case, D. A., Probing multiple effects on \({}^{15}N, {}^{13}C^\alpha, {}^{13}C^\beta \), and \({}^{13}C^\prime\) chemical shifts in peptides using density functional theory, Biopolymers, 65, 408-423 (2002)
[16] Chiaromonte, F.; Miller, W.; Bouhassira, E. E., Gene length and proximity to neighbors affect genome-wide expression levels, Genome Res., 13, 2602-2608 (2003)
[17] Raghava, G. P.S.; Han, J. H., Correlation and prediction of gene expression level from amino acid and dipeptide composition of its proteins, BMC Bioinform., 6, 59 (2005)
[18] L. Kozobay-Avraham, S. Hosid, A. Bolshoy, Curvature distribution in prokaryotic genomes, in: Silico Biology, vol. 4:0029, 2004.; L. Kozobay-Avraham, S. Hosid, A. Bolshoy, Curvature distribution in prokaryotic genomes, in: Silico Biology, vol. 4:0029, 2004. · Zbl 1165.92304
[19] Rogozin, I. B.; Babenko, V. N.; Milanesi, L.; Pavlov, Y. I., Computational analysis of mutation spectra, Briefings Bioinform., 4, 210-227 (2003)
[20] W.L. Ruzzo, M. Tompa, A linear time algorithm for finding all maximal scoring subsequences, in: Seventh International Conference on Intelligent Systems for Molecular Biology 1999, AAAI, 1999, pp. 234-241.; W.L. Ruzzo, M. Tompa, A linear time algorithm for finding all maximal scoring subsequences, in: Seventh International Conference on Intelligent Systems for Molecular Biology 1999, AAAI, 1999, pp. 234-241.
[21] Cheng, C. H.; Chen, K. Y.; Tien, W. C.; Chao, K. M., Improved algorithms for the \(k\) maximum-sums problem, (16th ISAAC 2005, Lecture Notes in Computer Science, vol. 3827 (2005), Springer: Springer Berlin), 799-808 · Zbl 1175.68541
[22] Lin, T. C.; Lee, D. T., Randomized algorithm for the sum selection problem, (16th ISAAC 2005, Lecture Notes in Computer Science, vol. 3827 (2005), Springer: Springer Berlin), 515-523 · Zbl 1173.68855
[23] B. Jackson et al., An algorithm for optimal partitioning of data on an interval, preprint, arxiv.org/abs/math/0309285, 2004.; B. Jackson et al., An algorithm for optimal partitioning of data on an interval, preprint, arxiv.org/abs/math/0309285, 2004.
[24] Eppstein, D.; Galil, Z.; Giancarlo, R.; Italiano, G. F., Sparse dynamic programming I: Linear cost functions, J. ACM, 39, 519-545 (1992) · Zbl 0807.90120
[25] Galil, Z.; Park, K., Dynamic programming with convexity, concavity and sparsity, Theor. Comput. Sci., 92, 49-76 (1992) · Zbl 0763.90088
[26] Makinen, V.; Navarro, G.; Ukkonen, E., Algorithms for transposition invariant string matching, (20th STACS 2003, Lecture Notes in Computer Science, vol. 2607 (2003), Springer: Springer Berlin), 191-202 · Zbl 1035.68507
[27] D. Dor, Selection algorithms, Ph.D. Thesis, Tel-Aviv University, 1995.; D. Dor, Selection algorithms, Ph.D. Thesis, Tel-Aviv University, 1995.
[28] Paterson, M. S., Progress in selection, (Fifth SWAT’96, Lecture Notes in Computer Science, vol. 1097 (1996), Springer: Springer Berlin), 368-379 · Zbl 1502.68111
[29] I. Baran, E. Demaine, M. Patrascu, Subquadratic algorithms for 3SUM, in: Ninth WADS 2005, Lecture Notes in Computer Science, vol. 3608, Springer, Berlin, 2005, pp. 409-421.; I. Baran, E. Demaine, M. Patrascu, Subquadratic algorithms for 3SUM, in: Ninth WADS 2005, Lecture Notes in Computer Science, vol. 3608, Springer, Berlin, 2005, pp. 409-421. · Zbl 1161.68859
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.