zbMATH — the first resource for mathematics

On hardness of several string indexing problems. (English) Zbl 1407.68229
Kulikov, Alexander S. (ed.) et al., Combinatorial pattern matching. 25th annual symposium, CPM 2014, Moscow, Russia, June 16–18, 2014. Proceedings. Berlin: Springer. Lect. Notes Comput. Sci. 8486, 242-251 (2014).
Summary: Let \({\mathcal D} =\{d_1,d_2,...,d_D\}\) be a collection of \(D\) string documents of \(n\) characters in total. The two-pattern matching problems ask to index \({\mathcal D}\) for answering the following queries efficiently.
\(\bullet\) report/count the unique documents containing \(P _{1}\) and \(P_{2}\).
\(\bullet\) report/count the unique documents containing \(P _{1}\), but not \(P_{2}\).
Here \(P _{1}\) and \(P _{2}\) represent input patterns of length \(p _{1}\) and \(p _{2}\) respectively. Linear space data structures with \(O(p_1+p_2+\sqrt{nk}\log^{O(1)} n)\) query cost are already known for the reporting version, where \(k\) represents the output size. For the counting version (i.e., report the value \(k)\), a simple linear-space index with \(O(p_1+p_2+ \sqrt{n})\) query cost can be constructed in \(O(n ^{3/2})\) time. However, it is still not known if these are the best possible bounds for these problems. In this paper, we show a strong connection between these string indexing problems and the Boolean matrix multiplication problem. Based on this, we argue that these results cannot be improved significantly using purely combinatorial techniques. We also provide an improved upper bound for a related problem known as two-dimensional substring indexing.
For the entire collection see [Zbl 1290.68010].

68Q25 Analysis of algorithms and problem complexity
68P05 Data structures
68W32 Algorithms on strings
Full Text: DOI