an:06180273
Zbl 1267.68102
Hon, Wing-Kai; Shah, Rahul; Thankachan, Sharma V.; Vitter, Jeffrey Scott
On position restricted substring searching in succinct space
EN
J. Discrete Algorithms 17, 109-114 (2012).
00315022
2012
j
68P05 68P10
succinct data structures; pattern matching; range searching
Summary: We study the position restricted substring searching (PRSS) problem, where the task is to index a text \(T[0\dots n-1]\) of \(n\) characters over an alphabet set \(\Sigma\) of size \(\delta\), in order to answer the following: given a query pattern \(P\) (of length \(p\)) and two indices \(\ell\) and \(r\), report all \(occ_{\ell,r}\) occurrences of \(P\) in \(T[\ell \dots r]\). Known indexes take \(O(n\log n)\) bits or \(O(n\log^{1+\epsilon}n)\) bits space, and answer this query in \(O(p+\log n+occ_{\ell,r}\log n)\) time or in optimal \(O(p+occ_{\ell,r})\) time respectively, where \(\epsilon\) is any positive constant. The main drawback of these indexes is their space requirement of \(\Omega (n\log n)\) bits, which can be much more than the optimal \(n\log \delta\) bits to store the text \(T\).
This paper addresses an open question asked by \textit{V. Mäkinen} and \textit{G. Navarro} [Lect. Notes Comput. Sci. 3887, 703--714 (2006; Zbl 1145.68392)], which is whether it is possible to design a succinct index answering PRSS queries efficiently. We first study the hardness of this problem and prove the following result: a succinct (or a compact) index cannot answer PRSS queries efficiently in the pointer machine model, and also not in the RAM model unless bounds on the well-researched orthogonal range query problem improve. However, for the special case of sufficiently long query patterns, that is, for \(p=\Omega (\log^{2+\epsilon} n)\), we derive an \(|CSA_f|+|CSA_r|+o(n)\) bits index with optimal query time, where \(|CSA_f|\) and \(|CSA_r|\) are the space (in bits) of the compressed suffix arrays (with \(O(p)\) time for pattern search) of \(T\) and \(\overleftarrow T\) (the reverse of \(T\)) respectively.
The space can be reduced further to \(|CSA_f|+o(n)\) bits, with a resulting query time of \(O(p+occ_{\ell,r}+\log^{3+\epsilon}n)\). For the general case, where there is no restriction on pattern length, we obtain an \(O(\frac {1}{\epsilon^3}n\log \delta)\) bits index with \(O(p+occ_{\ell,r}+n^\epsilon)\) query time. We use suffix sampling techniques to achieve these space-efficient indexes.
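To make the query semantics concrete, the following is a minimal sketch of a naive PRSS baseline in Python. It is not the succinct index of the paper: it uses a plain (uncompressed) suffix array of \(T\) and simply filters the matches by position, so it uses \(\Omega(n\log n)\) bits and its query time is far from the stated bounds. The function names `build_suffix_array` and `prss_query` are hypothetical, and the sketch assumes that an occurrence counts only if it lies entirely inside \(T[\ell \dots r]\).

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    """Naive suffix array construction, for illustration only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def prss_query(text, sa, pattern, l, r):
    """Return sorted start positions i of pattern in text such that
    l <= i and i + len(pattern) - 1 <= r (assumed reading of T[l..r])."""
    p = len(pattern)
    # Length-p prefixes of the sorted suffixes are themselves in
    # non-decreasing order, so binary search finds the matching range.
    prefixes = [text[i:i + p] for i in sa]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    # Position filter: this linear scan over all matches is the step
    # that a real PRSS index replaces with a range-searching structure.
    return sorted(i for i in sa[lo:hi] if l <= i and i + p - 1 <= r)


if __name__ == "__main__":
    T = "abracadabra"
    sa = build_suffix_array(T)
    print(prss_query(T, sa, "abra", 0, 10))
    print(prss_query(T, sa, "abra", 1, 10))
```

The filter step inspects every occurrence of \(P\) in \(T\), not only the \(occ_{\ell,r}\) reported ones, which is the inefficiency that PRSS indexes are designed to avoid.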
Zbl 1145.68392