an:06856556
Zbl 1391.68129
Hon, Wing-Kai; Lam, Tak-Wah; Shah, Rahul; Thankachan, Sharma V.; Ting, Hing-Fung; Yang, Yilin
Dictionary matching with a bounded gap in pattern or in text
EN
Algorithmica 80, No. 2, 698-713 (2018).
00376620
2018
j
68W32 68P05
dictionary matching; bounded gap; stabbing query; indexing
Summary: A gap is a sequence of don't care characters. In this paper, we study two variants of the dictionary matching problem, where gaps may be present in the patterns or in the text. The first variant, called \textit{dictionary matching with one gap}, considers indexing a collection \({\mathcal D}\) of \(d\) one-gap-patterns. The \(i\)th pattern is of the form \(P_i[\alpha_i,\beta_i]Q_i\), where \(P_i\) and \(Q_i\) are strings drawn from an alphabet \(\varSigma\) and \([\alpha_i,\beta_i]\) gives the lower and upper bounds on the gap length. The target is to allow a user to efficiently identify all substrings of a query text \(T\) that match any one-gap-pattern in the collection. We present a linear space solution for answering the above dictionary matching query in time \(O(|T|\gamma\log\lambda\log d+\mathsf{occ})\), where \(\gamma\) denotes the number of distinct gap lengths, \(\lambda\) denotes the number of distinct lower and upper bounds of gap lengths, and \(\mathsf{occ}\) is the output size. The query time can be improved to \(O(|T|\gamma+\mathsf{occ})\) using \(O(d^{1+\epsilon })\) extra space, where \(\epsilon>0\) is an arbitrarily small constant. Additionally, we show a succinct-space index offering a space-time tradeoff. In the special case where the parameters \(\alpha_i\) and \(\beta_i\) are the same for all patterns, our results improve upon the work by \textit{A. Amir} et al. [Theor. Comput. Sci. 589, 34--46 (2015; Zbl 1319.68106)]. The second variant, called \textit{dictionary matching with one missing substring}, is a new problem in which a gap of bounded length may be present in the text substring when it is being matched. We show that this problem can be solved by using a similar framework.
Furthermore, by applying a centroid path decomposition on the \textit{failure tree}, we obtain a space-time tradeoff result, which will be suitable when the dictionary contains only \textit{short} patterns, or when index space is a critical concern.
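To make the first problem statement concrete, the following is a minimal brute-force sketch of what a match of a one-gap-pattern \(P_i[\alpha_i,\beta_i]Q_i\) in a text \(T\) means. It is not the index described in the summary and achieves none of its time or space bounds; the function name `match_one_gap`, the tuple layout `(P, alpha, beta, Q)`, and the choice to report matches as `(pattern_index, start, end)` are assumptions made only for illustration.

```python
from typing import List, Tuple

# Hypothetical representation of a one-gap-pattern P[alpha, beta]Q:
# (P, alpha, beta, Q), with alpha <= beta the bounds on the gap length.
OneGapPattern = Tuple[str, int, int, str]


def match_one_gap(text: str, patterns: List[OneGapPattern]) -> List[Tuple[int, int, int]]:
    """Brute-force reference matcher (illustration only, not the paper's index).

    Returns sorted (pattern_index, start, end) triples such that text[start:end]
    consists of P, then a gap of g don't-care characters with
    alpha <= g <= beta, then Q. The end position is exclusive.
    """
    hits = []
    for idx, (p, alpha, beta, q) in enumerate(patterns):
        for start in range(len(text) - len(p) + 1):
            # P must occur at this position.
            if not text.startswith(p, start):
                continue
            gap_from = start + len(p)
            # Try every admissible gap length.
            for gap in range(alpha, beta + 1):
                q_at = gap_from + gap
                if q_at + len(q) > len(text):
                    break  # longer gaps would run past the end of the text
                if text.startswith(q, q_at):
                    hits.append((idx, start, q_at + len(q)))
    return sorted(hits)
```

For example, under these assumptions the pattern `("ab", 1, 3, "cd")` matches the whole of the text `"abxxcd"` with a gap of length 2. The running time of this sketch grows with the product of the text length, the number of patterns, and the gap range, which is the cost the indexed solutions in the summary are designed to avoid.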
Zbl 1319.68106