# zbMATH — the first resource for mathematics

Position-restricted substring searching. (English) Zbl 1145.68392
Correa, José R. (ed.) et al., LATIN 2006: Theoretical informatics. 7th Latin American symposium, Valdivia, Chile, March 20–24, 2006. Proceedings. Berlin: Springer (ISBN 3-540-32755-X/pbk). Lecture Notes in Computer Science 3887, 703-714 (2006).
Summary: A full-text index is a data structure built over a text string $$T[1,n]$$. The most basic functionality provided is $$(a)$$ counting how many times a pattern string $$P[1,m]$$ appears in $$T$$ and $$(b)$$ locating all those $$occ$$ positions. There exist several indexes that solve $$(a)$$ in $$O(m)$$ time and $$(b)$$ in $$O(occ)$$ time. In this paper we propose two new queries, $$(c)$$ counting how many times $$P[1,m]$$ appears in $$T[l,r]$$ and $$(d)$$ locating all those $$occ_{l,r}$$ positions. These can be solved using $$(a)$$ and $$(b)$$ but this requires $$O(occ)$$ time. We present two solutions to $$(c)$$ and $$(d)$$ in this paper. The first is an index that requires $$O(n \log n)$$ bits of space and answers $$(c)$$ in $$O(m+\log n)$$ time and $$(d)$$ in $$O(\log n)$$ time per occurrence (that is, $$O(occ_{l,r} \log n)$$ time overall). A variant of the first solution answers $$(c)$$ in $$O(m+\log \log n)$$ time and $$(d)$$ in constant time per occurrence, but requires $$O(n\log ^{1+\epsilon } n)$$ bits of space for any constant $$\epsilon > 0$$. The second solution requires $$O(n m \log \sigma )$$ bits of space, solving $$(c)$$ in $$O(m \lceil \log \sigma / \log \log n \rceil )$$ time and $$(d)$$ in $$O(m \lceil \log \sigma / \log \log n \rceil )$$ time per occurrence, where $$\sigma$$ is the alphabet size. This second structure takes less space when the text is compressible.
Our solutions can be seen as a generalization of rank and select dictionaries, which allow computing how many times a given character $$c$$ appears in a prefix $$T[1,i]$$ and also locate the $$i$$-th occurrence of $$c$$ in $$T$$. Our solution to $$(c)$$ extends character rank queries to substring rank queries, and our solution to $$(d)$$ extends character select to substring select queries.
As a byproduct, we show how rank queries can be used to implement fractional cascading in little space, so as to obtain an alternative implementation of a well-known two-dimensional range search data structure by Chazelle. We also show how Grossi et al.’s wavelet trees are suitable for two-dimensional range searching, and their connection with Chazelle’s data structure.
For the entire collection see [Zbl 1096.68003].

##### MSC:
 68P05 Data structures 68P10 Searching and sorting 68W05 Nonnumerical algorithms
Full Text: