zbMATH — the first resource for mathematics

Succinct non-overlapping indexing. (English) Zbl 1432.68089
Cicalese, Ferdinando (ed.) et al., Combinatorial pattern matching. 26th annual symposium, CPM 2015, Ischia Island, Italy, June 29 – July 1, 2015. Proceedings. Cham: Springer. Lect. Notes Comput. Sci. 9133, 185-195 (2015).
Summary: Given a text \(\mathsf {T}\) having \(n\) characters, we consider the non-overlapping indexing problem defined as follows: pre-process \(\mathsf {T}\) into a data-structure, such that whenever a pattern \(P\) comes as input, we can report a maximal set of non-overlapping occurrences of \(P\) in \(\mathsf {T}\). The best known solution for this problem takes linear space, in which a suffix tree of \(\mathsf {T}\) is augmented with \(O(n)\)-word data structures. A query \(P\) can be answered in optimal \(O(|P|+\mathsf {nocc})\) time, where \(\mathsf {nocc}\) is the output size [H. Cohen and E. Porat, Lect. Notes Comput. Sci. 5878, 1044–1053 (2009; Zbl 1273.68097)]. We present the following new result: let \(\mathsf {CSA}\) (not necessarily a compressed suffix array) be an index of \(\mathsf {T}\) that can compute (i) the suffix range of \(P\) in \(\mathsf {search}(P)\) time, and (ii) a suffix array or an inverse suffix array value in \(\mathsf t_\mathsf{SA}\) time; then by using \(\mathsf {CSA}\) alone, we can answer a query \(P\) in \(O(\mathsf {search}(P)+ \mathsf {nocc}\cdot \mathsf t_\mathsf{SA})\) time. Additionally, we present an improved result for a generalized version of this problem called range non-overlapping indexing.
For the entire collection see [Zbl 1314.68012].

68P05 Data structures
68P15 Database theory
68W32 Algorithms on strings
Full Text: DOI