Hooshmand, Sahar; Abedin, Paniz; Külekci, M. Oğuzhan; Thankachan, Sharma V.
Non-overlapping indexing -- cache obliviously
suffix trees; cache oblivious; data structure; string algorithms
Summary: The non-overlapping indexing problem is defined as follows: pre-process a given text \(\mathsf{T}[1,n]\) of length \(n\) into a data structure such that whenever a pattern \(P[1,p]\) comes as an input, we can efficiently report the largest set of non-overlapping occurrences of \(P\) in \(\mathsf{T}\). The best known solution is by \textit{H. Cohen} and \textit{E. Porat} [Lect. Notes Comput. Sci. 5878, 1044--1053 (2009; Zbl 1273.68097)]. Their index size is \(O(n)\) words and query time is optimal \(O(p+\mathsf{nocc})\), where \(\mathsf{nocc}\) is the output size. We study this problem in the cache-oblivious model and present a new data structure of size \(O(n\log n)\) words. It can answer queries in optimal \(O(\frac{p}{B}+\log_B n+\frac{\mathsf{nocc}}{B})\) I/Os, where \(B\) is the block size.
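
To make the query output concrete, the following is a minimal sketch of a naive, index-free baseline. It is not the data structure of the paper, nor that of Cohen and Porat; it only illustrates what "the largest set of non-overlapping occurrences" means. The function name `non_overlapping_occurrences` is hypothetical. The sketch relies on the standard fact that, for intervals of equal length \(p\), greedily taking the leftmost occurrence and then skipping \(p\) positions gives a maximum-size set.

```python
def non_overlapping_occurrences(text: str, pattern: str) -> list[int]:
    """Return 1-based start positions of a largest set of
    non-overlapping occurrences of `pattern` in `text`.

    Naive baseline (hypothetical helper, not the indexed solution):
    repeatedly take the leftmost occurrence that starts at or after
    the end of the previously chosen one.
    """
    p = len(pattern)
    if p == 0:
        return []  # an empty pattern has no meaningful occurrences here

    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start + 1)  # report in the 1-based convention T[1,n]
        # The next chosen occurrence must begin after the current one ends.
        start = text.find(pattern, start + p)
    return positions


if __name__ == "__main__":
    print(non_overlapping_occurrences("aaaaa", "aa"))
    print(non_overlapping_occurrences("abababa", "aba"))
```

Here \(\mathsf{nocc}\) corresponds to the length of the returned list. This scan reads the whole text on every query, which is exactly the cost that an index is meant to avoid.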
