Range non-overlapping indexing and successive list indexing.

*(English)* Zbl 1209.68160
Dehne, Frank (ed.) et al., Algorithms and data structures. 10th international workshop, WADS 2007, Halifax, Canada, August 15–17, 2007. Proceedings. Berlin: Springer (ISBN 978-3-540-73948-7/pbk). Lecture Notes in Computer Science 4619, 625–636 (2007).

Summary: We present two natural variants of the indexing problem:

In the range non-overlapping indexing problem, we preprocess a given text to answer queries in which we are given a pattern and wish to find a maximal-length sequence of occurrences of the pattern in the text such that the occurrences do not overlap with one another. Besides solving this problem efficiently, our algorithm can also do so efficiently within substrings of the text, specified by given start and end locations. The methods we supply thus generalize the string statistics problem, in which we are asked to report merely the number of non-overlapping occurrences in the entire text, by reporting the occurrences themselves, even when restricted to substrings of the text.
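
To make the query concrete, the following is a minimal brute-force sketch of what a range non-overlapping query returns. It is not the indexed algorithm of the paper: it rescans the text on every query and relies on the standard fact that, for occurrences of a single pattern, greedily taking the leftmost admissible occurrence yields a maximum-size non-overlapping set. The function names `occurrences` and `max_nonoverlapping`, and the half-open range convention `text[lo:hi]`, are assumptions made for illustration.

```python
def occurrences(text, pattern):
    """Return all start positions of pattern in text (overlaps allowed)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Advance by one so that overlapping occurrences are also found.
        start = text.find(pattern, start + 1)
    return positions


def max_nonoverlapping(text, pattern, lo, hi):
    """Greedy selection inside text[lo:hi] (half-open range: an assumption).

    Scans occurrences left to right and keeps each one that lies entirely
    inside the range and starts at or after the end of the last kept one.
    """
    m = len(pattern)
    chosen = []
    next_free = lo  # earliest position at which the next occurrence may start
    for pos in occurrences(text, pattern):
        if pos >= next_free and pos + m <= hi:
            chosen.append(pos)
            next_free = pos + m
    return chosen
```

For example, with the text `"abababa"` and the pattern `"aba"`, the occurrences start at 0, 2 and 4, and the greedy scan over the whole text keeps 0 and 4.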

In the related successive list indexing problem, at query time we are given a pattern and a list of locations in the preprocessed text. We then wish to find a list of occurrences of the pattern, such that the \(i\)th occurrence is the leftmost occurrence of the pattern which starts to the right of the \(i\)th location given by the input list.
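
The expected answer to a successive list query can likewise be illustrated with a naive baseline. The sketch below is not the paper's data structure: it collects all occurrences by scanning the text, then answers each location with a binary search. The names `occurrence_starts` and `successive_list` are hypothetical, and "to the right of" is read here as strictly greater than the given location, which is an assumption.

```python
from bisect import bisect_right


def occurrence_starts(text, pattern):
    """Return the sorted start positions of pattern in text (overlaps allowed)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    starts = []
    pos = text.find(pattern)
    while pos != -1:
        starts.append(pos)
        pos = text.find(pattern, pos + 1)
    return starts


def successive_list(text, pattern, locations):
    """For each location, return the leftmost occurrence starting strictly
    after it (assumed reading of "to the right of"), or None if there is none."""
    starts = occurrence_starts(text, pattern)
    answers = []
    for loc in locations:
        k = bisect_right(starts, loc)  # index of first start > loc
        answers.append(starts[k] if k < len(starts) else None)
    return answers
```

With the text `"abababa"`, the pattern `"aba"` and the locations `[0, 1, 4]`, the occurrences start at 0, 2 and 4, so the answers are 2, 2 and `None`.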

Both problems are solved by using tools from computational geometry, specifically a variation of the range searching for minimum problem of H.-P. Lenhof and M. Smid [“Using persistent data structures for adding range restrictions to searching problems”, RAIRO, Inform. Théor. Appl. 28, No. 1, 25–49 (1994; Zbl 0998.68520)], here considered over a grid, in what appears to be the first utilization of range searching for minimum in an indexing-related context.

For the entire collection see [Zbl 1123.68006].

Reviewer: Reviewer (Berlin)