×

Managing unbounded-length keys in comparison-driven data structures with applications to online indexing. (English) Zbl 1305.68066

Summary: This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons, into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multidimensional points, multiple-precision numbers, multikey data (e.g., records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work as no particular exploitation of the underlying structure is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, online suffix tree construction can be done in worst case time \(O(\log n)\) per input symbol (as opposed to amortized \(O(\log n)\) time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves \(O(\log n)\) worst case time per input symbol. Searching for a pattern of length \(m\) in the resulting suffix tree takes \(O(\min(m \log |\Sigma|, m + \log n) + \mathrm{tocc})\) time, where tocc is the number of occurrences of the pattern. The paper also describes more applications and shows how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors, and order maintenance. The technical features of the proposed technique for a given data structure \(\mathscr{D}\) are the following ones. The new data structure \(\mathscr{D}'\) is obtained from \(\mathscr{D}\) by augmenting the latter with an oracle for strings, extending the functionalities of the Dietz-Sleator list for order maintenance [P. F. Dietz and D. D. Sleator, “Two algorithms for maintaining order in a list”, in: Proceedings of the nineteenth annual ACM symposium on theory of computing, STOC 1987. New York: ACM. 365–372 (1987); A. K. Tsakalidis, Acta Inf. 21, 101–112 (1984; Zbl 0515.68038)]. The space complexity of \(\mathscr{D}'\) is \(\mathscr{S}(n) + O(n)\) memory cells for storing \(n\) keys, where \(\mathscr{S}(n)\) denotes the space complexity of \(\mathscr{D}\). Then, each operation involving \(O(1)\) keys taken from \(\mathscr{D}'\) requires \(O(\mathscr{T}(n))\) time, where \(\mathscr{T}(n)\) denotes the time complexity of the corresponding operation originally supported in \(\mathscr{D}\). Each operation involving a key \(y\) not stored in \(\mathscr{D}'\) takes \(O(\mathscr{T}(n) + |y|)\) time, where \(|y|\) denotes the length of \(y\). For the special case where the oracle handles suffixes of a string, the achieved insertion time is \(O(\mathscr{T}(n))\).

MSC:

68P05 Data structures
68P10 Searching and sorting

Citations:

Zbl 0515.68038
PDFBibTeX XMLCite
Full Text: DOI arXiv