zbMATH — the first resource for mathematics

Top-$$k$$ document retrieval in compact space and near-optimal time. (English) Zbl 1406.68022
Cai, Leizhen (ed.) et al., Algorithms and computation. 24th international symposium, ISAAC 2013, Hong Kong, China, December 16–18, 2013. Proceedings. Berlin: Springer (ISBN 978-3-642-45029-7/pbk). Lecture Notes in Computer Science 8283, 394-404 (2013).
Summary: Let $$\mathcal{D}= \{d _{1},d _{2},\dots d _{D }\}$$ be a given set of $$D$$ string documents of total length $$n$$. Our task is to index $$\mathcal{D}$$ such that the $$k$$ most relevant documents for an online query pattern $$P$$ of length $$p$$ can be retrieved efficiently. There exist linear space data structures of $$O(n)$$ words for answering such queries in optimal $$O(p + k)$$ time. In this paper, we describe a compact index of size $$|\mathrm{CSA}|+n \lg D + o(n\lg D)$$ bits with near optimal time, $$O(p + k \lg^{*} n)$$, for the basic relevance metric term-frequency, where $$|\mathrm{CSA}|$$ is the size (in bits) of a compressed full-text index of $$\mathcal{D}$$, and $$\lg ^{*} n$$ is the iterated logarithm of $$n$$.
For the entire collection see [Zbl 1277.68005].

MSC:
 68P20 Information storage and retrieval of data 68P05 Data structures 68W32 Algorithms on strings
Full Text: