zbMATH — the first resource for mathematics

Top-\(k\) document retrieval in compact space and near-optimal time. (English) Zbl 1406.68022
Cai, Leizhen (ed.) et al., Algorithms and computation. 24th international symposium, ISAAC 2013, Hong Kong, China, December 16–18, 2013. Proceedings. Berlin: Springer (ISBN 978-3-642-45029-7/pbk). Lecture Notes in Computer Science 8283, 394-404 (2013).
Summary: Let \(\mathcal{D}= \{d _{1},d _{2},\dots d _{D }\}\) be a given set of \(D\) string documents of total length \(n\). Our task is to index \(\mathcal{D}\) such that the \(k\) most relevant documents for an online query pattern \(P\) of length \(p\) can be retrieved efficiently. There exist linear space data structures of \(O(n)\) words for answering such queries in optimal \(O(p + k)\) time. In this paper, we describe a compact index of size \(|\mathrm{CSA}|+n \lg D + o(n\lg D)\) bits with near optimal time, \(O(p + k \lg^{*} n)\), for the basic relevance metric term-frequency, where \(|\mathrm{CSA}|\) is the size (in bits) of a compressed full-text index of \(\mathcal{D}\), and \(\lg ^{*} n\) is the iterated logarithm of \(n\).
For the entire collection see [Zbl 1277.68005].

68P20 Information storage and retrieval of data
68P05 Data structures
68W32 Algorithms on strings
Full Text: DOI