zbMATH — the first resource for mathematics

Ranked document selection. (English) Zbl 1416.68064
Ravi, R. (ed.) et al., Algorithm theory – SWAT 2014. 14th Scandinavian symposium and workshops, Copenhagen, Denmark, July 2–4, 2014. Proceedings. Berlin: Springer. Lect. Notes Comput. Sci. 8503, 344-356 (2014).
Summary: Let \({\mathcal D}\) be a collection of string documents of \(n\) characters in total. The top-\(k\) document retrieval problem is to preprocess \({\mathcal D}\) into a data structure that, given a query \((P,k)\), can return the \(k\) documents of \({\mathcal D}\) most relevant to pattern \(P\). The relevance of a document \(d\) for a pattern \(P\) is given by a predefined ranking function \(w(P,d)\). Linear space and optimal query time solutions already exist for this problem.
In this paper we consider a novel problem, document selection queries, which aim to report the \(k\)th document most relevant to \(P\) (instead of reporting all top-\(k\) documents). We present a data structure using \(O(n \log ^{\varepsilon } n)\) space, for any constant \(\varepsilon > 0\), answering selection queries in time \(O(\log k / \log \log n)\), and a linear-space data structure answering queries in time \(O(\log k)\), given the locus node of \(P\) in a (generalized) suffix tree of \({\mathcal D}\). We also prove that it is unlikely that a succinct-space solution for this problem exists with polylogarithmic query time.
For the entire collection see [Zbl 1293.68031].

68P20 Information storage and retrieval of data
68P05 Data structures
Full Text: DOI