×

Top-\(k\) query retrieval of combinations with sum-of-subsets ranking. (English) Zbl 1420.68081

Zhang, Zhao (ed.) et al., Combinatorial optimization and applications. 8th international conference, COCOA 2014, Wailea, Maui, HI, USA, December 19–21, 2014. Proceedings. Cham: Springer. Lect. Notes Comput. Sci. 8881, 490-505 (2014).
Summary: Top-\(k\) query processing is an important building block for ranked retrieval, with applications ranging from text and data integration to distributed aggregation of network logs and sensor data. Top-\(k\) queries return a ranked set of the \(k\) best data objects selected on the basis of the ranks (scores) of the objects, assigned by any ranking (scoring) function. In this paper, we consider a problem on generation of combinations. Given a set \(S\) of \(n\) real numbers and an integer \(r\le n\), we consider the \(\binom{n}{r}\) different \(r\)-combinations of the elements of \(S\). Let all these \(\binom{n}{r}\) combinations be indicated by the set \(\mathcal{C} =\{C_1, C_2, \ldots, C_{\binom{n}{r}}\}\). From this set \(\mathcal{C}\), given any positive integer \(k \leq \binom{n}{r}\), our goal is to generate the \(k\) best combinations (top-\(k\) combinations) ranked on the basis of some aggregation function \(F\). We consider Summation as the aggregation function. This calculates the sum of all the \(r\) real numbers in any combination \(C_i\), and the ranking criterion is that a combination \(C_i\) is ranked higher than a combination \(C_j\), if the sum of the constituent numbers of \(C_i\) is larger than that of \(C_j\). For any given \(n\) and \(r(\leq n)\), we build a metadata structure \(G\) (basically a DAG), even before \(S\) and \(k\) are known. We can later use \(G\) to report the top-\(k\) combinations efficiently when \(S\) is available. We further present an alternative incremental method, where we generate only the required portions of \(G\) on demand, instead of constructing the whole \(G\) explicitly. It helps us to save the time and space overhead of the preprocessing phase.
For the entire collection see [Zbl 1318.68042].

MSC:

68P05 Data structures
68P20 Information storage and retrieval of data
68R05 Combinatorics in computer science
PDFBibTeX XMLCite
Full Text: DOI