zbMATH — the first resource for mathematics

Efficient computation of frequent and top-\(k\) elements in data streams. (English) Zbl 1112.68371
Eiter, Thomas (ed.) et al., Database theory – ICDT 2005. 10th international conference, Edinburgh, UK, January 5–7, 2005. Proceedings. Berlin: Springer (ISBN 3-540-24288-0/pbk). Lecture Notes in Computer Science 3363, 398-412 (2005).
Summary: We propose an integrated approach for solving both problems of finding the most popular \(k\) elements, and finding frequent elements in a data stream. Our technique is efficient and exact if the alphabet under consideration is small. In the more practical large alphabet case, our solution is space efficient and reports both top-\(k\) and frequent elements with tight guarantees on errors. For general data distributions, our top-\(k\) algorithm can return a set of \(k'\) elements, where \(k' \approx k\), which are guaranteed to be the top-\(k'\) elements; and we use minimal space for calculating frequent elements. For realistic Zipfian data, our space requirement for the frequent elements problem decreases dramatically with the parameter of the distribution; and for top-\(k\) queries, we ensure that only the top-\(k\) elements, in the correct order, are reported. Our experiments show significant space reductions with no loss in accuracy.
For the entire collection see [Zbl 1067.68004].

68P15 Database theory
68P20 Information storage and retrieval of data
Full Text: DOI