×

Dynamically computing approximate frequency counts in sliding window over data stream. (English) Zbl 1097.68150

Summary: This paper presents two one-pass algorithms for dynamically computing frequency counts in sliding window over a data stream-computing frequency counts exceeding user-specified threshold \(\varepsilon\). The first algorithm constructs subwindows and deletes expired subwindows periodically in sliding window, and each sub-window maintains a summary data structure. The first algorithm outputs at most \(1/\varepsilon+1\) elements for frequency queries over the most recent \(N\) elements. The second algorithm adapts multiple levels method to deal with data stream. Once the sketch of the most recent \(N\) elements has been constructed, the second algorithm can provides the answers to the frequency queries over the most recent \(n\) \((n\leq N)\) elements. The second algorithm outputs at most \(1/\varepsilon+2\) elements. The analytical and experimental results show that our algorithms are accurate and effective.

MSC:

68W25 Approximation algorithms
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Misra J, Gries D. Finding Repeated Elements.Sci Comput Programming, 1982,2(2):143–152. · Zbl 0497.68041 · doi:10.1016/0167-6423(82)90012-0
[2] Demaine E D, Lopez-Ortiz A, Munro J I. Frequency Estimation of Internet Packet Streams with Limited Space.Proc of the 10th Annual European Symp on Algorithms. Rome, Springer-Verlag, 2002. 348–360. · Zbl 1019.68502
[3] Karp R M, Shenker S, Papadimitriou C H. A Simple Algorithm for Finding Frequent Elements in Streams and Bags.ACM Trans on Database Systems, 2003,28(1):51–55. · Zbl 05457044 · doi:10.1145/762471.762473
[4] Manku G S, Motwani R. Approximate Frequency Counts over Data Streams.Proc of the 28th Intl Conf on very Large Data Bases. Hong Kong, Morgan Kaufmann, 2002, 346–357.
[5] Datar M, Gionis A, Indyk P,et al. Maintaining Stream Statistics over Sliding Windows.Proc 13th SIAM-ACM Symp on Discrete Algorithms. San Francisco ACM/SIAM, 2002. 635–644. · Zbl 1093.68673
[6] Arasu A, Manku G S. Approximate Counts and Quantiles over Sliding Window.Proc of ACM Symp on Principles of Database Systems. Paris, ACM, 2004, 286–296.
[7] Fischer M, Salzberg S. Finding a Majority amongn Votes: Solution to Problem 81-5.Journal of Algorithms, 1982,3 (4):376–379.
[8] Agrawal R, Imielinski T, Swami A N. Mining Association Rules Between Sets of Items in Large Databases.Proc of the 1993 ACM SIGMOD Intl Conf on Management of Data. Washington, ACM, 1993, 207–216.
[9] Golab L, Dehaan D, Demaine E,et al. Identifying Frequent Items in Sliding Windows over On-Line Packet Streams.SIGCOMM Internet Measurement Conference. Miami, ACM, 2003, 173–178.
[10] Zhu Y, Shasha D. StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time.Proc 28th Int Conf on Very Large Data Bases. Hong Kong, Morgan Kaufmann, 2002, 358–369.
[11] Qiao L, Agrawal D, Abbadi A E. Supporting Sliding Window queries for Continuous Data Streams.Proc 15th Int Conf on Scientific and Statistical Database Management. Massachusetts: IEEE Computer Society, 2003, 85–94.
[12] Gibbons P B, Tirthapura S. Distributed Streams Algorithms for Sliding Windows.Proc 14th ACM Symposium on Parallel Algorithms and Architectures. Winnipeg, ACM Press, 2002, 63–72. · Zbl 1093.68143
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.