Distributed streams algorithms for sliding windows.

*(English)*Zbl 1093.68143Summary: Massive data sets often arise as physically distributed, parallel data streams, and it is important to estimate various aggregates and statistics on the union of these streams. This paper presents algorithms for estimating aggregate functions over a “sliding window” of the \(N\) most recent data items in one or more streams. Our results include:

1. For a single stream, we present the first \(\epsilon\)-approximation scheme for the number of 1’s in a sliding window that is optimal in both worst case time and space. We also present the first \(\epsilon\)-approximation scheme for the sum of integers in \([0\dots R]\) in a sliding window that is optimal in both worst case time and space (assuming \(R\) is at most polynomial in \(N\)). Both algorithms are deterministic and use only logarithmic memory words.

2. In contrast, we show that any deterministic algorithm that estimates, to within a small constant relative error, the number of 1’s (or the sum of integers) in a sliding window on the union of distributed streams requires \(\Omega(N)\) space.

3. We present the first (randomized) \((\epsilon, \delta)\)-approximation scheme for the number of 1’s in a sliding window on the union of distributed streams that uses only logarithmic memory words. We also present the first \((\epsilon, \delta)\)-approximation scheme for the number of distinct values in a sliding window on distributed streams that uses only logarithmic memory words.

Our results are obtained using a novel family of synopsis data structures called waves.

1. For a single stream, we present the first \(\epsilon\)-approximation scheme for the number of 1’s in a sliding window that is optimal in both worst case time and space. We also present the first \(\epsilon\)-approximation scheme for the sum of integers in \([0\dots R]\) in a sliding window that is optimal in both worst case time and space (assuming \(R\) is at most polynomial in \(N\)). Both algorithms are deterministic and use only logarithmic memory words.

2. In contrast, we show that any deterministic algorithm that estimates, to within a small constant relative error, the number of 1’s (or the sum of integers) in a sliding window on the union of distributed streams requires \(\Omega(N)\) space.

3. We present the first (randomized) \((\epsilon, \delta)\)-approximation scheme for the number of 1’s in a sliding window on the union of distributed streams that uses only logarithmic memory words. We also present the first \((\epsilon, \delta)\)-approximation scheme for the number of distinct values in a sliding window on distributed streams that uses only logarithmic memory words.

Our results are obtained using a novel family of synopsis data structures called waves.

##### MSC:

68W15 | Distributed algorithms |