Random sampling with a reservoir.

*(English)*Zbl 0562.68028Fast algorithms are introduced for selecting a random sample of n records without replacement from a pool of N records, where the value of N is unknown beforehand. The main result of the paper is the design and analysis of algorithm Z; it does the sampling using constant space and in \(O(n(1+\log (N/n)))\) expected time, which is optimum, up to a constant factor. Several optimizations are studied that collectively improve the speed of the naive version of the algorithm by an order of magnitude. An efficient Pascal-like implementation is given that incorporates these modifications and that is suitable for general use. Theoretical and empirical results indicate the algorithm Z outperforms current methods by a significant margin.

##### MSC:

68Q25 | Analysis of algorithms and problem complexity |

65C10 | Random number generation in numerical analysis |

65C99 | Probabilistic methods, stochastic differential equations |

62-04 | Software, source code, etc. for problems pertaining to statistics |