zbMATH — the first resource for mathematics

Random sampling with a reservoir. (English) Zbl 0562.68028
Fast algorithms are introduced for selecting a random sample of n records without replacement from a pool of N records, where the value of N is unknown beforehand. The main result of the paper is the design and analysis of algorithm Z; it does the sampling using constant space and in \(O(n(1+\log (N/n)))\) expected time, which is optimum, up to a constant factor. Several optimizations are studied that collectively improve the speed of the naive version of the algorithm by an order of magnitude. An efficient Pascal-like implementation is given that incorporates these modifications and that is suitable for general use. Theoretical and empirical results indicate the algorithm Z outperforms current methods by a significant margin.

68Q25 Analysis of algorithms and problem complexity
65C10 Random number generation in numerical analysis
65C99 Probabilistic methods, stochastic differential equations
62-04 Software, source code, etc. for problems pertaining to statistics
PDF BibTeX Cite
Full Text: DOI Link