×

External memory algorithms. (English) Zbl 1010.68040

Abello, James (ed.) et al., Handbook of massive data sets. Dordrecht: Kluwer Academic Publishers. Massive Comput. 4, 359-416 (2002).
Summary: Data sets in large applications are often too massive to fit completely inside the computer’s internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper we survey the state of the art in the design and analysis of external memory (or \(EM)\) algorithms, where the goal is to exploit locality in order to reduce the I/O costs.
For sorting and related problems like permuting and fast Fourier transform, the key paradigms include distribution and merging. The paradigm of disk striping offers an elegant way to use multiple disks in parallel. For sorting, however, disk striping can be nonoptimal with respect to I/O, so to gain further improvements we discuss distribution and merging techniques for using the disks independently. We consider EM paradigms for computations involving matrixes, geometric data, and graphs, and we look at problems caused by dynamic memory allocation. We report on some experiments in the domain of spatial databases using the TPIE system (Transparent Parallel I/O programming Environment). The newly developed EM algorithms and data structures that incorporate the paradigms we discuss in this chapter are significantly faster than methods currently used in practice.
For the entire collection see [Zbl 0986.00022].

MSC:

68P05 Data structures
68M99 Computer system organization

Software:

TPIE
PDFBibTeX XMLCite