Algorithms for clustering data.

*(English)* Zbl 0665.62061
Prentice Hall Advanced Reference Series. Englewood Cliffs, NJ: Prentice Hall. XIV, 320 p. $ 67.60 (1988).

The book presents the background, methods, and algorithms of cluster analysis. It can serve as a textbook for a graduate course in exploratory data analysis, as well as a supplementary text in courses on research methodology, pattern recognition, image processing, and remote sensing. Graphical procedures and other tools for visually representing data are introduced both to evaluate the results of clustering and to explore data. A number of numerical examples are given.

The book contains five chapters (1. Introduction. 2. Data representation. 3. Clustering methods and algorithms. 4. Cluster validity. 5. Applications.) and several brief theoretical appendices: A. Pattern recognition. B. Distributions (the Gaussian and the hypergeometric). C. Linear algebra. D. Scatter matrices. E. Factor analysis. F. Multivariate analysis of variance. G. Graph theory. H. Algorithms for generating clustered data.

Chapter 2 considers data types and scales, proximity indices, data normalization, linear and nonlinear projections, the intrinsic (topological) dimensionality of patterns, and multidimensional scaling. The clustering methods and algorithms of Chapter 3 include hierarchical clustering (dendrograms), partitional clustering (square-error, nearest-neighbor, and other clustering methods), and some discussion of clustering methodology. Chapter 4 is devoted to the very important and most difficult problem of cluster validity. Hypothesis testing, the power of a test, various indices of cluster validity, and other general validation problems are discussed. Some specific tests (Hubert's statistic, the Goodman-Kruskal statistic, and tests based on nearest-neighbor distances) are presented. The validity of both hierarchical and partitional structures, as well as the validity of individual clusters, is analysed and illustrated by numerical examples. The clustering tendency problem is also considered. Applications of cluster analysis to image processing and registration, as well as to the segmentation of various images (textured, range, and multispectral ones), are presented in Chapter 5. The bibliography contains 427 items.

Reviewer: V.Yu.Urbakh

##### MSC:

62H30 | Classification and discrimination; cluster analysis (statistical aspects) |

62-02 | Research exposition (monographs, survey articles) pertaining to statistics |

65C99 | Probabilistic methods, stochastic differential equations |