Generation of random clusters with specified degree of separation. (English) Zbl 1336.62189

Summary: We propose a random cluster generation algorithm with the following desired features: (1) the population degree of separation between each cluster and its nearest neighboring cluster can be set to a specified value, based on a separation index; (2) no constraint is imposed on the isolation among clusters in each dimension; (3) the covariance matrices correspond to different shapes, diameters and orientations; (4) the full cluster structure generally cannot be detected simply from pairwise scatterplots of variables; (5) noisy variables and outliers can be added to make the cluster structure harder to recover. This algorithm is an improvement on the method used in [G. W. Milligan, “An algorithm for generating artificial test clusters”, Psychometrika 50, No. 1, 123–127 (1985; doi:10.1007/BF02294153)].
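
Feature (1) can be illustrated with a minimal one-dimensional sketch. The summary does not define the separation index, so the quantile-based form used here, (gap - spread) / (gap + spread) for two normal clusters, is an assumption made for illustration only. The function names, the `alpha` parameter and the example values are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist


def separation_index(mu1, sd1, mu2, sd2, alpha=0.05):
    """Assumed quantile-based index for two 1-D normal clusters.

    Values near 1 mean well separated; negative values mean the
    central (1 - alpha) intervals of the two clusters overlap.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 normal quantile
    gap = abs(mu2 - mu1)                     # distance between cluster centers
    spread = z * (sd1 + sd2)                 # combined half-widths of the intervals
    return (gap - spread) / (gap + spread)


def distance_for_index(target, sd1, sd2, alpha=0.05):
    """Center distance that yields the requested index (inverse of the above)."""
    if not -1 <= target < 1:
        raise ValueError("target index must lie in [-1, 1)")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    spread = z * (sd1 + sd2)
    # Solve target = (d - spread) / (d + spread) for d.
    return spread * (1 + target) / (1 - target)


# Place the second cluster so that the assumed index equals a chosen value.
d = distance_for_index(0.21, sd1=1.0, sd2=2.0)
achieved = separation_index(0.0, 1.0, d, 2.0)
```

Under this assumed definition, fixing the index amounts to solving for the distance between cluster centers along one direction. A multivariate generator would additionally have to choose a projection direction and the covariance matrices, which this sketch does not attempt.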


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62K15 Factorial statistical designs
