Partition of interval-valued observations using regression. (English) Zbl 07512353

Summary: Regression modeling and clustering have both been studied extensively, but largely as separate techniques. Some work has used regression-based algorithms to partition classical data sets into clusters; we propose one such algorithm for clustering interval-valued data. The new algorithm builds on the \(k\)-means algorithm of J. MacQueen [in: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability. Vol. 1. Berkeley, CA: University of California Press. 281–297 (1967; Zbl 0214.46201)] and the dynamical partitioning method of E. Diday and J. C. Simon [in: Digital pattern recognition. Berlin, Heidelberg, New York: Springer. 47–94 (1976; Zbl 0331.62043)], with the partitioning criterion based on a regression model fitted to each sub-cluster. The partition also depends on the distance measures used between these sub-cluster regression models. Simulated data sets are generated for several different data structures. The proposed \(k\)-regressions algorithm consistently outperforms the \(k\)-means algorithm. Elbow plots are used to identify the total number of clusters \(K\) in the partition. The new method is also applied to real data.
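The summary gives the method only in outline. As a rough illustration, and not the authors' implementation, the Python sketch below alternates the two steps of a dynamical partition: fit a regression model to each cluster, then move every observation to the cluster whose model fits it best. Several choices here are assumptions. There is a single interval-valued predictor X and a single interval-valued response Y. Each interval is summarized by its midpoint and half-range, and separate least-squares lines are fitted to each. This center-and-range style regression is swapped in for the paper's models. The squared residuals under these two lines serve as the allocation criterion, in place of the paper's distance measures between regression models. Initial partitions are random, with several restarts. All function names (`center_range`, `fit_line`, `fit_model`, `loss`, `k_regressions`) are hypothetical.

```python
import random
from typing import List, Tuple

Interval = Tuple[float, float]   # (lower, upper)
Obs = Tuple[Interval, Interval]  # (X interval, Y interval)
Line = Tuple[float, float]       # (intercept, slope)
Model = Tuple[Line, Line]        # (midpoint line, half-range line)


def center_range(iv: Interval) -> Tuple[float, float]:
    """Return the midpoint and half-range of an interval."""
    lo, hi = iv
    return (lo + hi) / 2.0, (hi - lo) / 2.0


def fit_line(xs: List[float], ys: List[float]) -> Line:
    """Least-squares line y = b0 + b1*x; a flat line if x has no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return my, 0.0
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b1 * mx, b1


def fit_model(members: List[Obs]) -> Model:
    """Assumed model: one line for midpoints, another for half-ranges."""
    xc, xr = zip(*(center_range(x) for x, _ in members))
    yc, yr = zip(*(center_range(y) for _, y in members))
    return fit_line(list(xc), list(yc)), fit_line(list(xr), list(yr))


def loss(obs: Obs, model: Model) -> float:
    """Assumed criterion: squared residuals under both of a cluster's lines."""
    (c0, c1), (r0, r1) = model
    xc, xr = center_range(obs[0])
    yc, yr = center_range(obs[1])
    return (yc - c0 - c1 * xc) ** 2 + (yr - r0 - r1 * xr) ** 2


def _one_run(data: List[Obs], k: int, rng: random.Random, max_iter: int):
    labels = [rng.randrange(k) for _ in data]  # random initial partition
    for _ in range(max_iter):
        # Representation step: fit one regression model per cluster.
        models = []
        for j in range(k):
            members = [o for o, lab in zip(data, labels) if lab == j]
            if not members:  # reseed an empty cluster with a random observation
                members = [rng.choice(data)]
            models.append(fit_model(members))
        # Allocation step: move each observation to its best-fitting model.
        new = [min(range(k), key=lambda j: loss(o, models[j])) for o in data]
        if new == labels:
            break
        labels = new
    total = sum(loss(o, models[lab]) for o, lab in zip(data, labels))
    return labels, models, total


def k_regressions(data: List[Obs], k: int, n_init: int = 20,
                  seed: int = 0, max_iter: int = 100):
    """Best of n_init random restarts; returns (labels, models, total loss)."""
    rng = random.Random(seed)
    runs = [_one_run(data, k, rng, max_iter) for _ in range(n_init)]
    return min(runs, key=lambda r: r[2])


if __name__ == "__main__":
    # Made-up toy data: two linear regimes with different response widths.
    data = [((x - 0.5, x + 0.5), (2 * x, 2 * x + 2)) for x in range(6)]
    data += [((x - 0.5, x + 0.5), (18 - x, 22 - x)) for x in range(6)]
    # Total within-cluster loss for K = 1..4: the values behind an elbow plot.
    for k in range(1, 5):
        print(k, round(k_regressions(data, k)[2], 4))
```

Plotting the returned total against \(K\) gives the kind of elbow plot the summary mentions. As with \(k\)-means, the alternating steps only reach a local optimum, which is why the sketch keeps the best of several random starts.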


62H30 Classification and discrimination; cluster analysis (statistical aspects)


Software: Algorithm 39


