Dynamic clustering of histogram data: using the right metric. (English) Zbl 1151.62335

Brito, Paula (ed.) et al., Selected contributions in data analysis and classification. In honour of Edwin Diday. With a foreword by Yves Escoufier. Berlin: Springer (ISBN 978-3-540-73558-8/pbk). Studies in Classification, Data Analysis, and Knowledge Organization, 123-134 (2007).
Summary: We present a review of some metrics that can be used as allocation functions in the Dynamic Clustering Algorithm (DCA) when the data are distributions or histograms of values. The choice of the most suitable distance plays a central role in the DCA because it is tied to the criterion function that is optimized. Moreover, the distance has to be consistent with the prototype that represents each cluster. Accordingly, for each proposed metric, we identify the corresponding prototype by minimizing the criterion function, and hence obtain the best fit between the partition and the representation of the clusters. Finally, we focus on a Wasserstein-based distance and show that it is optimal for partitioning a set of histogram data when the clusters are represented by their barycenters, expressed as distributions.
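To make the pairing between distance and prototype concrete, here is a minimal sketch, not the authors' implementation, of a DCA-style alternation using the L2 Wasserstein distance d(F, G) = sqrt( ∫_0^1 (F^-1(t) - G^-1(t))^2 dt ) between one-dimensional distributions. It relies on a standard fact: under this distance the barycenter of a set of 1-D distributions has as quantile function the average of their quantile functions, so the prototype update is a pointwise mean. Assumptions not taken from the summary: values are uniformly spread inside each histogram bin, the integral is approximated by the midpoint rule on a fixed probability grid, the farthest-first seeding is a hypothetical choice made only for determinism, and all function names are illustrative.

```python
import numpy as np


def histogram_quantiles(edges, weights, probs):
    """Quantile function of a histogram, assuming values are uniform inside each bin."""
    edges = np.asarray(edges, dtype=float)
    probs = np.asarray(probs, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    cdf = np.concatenate(([0.0], np.cumsum(w)))
    # Bin k with cdf[k] < p <= cdf[k + 1]; empty bins are never selected.
    k = np.clip(np.searchsorted(cdf, probs, side="left") - 1, 0, len(w) - 1)
    frac = np.divide(probs - cdf[k], w[k], out=np.ones_like(probs), where=w[k] > 0)
    return edges[k] + np.clip(frac, 0.0, 1.0) * (edges[k + 1] - edges[k])


def wasserstein2(qa, qb):
    """L2 Wasserstein distance between two quantile vectors on the same midpoint grid."""
    return float(np.sqrt(np.mean((qa - qb) ** 2)))


def dynamic_clustering(Q, n_clusters, max_iter=100):
    """DCA-style alternation on quantile vectors Q (one row per histogram)."""
    # Hypothetical deterministic seeding (farthest-first), not part of the source.
    idx = [0]
    for _ in range(1, n_clusters):
        d2 = ((Q[:, None, :] - Q[idx][None, :, :]) ** 2).mean(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    protos = Q[idx].copy()
    labels = None
    for _ in range(max_iter):
        # Allocation step: nearest prototype in squared L2 Wasserstein distance.
        d2 = ((Q[:, None, :] - protos[None, :, :]) ** 2).mean(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Representation step: the W2 barycenter of 1-D distributions has the
        # average quantile function, so the prototype is a pointwise mean.
        for c in range(n_clusters):
            members = Q[labels == c]
            if len(members):
                protos[c] = members.mean(axis=0)
    # Criterion: sum of squared distances of each histogram to its prototype.
    criterion = float(((Q - protos[labels]) ** 2).mean(axis=1).sum())
    return labels, protos, criterion


# Usage on four made-up histograms sharing the bins [0, 1), ..., [9, 10].
probs = (np.arange(200) + 0.5) / 200  # midpoint grid on (0, 1)
edges = np.linspace(0.0, 10.0, 11)
hists = [[5, 3, 1, 0, 0, 0, 0, 0, 0, 0],
         [4, 4, 2, 0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0, 2, 3, 5],
         [0, 0, 0, 0, 0, 0, 0, 1, 4, 4]]
Q = np.array([histogram_quantiles(edges, h, probs) for h in hists])
labels, protos, criterion = dynamic_clustering(Q, n_clusters=2)
```

Working in quantile space is what makes the distance and the prototype consistent here: the squared Wasserstein distance becomes a squared L2 norm between quantile vectors, so the pointwise mean is exactly the minimizer of the within-cluster criterion, and each allocation or representation step cannot increase it.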
For the entire collection see [Zbl 1146.68003].


62H30 Classification and discrimination; cluster analysis (statistical aspects)