Gaussian-based visualization of Gaussian and non-Gaussian-based clustering. (English) Zbl 07370656

Summary: A generic method is introduced to visualize in a “Gaussian-like way,” and onto \(\mathbb{R}^2\), results of Gaussian or non-Gaussian-based clustering. The key point is to explicitly force a visualization based on a spherical Gaussian mixture to inherit the within-cluster overlap that is present in the initial clustering mixture. The result is a particularly user-friendly drawing of the clusters, providing any practitioner with an overview of the potentially complex clustering result. An entropic measure provides information about the quality of the drawn overlap compared with the true one in the initial space. The proposed method is illustrated on four real data sets of different types (categorical, mixed, functional, and network) and is implemented in the R package ClusVis.
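
As a rough illustration of how an entropy can quantify cluster overlap in a spherical Gaussian mixture on \(\mathbb{R}^2\), the following Python sketch averages the Shannon entropy of the membership probabilities over a set of points. It is not the criterion defined in the paper and does not use the ClusVis API. The function names, the unit-variance assumption, the centers, and the evaluation points are all hypothetical choices made for this example.

```python
import math

def posterior_probs(point, centers, props):
    """Membership probabilities of `point` under a spherical Gaussian mixture
    in R^2 (assumed identity covariance for every component)."""
    # log(proportion * density), up to a constant shared by all components
    logs = [math.log(p) - 0.5 * ((point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2)
            for p, c in zip(props, centers)]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

def mean_entropy(points, centers, props):
    """Average Shannon entropy of the membership probabilities over `points`.
    Values near 0 mean little overlap; values near log(K) mean strong overlap."""
    total = 0.0
    for pt in points:
        probs = posterior_probs(pt, centers, props)
        total -= sum(q * math.log(q) for q in probs if q > 0.0)
    return total / len(points)

# Hypothetical evaluation grid on the horizontal axis and two equal-weight clusters.
points = [(x * 0.5, 0.0) for x in range(-8, 9)]
props = [0.5, 0.5]
far = mean_entropy(points, [(-3.0, 0.0), (3.0, 0.0)], props)
near = mean_entropy(points, [(-0.5, 0.0), (0.5, 0.0)], props)
print(f"mean entropy, separated centers: {far:.3f}")
print(f"mean entropy, close centers:     {near:.3f}")
```

In this toy setting, moving the two centers closer together raises the mean entropy, which is the sense in which an entropy can summarize how much overlap a drawing conveys.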

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
