Variable selection methods for model-based clustering. (English) Zbl 1496.62105

Summary: Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to deal with the increasing dimensionality. In particular, the development of variable selection techniques has received a lot of attention and research effort in recent years. Even for small size problems, variable selection has been advocated to facilitate the interpretation of the clustering results. This review provides a summary of the methods developed for variable selection in model-based clustering. Existing R packages implementing the different methods are indicated and illustrated in application to two data analysis examples.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62-08 Computational methods for problems pertaining to statistics

References:

[1] Agresti, A. (2002). Categorical Data Analysis. Wiley. · Zbl 1018.62002
[2] Andrews, J. L. and McNicholas, P. D. (2013). vscc: Variable selection for clustering and classification. R package version 0.2, https://cran.r-project.org/package=vscc.
[3] Andrews, J. L. and McNicholas, P. D. (2014). Variable selection for clustering and classification. Journal of Classification 31 136-153. · Zbl 1360.62310 · doi:10.1007/s00357-013-9139-2
[4] Badsberg, J. H. (1992). Model search in contingency tables by CoCo. In Computational Statistics (Y. Dodge and J. Whittaker, eds.) Vol. 1, 251-256. Heidelberg: Physica Verlag.
[5] Banfield, J. D. and Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics 49 803-821. · Zbl 0794.62034 · doi:10.2307/2532201
[6] Bartholomew, D., Knott, M. and Moustaki, I. (2011). Latent Variable Models and Factor Analysis. Wiley. · Zbl 1266.62040
[7] Bartolucci, F., Montanari, G. E. and Pandolfi, S. (2016). Item selection by latent class-based methods: an application to nursing home evaluation. Advances in Data Analysis and Classification 10 245-262. · Zbl 1414.62508
[8] Bartolucci, F., Montanari, G. E. and Pandolfi, S. (2017). Latent ignorability and item selection for nursing home case-mix evaluation. Journal of Classification. · Zbl 1391.62200 · doi:10.1007/s00357-017-9227-9
[9] Baudry, J.-P., Maugis, C. and Michel, B. (2012). Slope heuristics: overview and implementation. Statistics and Computing 22 455-470. · Zbl 1322.62007 · doi:10.1007/s11222-011-9236-1
[10] Bellman, R. (1957). Dynamic Programming. Princeton University Press. · Zbl 0077.13605
[11] Benaglia, T., Chauveau, D., Hunter, D. R. and Young, D. (2009). mixtools: An R package for analyzing finite mixture models. Journal of Statistical Software 32 1-29.
[12] Bhattacharya, S. and McNicholas, P. D. (2014). A LASSO-penalized BIC for mixture model selection. Advances in Data Analysis and Classification 8 45-61. · Zbl 1474.62212
[13] Biernacki, C., Celeux, G. and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 719-725.
[14] Biernacki, C. and Lourme, A. (2014). Stable and visualizable Gaussian parsimonious clustering models. Statistics and Computing 24 953-969. · Zbl 1332.62199 · doi:10.1007/s11222-013-9413-5
[15] Blum, A. L. and Langley, P. (1997). Selection of relevant features and examples in machine learning. Artificial Intelligence 97 245-271. · Zbl 0904.68142 · doi:10.1016/S0004-3702(97)00063-5
[16] Bontemps, D. and Toussile, W. (2013). Clustering and variable selection for categorical multivariate data. Electronic Journal of Statistics 7 2344-2371. · Zbl 1349.62259 · doi:10.1214/13-EJS844
[17] Bouveyron, C. and Brunet, C. (2012). Simultaneous model-based clustering and visualization in the Fisher discriminative subspace. Statistics and Computing 22 301-324. · Zbl 1322.62162 · doi:10.1007/s11222-011-9249-9
[18] Bouveyron, C. and Brunet-Saumard, C. (2014a). Model-based clustering of high-dimensional data: A review. Computational Statistics & Data Analysis 71 52-78. · Zbl 1306.65033 · doi:10.1007/s00180-013-0433-6
[19] Bouveyron, C. and Brunet-Saumard, C. (2014b). Discriminative variable selection for clustering with the sparse Fisher-EM algorithm. Computational Statistics 29 489-513. · Zbl 1306.65033 · doi:10.1007/s00180-013-0433-6
[20] Bouveyron, C., Girard, S. and Schmid, C. (2007). High-dimensional data clustering. Computational Statistics & Data Analysis 52 502-519. · Zbl 1452.62433 · doi:10.1016/j.csda.2007.02.009
[21] Celeux, G. and Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition 28 781-793. · Zbl 0775.62150
[22] Celeux, G., Martin-Magniette, M. L., Maugis-Rabusseau, C. and Raftery, A. E. (2014). Comparing model selection and regularization approaches to variable selection in model-based clustering. Journal de la Société Française de Statistique 155 57-71. · Zbl 1316.62083
[23] Celeux, G., Maugis-Rabusseau, C. and Sedki, M. (2018). Variable selection in model-based clustering and discriminant analysis with a regularization approach. Advances in Data Analysis and Classification. · Zbl 1474.62216
[24] Chang, W.-C. (1983). On Using Principal Components Before Separating a Mixture of Two Multivariate Normal Distributions. Journal of the Royal Statistical Society. Series C (Applied Statistics) 32 267-275. · Zbl 0538.62050
[25] Chavent, M., Kuentz-Simonet, V., Liquet, B. and Saracco, J. (2012). ClustOfVar: An R package for the clustering of variables. Journal of Statistical Software, Articles 50 1-16.
[26] Chen, W.-C. and Maitra, R. (2015). EMCluster: EM Algorithm for model-based clustering of finite mixture Gaussian distribution. R Package, URL http://cran.r-project.org/package=EMCluster.
[27] Chen, L. S., Prentice, R. L. and Wang, P. (2014). A penalized EM algorithm incorporating missing data mechanism for Gaussian parameter estimation. Biometrics 70 312-322. · Zbl 1419.62321 · doi:10.1111/biom.12149
[28] Clark, S. J. and Sharrow, D. J. (2011). Contemporary model life tables for developed countries: An application of model-based clustering. Working Paper, Center for Statistics and the Social Sciences, University of Washington.
[29] Clogg, C. C. (1988). Latent class models for measuring. In Latent Trait and Latent Class Models (R. Langeheine and J. Rost, eds.) 8, 173-205. Plenum Press.
[30] Collins, L. M. and Lanza, S. T. (2010). Latent Class and Latent Transition Analysis. Wiley.
[31] Constantinopoulos, C., Titsias, M. K. and Likas, A. (2006). Bayesian feature and model selection for Gaussian mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence 28 1013-1018.
[32] Dash, M. and Liu, H. (1997). Feature selection for classification. Intelligent Data Analysis 1 131-156.
[33] Dean, N. and Raftery, A. E. (2010). Latent class analysis variable selection. Annals of the Institute of Statistical Mathematics 62 11-35. · Zbl 1422.62085 · doi:10.1007/s10463-009-0258-9
[34] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39 1-38. · Zbl 1422.62085
[35] DeSantis, S. M., Houseman, E. A., Coull, B. A., Stemmer-Rachamimov, A. and Betensky, R. A. (2008). A penalized latent class model for ordinal data. Biostatistics 9 249. · Zbl 0364.62022 · doi:10.1093/biostatistics/kxm026
[36] Dy, J. G. and Brodley, C. E. (2004). Feature selection for unsupervised learning. Journal of Machine Learning Research 5 845-889. · Zbl 1143.62061
[37] Fop, M. and Murphy, T. B. (2017). LCAvarsel: Variable selection for latent class analysis. R package version 1.1, https://cran.r-project.org/package=LCAvarsel. · Zbl 1222.68187
[38] Fop, M., Smart, K. M. and Murphy, T. B. (2017). Variable selection for latent class analysis with application to low back pain diagnosis. Annals of Applied Statistics 11 2085-2115. · Zbl 1383.62268 · doi:10.1214/17-AOAS1061
[39] Formann, A. K. (1985). Constrained latent class models: Theory and applications. British Journal of Mathematical and Statistical Psychology 38 87-111. · Zbl 1383.62268 · doi:10.1111/j.2044-8317.1985.tb00818.x
[40] Formann, A. K. (2007). Mixture analysis of multivariate categorical data with covariates and missing entries. Computational Statistics & Data Analysis 51 5236-5246. · Zbl 0585.62182 · doi:10.1016/j.csda.2006.08.020
[41] Fowlkes, E. B., Gnanadesikan, R. and Kettenring, J. R. (1988). Variable selection in clustering. Journal of Classification 5 205-228. · Zbl 1445.62266
[42] Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association 97 611-631. · Zbl 1073.62545 · doi:10.1198/016214502760047131
[43] Friedman, J. H. and Meulman, J. J. (2004). Clustering objects on subsets of attributes (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66 815-849. · Zbl 1073.62545 · doi:10.1111/j.1467-9868.2004.02059.x
[44] Galimberti, G., Manisi, A. and Soffritti, G. (2017). Modelling the role of variables in model-based cluster analysis. Statistics and Computing 1-25. · Zbl 1060.62064 · doi:10.1007/s11222-017-9723-0
[45] Galimberti, G., Montanari, A. and Viroli, C. (2009). Penalized factor mixture analysis for variable selection in clustered data. Computational Statistics & Data Analysis 53 4301-4310. · Zbl 1384.62195 · doi:10.1016/j.csda.2009.05.025
[46] Gollini, I. and Murphy, T. B. (2014). Mixture of latent trait analyzers for model-based clustering of categorical data. Statistics and Computing 24 569-588. · Zbl 1453.62094 · doi:10.1007/s11222-013-9389-1
[47] Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A. and Bloomfield, C. D. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286 531-537. · Zbl 1325.62122
[48] Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61 215-231. · Zbl 0281.62057 · doi:10.1093/biomet/61.2.215
[49] Green, P. J. (1990). On use of the EM for penalized likelihood estimation. Journal of the Royal Statistical Society. Series B (Methodological) 52 443-452. · Zbl 0281.62057
[50] Guo, J., Levina, E., Michailidis, G. and Zhu, J. (2010). Pairwise variable selection for high-dimensional model-based clustering. Biometrics 66 793-804. · Zbl 0706.62022 · doi:10.1111/j.1541-0420.2009.01341.x
[51] Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research 3 1157-1182. · Zbl 1203.62190
[52] Hoff, P. D. (2005). Subset clustering of binary sequences, with an application to genomic abnormality data. Biometrics 61 1027-1036. · Zbl 1102.68556 · doi:10.1111/j.1541-0420.2005.00381.x
[53] Hoff, P. D. (2006). Model-based subspace clustering. Bayesian Analysis 1 321-344. · Zbl 1087.62125 · doi:10.1214/06-BA111
[54] Houseman, E. A., Coull, B. A. and Betensky, R. A. (2006). Feature-specific penalized latent class analysis for genomic data. Biometrics 62 1062-1070. · Zbl 1331.62309 · doi:10.1111/j.1541-0420.2006.00566.x
[55] Hubert, L. and Arabie, P. (1985). Comparing partitions. Journal of Classification 2 193-218. · Zbl 1116.62120
[56] Human Mortality Database (2017). University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). http://www.mortality.org/. · Zbl 0587.62128
[57] Hunt, L. and Jorgensen, M. (2003). Mixture model clustering for mixed data with missing information. Computational Statistics & Data Analysis 41 429-440. · Zbl 1256.62037 · doi:10.1016/S0167-9473(02)00190-1
[58] John, G. H., Kohavi, R. and Pfleger, K. (1994). Irrelevant features and the subset selection problem. In Proceedings of the Eleventh International Conference on Machine Learning 121-129. · Zbl 1256.62037
[59] Karlis, D. and Meligkotsidou, L. (2007). Finite mixtures of multivariate Poisson distributions with application. Journal of Statistical Planning and Inference 137 1942-1960. · Zbl 1116.60006 · doi:10.1016/j.jspi.2006.07.001
[60] Kim, S., Tadesse, M. G. and Vannucci, M. (2006). Variable selection in clustering via Dirichlet process mixture models. Biometrika 93 877-893. · Zbl 1116.60006 · doi:10.1093/biomet/93.4.877
[61] Kohavi, R. and John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence 97 273-324. · Zbl 1436.62266 · doi:10.1016/S0004-3702(97)00043-X
[62] Koller, D. and Sahami, M. (1996). Toward optimal feature selection. In Proceedings of the Thirteenth International Conference on Machine Learning (ICML) (L. Saitta, ed.) 284-292. Morgan Kaufmann Publishers. · Zbl 0904.68143
[63] Kosmidis, I. and Karlis, D. (2016). Model-based clustering using copulas with applications. Statistics and Computing 26 1079-1099. · Zbl 1505.62233 · doi:10.1007/s11222-015-9590-5
[64] Law, M. H. C., Figueiredo, M. A. T. and Jain, A. K. (2004). Simultaneous feature selection and clustering using mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence 26 1154-1166. · Zbl 1505.62233
[65] Lebret, R., Iovleff, S., Langrognet, F., Biernacki, C., Celeux, G. and Govaert, G. (2015). Rmixmod: The R package of the model-based unsupervised, supervised, and semi-supervised classification mixmod library. Journal of Statistical Software, Articles 67 1-29.
[66] Lee, H. and Li, J. (2012). Variable selection for clustering by separability based on ridgelines. Journal of Computational and Graphical Statistics 21 315-336.
[67] Lee, S. X. and McLachlan, G. J. (2013). On mixtures of skew normal and skew t-distributions. Advances in Data Analysis and Classification 7 241-266. · Zbl 1273.62115 · doi:10.1007/s11634-013-0132-8
[68] Lee, S. X. and McLachlan, G. J. (2016). Finite mixtures of canonical fundamental skew t-distributions: The unification of the restricted and unrestricted skew t-mixture models. Statistics and Computing 26 573-589. · Zbl 1273.62115 · doi:10.1007/s11222-015-9545-x
[69] Leisch, F. (2004). FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software, Articles 11 1-18. · Zbl 1420.60020
[70] Linzer, D. A. and Lewis, J. B. (2011). poLCA: An R package for polytomous variable latent class analysis. Journal of Statistical Software 42 1-29.
[71] Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data. Wiley. · Zbl 1011.62004
[72] Liu, H. and Motoda, H. (2007). Computational Methods of Feature Selection. CRC Press. · Zbl 1011.62004
[73] Liu, J. S., Zhang, J. L., Palumbo, M. J. and Lawrence, C. E. (2003). Bayesian clustering with variable and transformation selections. In Bayesian Statistics 7: Proceedings of the Seventh Valencia International Meeting (J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West, eds.) 249-275. Oxford University Press. · Zbl 1130.62118
[74] Malsiner-Walli, G., Frühwirth-Schnatter, S. and Grün, B. (2016). Model-based clustering based on sparse finite Gaussian mixtures. Statistics and Computing 26 303-324. · Zbl 1342.62109 · doi:10.1007/s11222-014-9500-2
[75] Marbac, M., Biernacki, C. and Vandewalle, V. (2015). Model-based clustering for conditionally correlated categorical data. Journal of Classification 32 145-175. · Zbl 1342.62109 · doi:10.1007/s00357-015-9180-4
[76] Marbac, M. and Sedki, M. (2017a). Variable selection for model-based clustering using the integrated complete-data likelihood. Statistics and Computing 27 1049-1063. · Zbl 1335.62103 · doi:10.1007/s11222-016-9670-1
[77] Marbac, M. and Sedki, M. (2017b). VarSelLCM: Variable selection for model-based clustering of continuous, count, categorical or mixed-type data set with missing values. R package version 2.0.1, https://CRAN.R-project.org/package=VarSelLCM. · Zbl 1384.62199 · doi:10.1007/s11222-016-9670-1
[78] Marbac, M. and Sedki, M. (2017c). Variable selection for mixed data clustering: A model-based approach. arXiv:1703.02293. · Zbl 1384.62199 · doi:10.1007/s11222-016-9670-1
[79] Maugis, C., Celeux, G. and Martin-Magniette, M. L. (2009a). Variable selection for clustering with Gaussian mixture models. Biometrics 65 701-709. · Zbl 1384.62199 · doi:10.1111/j.1541-0420.2008.01160.x
[80] Maugis, C., Celeux, G. and Martin-Magniette, M. L. (2009b). Variable selection in model-based clustering: A general variable role modeling. Computational Statistics and Data Analysis 53 3872-3882. · Zbl 1172.62021 · doi:10.1016/j.csda.2009.04.013
[81] Maugis-Rabusseau, C., Martin-Magniette, M.-L. and Pelletier, S. (2012). SelvarClustMV: Variable selection approach in model-based clustering allowing for missing values. Journal de la Société Française de Statistique 153 21-36. · Zbl 1453.62154
[82] McLachlan, G. J. and Basford, K. E. (1988). Mixture models: Inference and applications to clustering. Marcel Dekker. · Zbl 1316.62092
[83] McLachlan, G. J., Bean, R. W. and Peel, D. (2002). A mixture model-based approach to the clustering of microarray expression data. Bioinformatics 18 413-422. · Zbl 0697.62050
[84] McLachlan, G. and Krishnan, T. (2008). The EM Algorithm and Extensions. Wiley. · Zbl 1165.62019
[85] McLachlan, G. J. and Peel, D. (1998). Robust cluster analysis via mixtures of multivariate t-distributions. In Advances in Pattern Recognition: Joint IAPR International Workshops SSPR’98 and SPR’98 Sydney, Australia, August 11-13, 1998 Proceedings 658-666. Springer Berlin Heidelberg. · Zbl 1165.62019
[86] McLachlan, G. and Peel, D. (2000). Finite Mixture Models. Wiley. · Zbl 0963.62061
[87] McLachlan, G. J. and Rathnayake, S. (2014). On the number of components in a Gaussian mixture model. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 341-355. · Zbl 0963.62061
[88] McLachlan, G. J., Peel, D., Basford, K. E. and Adams, P. (1999). The EMMIX software for the fitting of mixtures of normal and t-components. Journal of Statistical Software 4 1-14.
[89] McNicholas, P. D. (2016). Model-based clustering. Journal of Classification 33 331-373. · Zbl 1364.62155 · doi:10.1007/s00357-016-9211-9
[90] McNicholas, P. D. and Murphy, T. B. (2008). Parsimonious Gaussian mixture models. Statistics and Computing 18 285-296. · Zbl 1364.62155
[91] McParland, D. and Gormley, I. C. (2016). Model based clustering for mixed data: clustMD. Advances in Data Analysis and Classification 10 155-169. · Zbl 1414.62254
[92] Melnykov, V. and Maitra, R. (2010). Finite mixture models and model-based clustering. Statistics Surveys 4 80-116. · Zbl 1190.62121 · doi:10.1214/09-SS053
[93] Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A. and Leisch, F. (2017). e1071: Misc functions of the Department of Statistics, Probability Theory Group (formerly: E1071), TU Wien. R package version 1.6-8, https://CRAN.R-project.org/package=e1071. · Zbl 1190.62121
[94] Miller, A. (2002). Subset selection in regression. CRC Press. · Zbl 1051.62060
[95] Nia, V. P. and Davison, A. C. (2012). High-dimensional Bayesian clustering with variable selection: The R package bclust. Journal of Statistical Software 47 1-22. · Zbl 1051.62060
[96] Nia, V. P. and Davison, A. C. (2015). A simple model-based approach to variable selection in classification and clustering. Canadian Journal of Statistics 43 157-175. · Zbl 1328.62388 · doi:10.1002/cjs.11241
[97] O’Hagan, A., Murphy, T. B. and Gormley, I. C. (2012). Computational aspects of fitting mixture models via the expectation-maximization algorithm. Computational Statistics & Data Analysis 56 3843-3864. · Zbl 1328.62388 · doi:10.1016/j.csda.2012.05.011
[98] Pan, W. and Shen, X. (2007). Penalized model-based clustering with application to variable selection. Journal of Machine Learning Research 8 1145-1164. · Zbl 1255.62180
[99] Python Software Foundation (2017). Python: A dynamic, open source programming language. https://www.python.org/. · Zbl 1222.68279
[100] R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
[101] Raftery, A. E. and Dean, N. (2006). Variable selection for model-based clustering. Journal of the American Statistical Association 101 168-178. · Zbl 1118.62339 · doi:10.1198/016214506000000113
[102] Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L. and Celeux, G. (2015). Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics 31 1420-1427. · Zbl 1118.62339
[103] Ritter, G. (2014). Robust cluster analysis and variable selection. CRC Press. · Zbl 1341.62037
[104] Saeys, Y., Inza, I. and Larrañaga, P. (2007). A review of feature selection techniques in bioinformatics. Bioinformatics 23 2507-2517. · Zbl 1341.62037
[105] Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6 461-464. · Zbl 0379.62005 · doi:10.1214/aos/1176344136
[106] Scrucca, L. (2016). Genetic algorithms for subset selection in model-based clustering. In Unsupervised Learning Algorithms (M. E. Celebi and K. Aydin, eds.) 55-70. Springer. · Zbl 0379.62005
[107] Scrucca, L. and Raftery, A. E. (2018). clustvarsel: A package implementing variable selection for Gaussian model-based clustering in R. Journal of Statistical Software 84 1-28.
[108] Scrucca, L., Fop, M., Murphy, T. B. and Raftery, A. E. (2016). mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models. The R Journal 8 289-317.
[109] Sedki, M., Celeux, G. and Maugis, C. (2014). SelvarMix: A R package for variable selection in model-based clustering and discriminant analysis with a regularization approach. INRIA Technical report. · Zbl 1474.62216
[110] Sedki, M., Celeux, G. and Maugis-Rabusseau, C. (2017). SelvarMix: Regularization for variable selection in model-based clustering and discriminant analysis. R package version 1.2.1, https://CRAN.R-project.org/package=SelvarMix. · Zbl 1474.62216
[111] Silvestre, C., Cardoso, M. G. M. S. and Figueiredo, M. (2015). Feature selection for clustering categorical data with an embedded modelling approach. Expert Systems 32 444-453.
[112] Steinley, D. and Brusco, M. J. (2008). Selection of variables in cluster analysis: An empirical comparison of eight procedures. Psychometrika 73 125-144. · Zbl 1143.62327 · doi:10.1007/s11336-007-9019-y
[113] Sun, W., Wang, J. and Fang, Y. (2012). Regularized k-means clustering of high-dimensional data and its asymptotic consistency. Electronic Journal of Statistics 6 148-167. · Zbl 1143.62327 · doi:10.1214/12-EJS668
[114] Swartz, M. D., Mo, Q., Murphy, M. E., Lupton, J. R., Turner, N. D., Hong, M. Y. and Vannucci, M. (2008). Bayesian variable selection in clustering high-dimensional data with substructure. Journal of Agricultural, Biological, and Environmental Statistics 13 407. · Zbl 1335.62109 · doi:10.1198/108571108X378317
[115] Tadesse, M. G., Sha, N. and Vannucci, M. (2005). Bayesian variable selection in clustering high-dimensional data. Journal of the American Statistical Association 100 602-617. · Zbl 1306.62349 · doi:10.1198/016214504000001565
[116] Toussile, W. and Gassiat, E. (2009). Variable selection in model-based clustering using multilocus genotype data. Advances in Data Analysis and Classification 3 109-134. · Zbl 1117.62433 · doi:10.1007/s11634-009-0043-x
[117] Vermunt, J. K. and Magidson, J. (2002). Latent class cluster analysis. In Applied Latent Class Analysis (J. A. Hagenaars and A. L. McCutcheon, eds.) 3, 89-106. Cambridge University Press. · Zbl 1003.00021
[118] Wallace, C. S. and Freeman, P. R. (1987). Estimation and inference by compact coding. Journal of the Royal Statistical Society. Series B (Methodological) 49 240-265. · Zbl 1284.62397
[119] Wallace, M. L., Buysse, D. J., Germain, A., Hall, M. H. and Iyengar, S. (2017). Variable selection for skewed model-based clustering: Application to the identification of novel sleep phenotypes. Journal of the American Statistical Association 0 0-0. · Zbl 1398.62347
[120] Wang, S. and Zhu, J. (2008). Variable selection for model-based high-dimensional clustering and its application to microarray data. Biometrics 64 440-448. · Zbl 0653.62005 · doi:10.1111/j.1541-0420.2007.00922.x
[121] White, A. and Murphy, T. B. (2014). BayesLCA: An R package for Bayesian latent class analysis. Journal of Statistical Software 61 1-28.
[122] White, A., Wyse, J. and Murphy, T. B. (2016). Bayesian variable selection for latent class analysis using a collapsed Gibbs sampler. Statistics and Computing 26 511-527. · Zbl 1137.62041 · doi:10.1007/s11222-014-9542-5
[123] Witten, D. M. and Tibshirani, R. (2010). A framework for feature selection in clustering. Journal of the American Statistical Association 105 713-726. · Zbl 1392.62194 · doi:10.1198/jasa.2010.tm09415
[124] Witten, D. M. and Tibshirani, R. (2013). sparcl: Perform sparse hierarchical clustering and sparse k-means clustering. R package version 1.0.3, https://CRAN.R-project.org/package=sparcl. · Zbl 1342.62112
[125] Wu, B. (2013). Sparse cluster analysis of large-scale discrete variables with application to single nucleotide polymorphism data. Journal of Applied Statistics 40 358-367. · Zbl 1392.62194
[126] Xie, B., Pan, W. and Shen, X. (2008a). Variable selection in penalized model-based clustering via regularization on grouped parameters. Biometrics 64 921-930. · Zbl 1146.62101 · doi:10.1111/j.1541-0420.2007.00955.x
[127] Xie, B., Pan, W. and Shen, X. (2008b). Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables. Electronic Journal of Statistics 2 168-212. · Zbl 1135.62055 · doi:10.1214/08-EJS194
[128] Xie, B., Pan, W. and Shen, X. (2010). Penalized mixtures of factor analyzers with application to clustering high-dimensional microarray data. Bioinformatics 26 501. · Zbl 1146.62101
[129] Yu, L. and Liu, H. (2004). Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research 5 1205-1224. · Zbl 1135.62055
[130] Zhang, Q. and Ip, E. H. (2014). Variable assessment in latent class models. Computational Statistics & Data Analysis 77 146-156. · Zbl 1506.62206
[131] Zhou, H., Pan, W. and Shen, X. (2009). Penalized model-based clustering with unconstrained covariance matrices. Electronic Journal of Statistics 3 1473-1496. · Zbl 1222.68340 · doi:10.1214/09-EJS487
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.