Exponential family mixed membership models for soft clustering of multivariate data. (English) Zbl 1414.62284

Summary: For several years, model-based clustering methods have successfully tackled many of the challenges faced by data analysts. However, as the scope of data analysis has evolved, some problems lie beyond the standard mixture model framework. One such problem arises when observations in a dataset come from overlapping clusters, so that different clusters share similar parameter values for several variables. In this setting, mixed membership models, a soft clustering approach in which observations are not restricted to membership of a single cluster, have proved to be an effective tool. In this paper, a method for fitting mixed membership models to data generated by a member of an exponential family is outlined. The method is applied to count data obtained from an ultra running competition, and compared with a standard mixture model approach.
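The generative setup the summary describes can be sketched as follows. This is a hypothetical toy illustration, not the authors' fitting procedure: each observation draws its own membership vector from a Dirichlet distribution, and each variable's count is emitted by a cluster chosen according to that vector, with Poisson rates standing in for the exponential-family member. All parameter values and names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and parameters (not from the paper):
K, J, n = 3, 5, 200                       # clusters, variables, observations
alpha = np.full(K, 0.5)                   # Dirichlet concentration
rates = rng.gamma(2.0, 2.0, size=(K, J))  # cluster-specific Poisson rates

# Soft memberships: one Dirichlet draw per observation (rows sum to one).
Pi = rng.dirichlet(alpha, size=n)

# For each observation and variable, pick the responsible cluster
# according to the observation's membership vector, then emit a count.
X = np.empty((n, J), dtype=int)
for i in range(n):
    for j in range(J):
        z = rng.choice(K, p=Pi[i])        # latent cluster for entry (i, j)
        X[i, j] = rng.poisson(rates[z, j])
```

A standard mixture model would instead force every row of `Pi` to be a one-hot vector, assigning each observation to exactly one cluster; the Dirichlet draw is what makes the membership "mixed".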


62H30 Classification and discrimination; cluster analysis (statistical aspects)


Full Text: DOI arXiv


[1] Abramowitz M, Stegun IA (1965) Handbook of mathematical functions: with formulas, graphs, and mathematical tables, 1st edn. Dover Publications, USA · Zbl 0171.38503
[2] Airoldi EM, Blei D, Erosheva E, Fienberg SE (2014) Introduction to mixed membership models and methods. In: Airoldi EM, Blei D, Erosheva E, Fienberg SE (eds) Handbook of mixed membership models, Chap. 1. Chapman & Hall/CRC, Boca Raton
[3] Airoldi EM, Fienberg SE, Joutard C, Love T (2006) Discovering latent patterns with hierarchical Bayesian mixed-membership models. Technical report, Carnegie Mellon University, School of Computer Science, Machine Learning Department. Report no CMU-06-101. http://ra.adm.cs.cmu.edu/anon/ml/CMU-ML-06-101.pdf
[4] Airoldi EM, Fienberg SE, Joutard C, Love T (2007) Discovering latent patterns with hierarchical Bayesian mixed-membership models. In: Poncelet P, Teisseire M, Masseglia F (eds) Data mining patterns: New methods and applications, Chap. 11. Idea Group Inc., Calgary
[5] Baudry, JP; Raftery, AE; Celeux, G.; Lo, K.; Gottardo, R., Combining mixture components for clustering, J Comput Gr Stat, 19, 332-353, (2010)
[6] Beal M (2003) Variational algorithms for approximate Bayesian inference. PhD dissertation, University College London
[7] Bensmail, H.; Celeux, G.; Raftery, AE; Robert, C., Inference in model-based cluster analysis, Stat Comput, 7, 1-10, (1997)
[8] Biernacki, C.; Celeux, G.; Govaert, G., Assessing a mixture model for clustering with the integrated completed likelihood, Pattern Anal Mach Intell IEEE Trans, 22, 719-725, (2000)
[9] Bishop CM (2006) Pattern recognition and machine learning. Springer, Secaucus · Zbl 1107.68072
[10] Blei DM, Lafferty JD (2006) Dynamic topic models. In: Cohen W, Moore A (eds) Proceedings of the 23rd international machine learning conference. http://icml.cc/2016/awards/dtm.pdf. http://dl.acm.org/citation.cfm?id=1143859
[11] Blei, DM; Lafferty, JD, A correlated topic model of science, Ann Appl Stat, 1, 17-35, (2007) · Zbl 1129.62122
[12] Blei, DM; Ng, AY; Jordan, MI, Latent Dirichlet allocation, J Mach Learn Res, 3, 993-1022, (2003) · Zbl 1112.68379
[13] Dempster, AP; Laird, NM; Rubin, DB, Maximum likelihood from incomplete data via the EM algorithm, J R Stat Soc Ser B (Methodol), 39, 1-38, (1977) · Zbl 0364.62022
[14] Erosheva, EA; Fienberg, SE; Joutard, C., Describing disability through individual-level mixture models for multivariate binary data, Ann Appl Stat, 1, 502-537, (2007) · Zbl 1126.62101
[15] Erosheva, EA; Fienberg, SE; Lafferty, J., Mixed-membership models of scientific publications, Proc Natl Acad Sci USA, 101, 5220-5227, (2004)
[16] Everitt BS, Hand DJ (1981) Finite mixture distributions. Chapman and Hall, London · Zbl 0466.62018
[17] Fraley, C.; Raftery, AE, Model-based clustering, discriminant analysis, and density estimation, J Am Stat Assoc, 97, 611-631, (2002) · Zbl 1073.62545
[18] Galyardt A (2014) Interpreting mixed membership models: Implications of Erosheva’s representation theorem. In: Airoldi EM, Blei D, Erosheva E, Fienberg SE (eds) Handbook of mixed membership models, Chap. 11. Chapman & Hall/CRC, London
[19] Gormley, C.; Murphy, TB, A grade of membership model for rank data, Bayesian Anal, 4, 265-296, (2009) · Zbl 1330.62024
[20] Hill, MO, Diversity and evenness: a unifying notation and its consequences, Ecology, 54, 427-432, (1973)
[21] Manrique-Vallier, D., Longitudinal mixed membership trajectory models for disability survey data, Ann Appl Stat, 8, 2268-2291, (2014) · Zbl 1454.62502
[22] McLachlan G, Peel D (2000) Finite mixture models. Wiley, New York · Zbl 0963.62061
[23] Ormerod, JT; Wand, MP, Explaining variational approximations, Am Stat, 64, 140-153, (2010) · Zbl 1200.65007
[24] Rogers, S.; Girolami, M.; Campbell, C.; Breitling, R., The latent process decomposition of cDNA microarray datasets, IEEE/ACM Trans Comput Biol Bioinf, 2, 143-156, (2005)
[25] Schwarz, G., Estimating the dimension of a model, Ann Stat, 6, 461-464, (1978) · Zbl 0379.62005
[26] van den Boogaart KG, Tolosana-Delgado R (2008) compositions: a unified R package to analyze compositional data. Comput Geosci 34(4):320-338
[27] Vermunt JK, Magidson J (2002) Latent class cluster analysis. In: Hagenaars JA, McCutcheon A (eds) Applied latent class analysis. Cambridge University Press, Cambridge, pp 89-106
[28] Wang, C.; Blei, D., Variational inference in nonconjugate models, J Mach Learn Res, 14, 1005-1031, (2013) · Zbl 1320.62057
[29] White A, Chan J, Hayes C, Murphy TB (2012) Mixed membership models for exploring user roles in online fora. In: Ellison N, Shanahan JG, Tufekci Z (eds) Proceedings of the sixth international AAAI conference on weblogs and social media (ICWSM 2012), pp 599-602. http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4638
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.