The determination of uncertainty levels in robust clustering of subjects with longitudinal observations using the Dirichlet process mixture. (English) Zbl 1414.62268

Summary: In this paper we introduce a new method for the cluster analysis of longitudinal data, focusing on the determination of uncertainty levels for cluster memberships. The method uses the Dirichlet-\(t\) distribution, which exploits the robustness of the Student-\(t\) distribution within a Bayesian semi-parametric framework and, together with robust clustering of subjects, evaluates the uncertainty level of each subject's membership in its cluster. We treat both the number of clusters and the uncertainty levels as unknown when fitting Dirichlet process mixture models. Two simulation studies are conducted to demonstrate the proposed methodology. The method is then applied to cluster a real data set taken from gene expression studies.


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62J05 Linear regression; mixed models
62G08 Nonparametric regression and quantile regression
62M99 Inference from stochastic processes



