Model-based clustering for longitudinal data. (English) Zbl 1452.62454

Summary: A model-based clustering method is proposed for clustering individuals on the basis of measurements taken over time. Data variability is taken into account through non-linear hierarchical models, leading to a mixture of hierarchical models. We study both frequentist and Bayesian estimation procedures. From a classical viewpoint, we discuss maximum likelihood estimation of this family of models through the EM algorithm. From a Bayesian standpoint, we develop appropriate Markov chain Monte Carlo (MCMC) sampling schemes to explore the target posterior distribution of the parameters. The methods are illustrated with the identification of hormone trajectories that are likely to lead to adverse pregnancy outcomes in a group of pregnant women.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F15 Bayesian inference
62P10 Applications of statistics to biology and medical sciences; meta analysis
62-08 Computational methods for problems pertaining to statistics

Software:

fda (R); MEMSS; mclust; S-PLUS; boa

References:

[1] Booth, J.G., Casella, G., Hobert, J.P., 2007. Clustering using objective functions and stochastic search, submitted for publication. · Zbl 1400.62128
[2] Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J., Classification and regression trees, (1984), Wadsworth Belmont, CA · Zbl 0541.62042
[3] Celeux, G., Bayesian inference for mixtures: the label switching problem, (), 227-232 · Zbl 0951.62018
[4] Celeux, G.; Hurn, M.; Robert, C.P., Computational and inferential difficulties with mixture posterior distributions, J. amer. statist. assoc., 95, 957-970, (2000) · Zbl 0999.62020
[5] Celeux, G.; Lavergne, C.; Martin, O., Mixture of linear mixed models for clustering gene expression profiles from repeated microarray experiments, Statist. model., 5, 243-267, (2005) · Zbl 1111.62103
[6] Chib, S., Marginal likelihood from the Gibbs output, J. amer. statist. assoc., 90, 1313-1321, (1995) · Zbl 0868.62027
[7] Dasgupta, A.; Raftery, A.E., Detecting features in spatial point processes with clutter via model-based clustering, J. amer. statist. assoc., 93, 294-302, (1998) · Zbl 0906.62105
[8] Davidian, M.; Giltinan, D.M., Nonlinear models for repeated measurement data, (1995), Chapman & Hall London
[9] De la Cruz-Mesía, R.; Marshall, G., A Bayesian approach for nonlinear regression model with continuous errors, Comm. statist. theory methods, 32, 8, 1631-1646, (2003) · Zbl 1184.62042
[10] De la Cruz-Mesía, R.; Marshall, G., Nonlinear random effects models with continuous time autoregressive errors: a Bayesian approach, Statist. in medicine, 25, 9, 1471-1484, (2006)
[11] Dempster, A.P.; Laird, N.M.; Rubin, D.B., Maximum likelihood from incomplete data via the EM algorithm, J. roy. statist. soc. ser. B, 39, 1-38, (1977) · Zbl 0364.62022
[12] DeSarbo, W.S.; Cron, W.L., A maximum likelihood methodology for clusterwise linear regression, J. classification, 5, 1, 249-282, (1988) · Zbl 0692.62052
[13] Diebolt, J.; Robert, C.P., Estimation of finite mixture distributions through Bayesian sampling, J. roy. statist. soc. ser. B, 56, 363-375, (1994) · Zbl 0796.62028
[14] Escobar, M.D.; West, M., Bayesian density estimation and inference using mixtures, J. amer. statist. assoc., 90, 577-588, (1995) · Zbl 0826.62021
[15] Fitzmaurice, G.M.; Laird, N.M.; Ware, J.H., Applied longitudinal analysis, (2004), Wiley New York · Zbl 1057.62052
[16] Fraley, C.; Raftery, A.E., How many clusters? which clustering method? answers via model-based cluster analysis, Comput. J., 41, 578-588, (1998) · Zbl 0920.68038
[17] Fraley, C.; Raftery, A.E., MCLUST: software for model-based cluster analysis, J. classification, 16, 297-306, (1999) · Zbl 0951.91500
[18] Fraley, C.; Raftery, A.E., Model-based clustering, discriminant analysis, and density estimation, J. amer. statist. assoc., 97, 611-631, (2002) · Zbl 1073.62545
[19] Fritz, M.A.; Guo, S.M., Doubling time of human chorionic gonadotropin (hcg) in early normal pregnancy: relationship to hcg concentration and gestational age, Fertil. steril., 47, 584-589, (1987)
[20] Frühwirth-Schnatter, S., Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models, J. amer. statist. assoc., 96, 194-209, (2001) · Zbl 1015.62022
[21] Gaffney, S.J., Smyth, P., 2003. Curve clustering with random effects regression mixtures. In: Bishop, C.M., Frey, B.J. (Eds.), Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL.
[22] Gelman, A.; Roberts, G.O.; Gilks, W.R., Efficient Metropolis jumping rules, (), 599-607
[23] Geweke, J., Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments, (), 169-194
[24] Guo, S.W.; Thompson, E.A., Monte Carlo estimation of mixed models for large complex pedigrees, Biometrics, 50, 417-432, (1994) · Zbl 0821.62075
[25] Hartigan, J.A., Clustering algorithms, (1975), Wiley New York · Zbl 0372.62040
[26] Hosmer, D., Maximum likelihood estimates of the parameters of a mixture of two regression lines, Comm. statist., 3, 10, 995-1006, (1974) · Zbl 0294.62085
[27] Hosmer, D.W.; Lemeshow, S., Applied logistic regression, (2000), Wiley New York · Zbl 0967.62045
[28] Hurn, M.; Justel, A.; Robert, C.P., Estimating mixtures of regressions, J. comput. and graphical statist., 12, 1, 55-79, (2003)
[29] James, G.; Sugar, C., Clustering for sparsely sampled functional data, J. amer. statist. assoc., 98, 397-408, (2003) · Zbl 1041.62052
[30] Jasra, A.; Holmes, C.C.; Stephens, D.A., Markov chain Monte Carlo and the label switching problem in Bayesian mixture modelling, Statist. sci., 20, 1, 50-67, (2005) · Zbl 1100.62032
[31] Jennrich, R.I.; Schluchter, M.D., Unbalanced repeated-measures models with structured covariance matrices, Biometrics, 42, 4, 805-820, (1986) · Zbl 0625.62052
[32] Jones, P.N.; McLachlan, G.J., Fitting finite mixture models in a regression context, Austral. J. statist., 34, 2, 233-240, (1992)
[33] Kass, R.E.; Raftery, A.E., Bayes factors, J. amer. statist. assoc., 90, 773-795, (1995) · Zbl 0846.62028
[34] Li, B., A new approach to cluster analysis: the clustering-function-based method, J. roy. statist. soc. ser. B, 68, 457-476, (2006) · Zbl 1100.62068
[35] Marshall, G.; Barón, A.E., Linear discriminant models for unbalanced longitudinal data, Statist. in medicine, 19, 1969-1981, (2000)
[36] McLachlan, G.J.; Basford, K.E., Mixture models: inference and applications to clustering, (1988), Marcel Dekker New York · Zbl 0697.62050
[37] McLachlan, G.J.; Peel, D., Finite mixture models, (2000), Wiley New York · Zbl 0963.62061
[38] Pauler, D.K.; Laird, N.M., A mixture model for longitudinal data with application to assessment of noncompliance, Biometrics, 56, 464-472, (2000) · Zbl 1069.62558
[39] Peng, F.; Jacobs, R.A.; Tanner, M.A., Bayesian inference in mixtures-of-experts and hierarchical mixtures-of-experts with an application to speech recognition, J. amer. statist. assoc., 91, 953-960, (1996) · Zbl 0882.62022
[40] Pfeifer, C., Classification of longitudinal profiles based on semi-parametric regression with mixed effects, Statist. model., 4, 314-323, (2004) · Zbl 1061.62200
[41] Pinheiro, J.C.; Bates, D.M., Mixed-effects models in S and S-PLUS, (2000), Springer New York · Zbl 0953.62065
[42] Quandt, R.E., A new approach to estimating switching regressions, J. amer. statist. assoc., 57, 306-310, (1972) · Zbl 0237.62047
[43] Quandt, R.E.; Ramsey, J.B., Estimating mixtures of normal distributions and switching regressions, J. amer. statist. assoc., 73, 730-738, (1978) · Zbl 0401.62024
[44] Ramsay, J.O.; Silverman, B.W., Functional data analysis, (1997), Springer New York · Zbl 0882.62002
[45] Richardson, S.; Green, P.J., On Bayesian analysis of mixture models with an unknown number of components, J. roy. statist. soc. ser. B, 59, 4, 731-792, (1997) · Zbl 0891.62020
[46] Roeder, K.; Wasserman, L., Practical Bayesian density estimation using mixtures of normals, J. amer. statist. assoc., 92, 894-902, (1997) · Zbl 0889.62021
[47] Schwarz, G., Estimating the dimension of a model, Ann. statist., 6, 461-464, (1978) · Zbl 0379.62005
[48] Segal, M.R., Tree-structured methods for longitudinal data, J. amer. statist. assoc., 87, 407-418, (1992)
[49] Smith, B.J., 2004. Bayesian Output Analysis Program (BOA). Version 1.1.2 for S-PLUS and R. Available at: <http://www.public-health.uiowa.edu/boa>.
[50] Stephens, M., Dealing with label switching in mixture models, J. roy. statist. soc. ser. B, 62, 795-809, (2000) · Zbl 0957.62020
[51] Verbeke, G.; Molenberghs, G., Linear mixed models for longitudinal data, (2000), Springer New York · Zbl 0956.62055
[52] Viele, K.; Tong, B., Modeling with mixtures of linear regressions, Ann. statist., 27, 439-460, (2002)
[53] Vonesh, E.F.; Chinchilli, V.M., Linear and nonlinear models for the analysis of repeated measurements, (1997), Marcel Dekker New York · Zbl 0893.62077
[54] Wu, C.F.J., On the convergence properties of the EM algorithm, Ann. statist., 11, 1, 95-103, (1983) · Zbl 0517.62035
[55] Zhang, H., Multivariate adaptive splines for analysis of longitudinal data, J. comput. graphical statist., 6, 74-91, (1997)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.