Panel data analysis: a survey on model-based clustering of time series. (English) Zbl 1274.62591

Summary: Clustering is a widely used statistical tool to determine subsets in a given data set. Frequently used clustering methods are mostly based on distance measures and cannot easily be extended to cluster time series within a panel or a longitudinal data set. The paper reviews recently suggested approaches to model-based clustering of panel or longitudinal data based on finite mixture models. Several approaches are considered that are suitable both for continuous and for categorical time series observations. Bayesian estimation through Markov chain Monte Carlo methods is described in detail and various criteria to select the number of clusters are reviewed. An application to a panel of marijuana use among teenagers serves as an illustration.

MSC:

62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62-02 Research exposition (monographs, survey articles) pertaining to statistics
68T10 Pattern recognition, speech recognition
91C20 Clustering in the social and behavioral sciences

Software:

bayesm; funHDDC

References:

[1] Agresti A (1990) Categorical data analysis. Wiley, Chichester · Zbl 0716.62001
[2] Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19: 716–723 · Zbl 0314.62039
[3] Aßmann C, Boysen-Hogrefe J (2011) A Bayesian approach to model-based clustering for binary panel probit models. Comput Stat Data Anal 55: 261–279 · Zbl 1247.62152
[4] Baştürk N, Paap R, van Dijk D (2011) Structural differences in economic growth: An endogenous clustering approach. Appl Econ XX, forthcoming
[5] Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49: 803–821 · Zbl 0794.62034
[6] Bauwens L, Rombouts JVK (2007) Bayesian clustering of many GARCH models. Econom Rev 26: 365–386 · Zbl 1112.62016
[7] Biernacki C, Celeux G, Govaert G (2010) Exact and Monte Carlo calculations of integrated likelihoods for the latent class model. J Stat Plan Inference 140: 2991–3002 · Zbl 1203.62027
[8] Biernacki C, Govaert G (1997) Using the classification likelihood to choose the number of clusters. Comput Sci Stat 29: 451–457
[9] Binder DA (1978) Bayesian cluster analysis. Biometrika 65: 31–38 · Zbl 0376.62007
[10] Bouveyron C, Jacques J (2011) Model-based clustering of time series in group-specific functional subspaces. Adv Data Anal Classif. this issue, doi: 10.1007/s11634-011-0095-6 · Zbl 1274.62416
[11] Cadez I, Heckerman D, Meek C, Smyth P, White S (2003) Model-based clustering and visualization of navigation patterns on a web site. Data Min Knowl Discov 7(4): 399–424 · Zbl 05660824
[12] Canova F (2004) Testing for convergence clubs in income per-capita: a predictive density approach. Int Econ Rev 45: 49–77
[13] Celeux G, Forbes F, Robert CP, Titterington DM (2006) Deviance information criteria for missing data models. Bayesian Anal 1: 651–674 · Zbl 1331.62329
[14] Diebolt J, Robert CP (1994) Estimation of finite mixture distributions through Bayesian sampling. J Royal Stat Soc Ser B 56: 363–375 · Zbl 0796.62028
[15] Diggle PJ, Heagerty P, Liang K-Y, Zeger SL (2002) Analysis of longitudinal data, 2nd edn. Oxford University Press, Oxford
[16] Everitt BS (1979) Unresolved problems in cluster analysis. Biometrics 35: 169–181 · Zbl 0406.62042
[17] Everitt BS, Landau S, Leese M (2001) Cluster analysis, 4th edn. Edward Arnold, London · Zbl 1205.62076
[18] Fougère D, Kamionka T (2003) Bayesian inference of the mover-stayer model in continuous-time with an application to labour market transition data. J Appl Econom 18: 697–723
[19] Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97: 611–631 · Zbl 1073.62545
[20] Frühwirth-Schnatter S (2004) Estimating marginal likelihoods for mixture and Markov switching models using bridge sampling techniques. Econom J 7: 143–167 · Zbl 1053.62087
[21] Frühwirth-Schnatter S (2006) Finite mixture and Markov switching models. Springer, New York · Zbl 1108.62002
[22] Frühwirth-Schnatter S (2011) Dealing with label switching under model uncertainty. In: Mengersen K, Robert CP, Titterington D (eds) Mixture estimation and applications, Chapter 10. Wiley, Chichester, pp 193–218
[23] Frühwirth-Schnatter S, Frühwirth R (2010) Data augmentation and MCMC for binary and multinomial logit models. In: Kneib T, Tutz G (eds) Statistical modelling and regression structures–Festschrift in Honour of Ludwig Fahrmeir. Physica, Heidelberg, pp 111–132
[24] Frühwirth-Schnatter S, Kaufmann S (2006) How do changes in monetary policy affect bank lending? An analysis of Austrian bank data. J Appl Econom 21: 275–305
[25] Frühwirth-Schnatter S, Kaufmann S (2008) Model-based clustering of multiple time series. J Bus Econ Stat 26: 78–89
[26] Frühwirth-Schnatter S, Pamminger C, Weber A, Winter-Ebmer R (2011) Labor market entry and earnings dynamics: Bayesian inference using mixtures-of-experts Markov chain clustering. J Appl Econom 26, forthcoming
[27] Frühwirth-Schnatter S, Pyne S (2010) Bayesian inference for finite mixtures of univariate and multivariate skew normal and skew-t distributions. Biostatistics 11: 317–336
[28] Frühwirth-Schnatter S, Tüchler R, Otter T (2004) Bayesian analysis of the heterogeneity model. J Bus Econ Stat 22: 2–15
[29] Frydman H (2005) Estimation in the mixture of Markov chains moving with different speeds. J Am Stat Assoc 100: 1046–1053 · Zbl 1117.62337
[30] Gamerman D, Lopes HF (2006) Markov chain Monte Carlo. Stochastic simulation for Bayesian inference, 2nd edn. Chapman & Hall/CRC, Boca Raton · Zbl 1137.62011
[31] García-Escudero L, Gordaliza A, Matrán C, Mayo-Iscar A (2010) A review of robust clustering methods. Adv Data Anal Classif 4: 89–109 · Zbl 1284.62375
[32] Greene W, Hensher D (2003) A latent class model for discrete choice analysis: contrasts with mixed logit. Transp Res Part B 37: 681–698
[33] Grün B, Leisch F (2008) Identifiability of finite mixtures of multinomial logit models with varying and fixed effects. J Classif 25: 225–247 · Zbl 1276.62021
[34] Heard NA, Holmes CC, Stephens DA (2006) A quantitative study of gene regulation involved in the immune response of anopheline mosquitoes: An application of Bayesian hierarchical clustering of curves. J Am Stat Assoc 101: 18–29 · Zbl 1118.62368
[35] Hsiao C (2003) Analysis of panel data, 2nd edn. Cambridge University Press, Cambridge · Zbl 0608.62145
[36] Juárez MA, Steel MFJ (2010) Model-based clustering of non-Gaussian panel data based on skew-t distributions. J Bus Econ Stat 28: 52–66 · Zbl 1198.62097
[37] Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90: 773–795 · Zbl 0846.62028
[38] Keribin C (2000) Consistent estimation of the order of mixture models. Sankhyā A62: 49–66 · Zbl 1081.62516
[39] Kiefer NM, Wolfowitz J (1956) Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann Math Stat 27: 887–906 · Zbl 0073.14701
[40] Lang JB, McDonald JW, Smith PWF (1999) Association-marginal modelling of multivariate categorical responses: A maximum likelihood approach. J Am Stat Assoc 94: 1161–1171 · Zbl 1072.62590
[41] Lenk PJ, DeSarbo WS (2000) Bayesian inference for finite mixtures of generalized linear models with random effects. Psychometrika 65: 93–119 · Zbl 1291.62225
[42] Liao TW (2005) Clustering of time series data–a survey. Pattern Recogn 38: 1857–1874 · Zbl 1077.68803
[43] Luan Y, Li H (2003) Clustering of time-course gene expression data using a mixed-effects model with B-splines. Bioinformatics 19: 474–482
[44] McLachlan GJ, Peel D (2000) Finite mixture models. Wiley series in probability and statistics. Wiley, New York
[45] McNicholas PD, Murphy TB (2010) Model-based clustering of longitudinal data. Can J Stat 38: 153–168 · Zbl 1190.62120
[46] Nobile A (2004) On the posterior distribution of the number of components in a finite mixture. Ann Stat 32: 2044–2073 · Zbl 1056.62037
[47] Owen AL, Videras J, Davis L (2009) Do all countries follow the same growth process? J Econ Growth 14: 265–286 · Zbl 05659536
[48] Pamminger C, Frühwirth-Schnatter S (2010) Model-based clustering of categorical time series. Bayesian Anal 5: 345–368 · Zbl 1330.62256
[49] Peng F, Jacobs RA, Tanner MA (1996) Bayesian inference in mixtures-of-experts and hierarchical mixtures-of-experts models with an application to speech recognition. J Am Stat Assoc 91: 953–960 · Zbl 0882.62022
[50] Ramoni M, Sebastiani P, Cohen P (2002) Bayesian clustering by dynamics. Mach Learn 47: 91–121 · Zbl 1012.68154
[51] Ramoni M, Sebastiani P, Kohane IS (2002) Cluster analysis of gene expression dynamics. Proc Natl Acad Sci 99: 9121–9126 · Zbl 1023.62110
[52] Rossi PE, Allenby GM, McCulloch R (2005) Bayesian statistics and marketing. Wiley, Chichester · Zbl 1094.62037
[53] Rousseau J, Mengersen K (2010) Asymptotic behaviour of the posterior distribution in overfitted mixture models. Technical report, ENSEA-CREST · Zbl 1228.62034
[54] Saul LK, Jordan MI (1999) Mixed memory Markov models: Decomposing complex stochastic processes as mixture of simpler ones. Mach Learn 37: 75–87 · Zbl 0948.68096
[55] Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6: 461–464 · Zbl 0379.62005
[56] Scott AJ, Symons M (1971) Clustering methods based on likelihood ratio criteria. Biometrics 27: 387–397
[57] Sperrin M, Jaki T, Wit E (2010) Probabilistic relabelling strategies for the label switching problem in Bayesian mixture models. Stat Comput 20: 357–366
[58] Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A (2002) Bayesian measures of model complexity and fit. J Royal Stat Soc Ser B 64: 583–639 · Zbl 1067.62010
[59] Vermunt JK (2010) Longitudinal research using mixture models. In: van Montfort K, Oud JHL, Satorra A (eds) Longitudinal research with latent variables, Chapter 4. Springer, Heidelberg, pp 119–152
[60] Wooldridge JM (2005) Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity. J Appl Econom 20: 39–54
[61] Zhu H-T, Zhang H (2004) Hypothesis testing in mixture regression models. J Royal Stat Soc Ser B 66: 3–16 · Zbl 1062.62033
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.