Co-clustering of time-dependent data via the shape invariant model. (English) Zbl 07473953

Summary: Multivariate time-dependent data, where multiple features are observed over time for a set of individuals, are increasingly widespread in many application domains. To model these data, we need to account for relations among both time instants and variables and, at the same time, for subject heterogeneity. We propose a new co-clustering methodology for grouping individuals and variables simultaneously, designed to handle both functional and longitudinal data. Our approach borrows some concepts from the curve registration framework by embedding the shape invariant model in the latent block model, estimated via a suitable modification of the SEM-Gibbs algorithm. The resulting procedure allows for several user-defined specifications of the notion of cluster, which can be chosen on substantive grounds, and provides parsimonious summaries of complex time-dependent data by partitioning data matrices into homogeneous blocks. Along with the explicit modelling of time evolution, these aspects allow for an easy interpretation of the clusters, from which low-dimensional settings may also benefit.
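
To make the shape invariant idea in the summary more concrete, the following is a minimal simulation sketch and not the authors' implementation. It assumes the commonly used form x(t) = a1 + exp(a2) * m(t - a3) + noise, in which each curve is a vertically shifted, rescaled, and time-shifted copy of a shared shape function m. The bump-shaped `shape` function, the names `sim_curve` and `sim_blocks`, the block-specific `centers`, and all numeric values are hypothetical choices made for illustration. The SEM-Gibbs estimation step is not sketched here.

```python
import math
import random


def shape(t, center=5.0):
    # Hypothetical shape function m(t): a smooth bump peaking at `center`.
    return math.exp(-0.5 * (t - center) ** 2)


def sim_curve(times, a1, a2, a3, center=5.0, sigma=0.0, rng=None):
    # Assumed shape invariant form: x(t) = a1 + exp(a2) * m(t - a3) + noise.
    # a1 shifts the curve vertically, exp(a2) rescales it, a3 shifts it in time.
    rng = rng or random.Random(0)
    return [
        a1
        + math.exp(a2) * shape(t - a3, center)
        + (rng.gauss(0.0, sigma) if sigma > 0 else 0.0)
        for t in times
    ]


def sim_blocks(row_labels, col_labels, centers, times,
               sd_effects=0.1, sigma=0.05, seed=1):
    # row_labels[i]: cluster of individual i; col_labels[j]: cluster of variable j.
    # centers[k][l]: illustrative shape parameter for block (k, l), so that
    # curves in the same block share one shape function.
    rng = random.Random(seed)
    data = {}
    for i, k in enumerate(row_labels):
        for j, l in enumerate(col_labels):
            # Curve-specific random effects drawn around zero (an assumption).
            a1, a2, a3 = (rng.gauss(0.0, sd_effects) for _ in range(3))
            data[(i, j)] = sim_curve(times, a1, a2, a3, centers[k][l], sigma, rng)
    return data


times = [0.5 * s for s in range(21)]  # grid on [0, 10]
# Three individuals in two row clusters, two variables in two column clusters.
curves = sim_blocks([0, 0, 1], [0, 1], [[3.0, 7.0], [5.0, 5.0]], times)
```

In this sketch, setting one of the three effects to zero for every curve in a block removes that source of variation from the block, which is one way to read the "user-defined specifications of the notion of cluster" mentioned in the summary.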

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
