Model-based clustering of time series in group-specific functional subspaces. (English) Zbl 1274.62416

Summary: This work develops a general procedure for clustering functional data that adapts high-dimensional data clustering (HDDC), a method originally proposed in the multivariate context. The resulting clustering method, called funHDDC, is based on a functional latent mixture model which fits the functional data in group-specific functional subspaces. Constraining model parameters within and between groups yields a family of parsimonious models that can accommodate a variety of situations. An estimation procedure based on the EM algorithm is proposed for determining both the model parameters and the group-specific functional subspaces. Experiments on real-world datasets show that the proposed approach performs as well as or better than classical two-step clustering methods, while providing useful interpretations of the groups and avoiding the difficult choice of the discretization technique. In particular, funHDDC appears to always outperform HDDC applied to spline coefficients.
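
For orientation, the "classical two-step" baseline mentioned in the summary can be sketched as follows: each curve is first reduced to a few basis coefficients, and the coefficient vectors are then clustered with a multivariate method. This is only an illustrative sketch under stated assumptions, not the funHDDC procedure and not the authors' code. A truncated Fourier basis and plain k-means stand in for the spline expansion and the HDDC step named in the summary, and the function names (`fourier_coefficients`, `squared_distance`, `kmeans`) and the toy curves are hypothetical.

```python
import math
import random


def fourier_coefficients(curve, n_harmonics=2):
    """Step 1: project one curve, sampled on a uniform grid over one period,
    onto a truncated Fourier basis (constant term plus cosine/sine pairs)."""
    n = len(curve)
    coeffs = [sum(curve) / n]
    for k in range(1, n_harmonics + 1):
        cos_part = sum(y * math.cos(2 * math.pi * k * i / n) for i, y in enumerate(curve))
        sin_part = sum(y * math.sin(2 * math.pi * k * i / n) for i, y in enumerate(curve))
        coeffs.extend([2 * cos_part / n, 2 * sin_part / n])
    return coeffs


def squared_distance(u, v):
    """Squared Euclidean distance between two coefficient vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, n_iter=100):
    """Step 2: plain k-means on the coefficient vectors, with a deterministic
    farthest-point initialisation so that the sketch is reproducible."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous center if a cluster happens to be empty.
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels


# Toy data (assumed for illustration): two groups of noisy periodic curves.
rng = random.Random(0)
n_points = 50
grid = [i / n_points for i in range(n_points)]
curves = [[math.sin(2 * math.pi * t) + rng.gauss(0, 0.1) for t in grid] for _ in range(10)]
curves += [[math.sin(4 * math.pi * t) + rng.gauss(0, 0.1) for t in grid] for _ in range(10)]

labels = kmeans([fourier_coefficients(c) for c in curves], k=2)
print(labels)
```

In this baseline the basis and its size are fixed before clustering and shared by all groups. The summary's point is that funHDDC instead estimates group-specific functional subspaces jointly with the mixture parameters inside the EM algorithm.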

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62F99 Parametric inference

Software:

funHDDC; fda (R); AS 136

References:

[1] Aguilera A, Escabias M, Preda C, Saporta G (2011) Using basis expansions for estimating functional PLS regression. Applications with chemometric data. Chemom Intell Lab Syst 104(2): 289–305
[2] Banfield J, Raftery A (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49: 803–821 · Zbl 0794.62034
[3] Biernacki C (2004) Initializing EM using the properties of its trajectories in Gaussian mixtures. Stat Comput 14(3): 267–279
[4] Bouveyron C, Girard S, Schmid C (2007) High dimensional data clustering. Comput Stat Data Anal 52: 502–519 · Zbl 1452.62433
[5] Cattell R (1966) The scree test for the number of factors. Multivar Behav Res 1(2): 245–276
[6] Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. J Pattern Recognit Soc 28: 781–793 · Zbl 05480211
[7] Delaigle A, Hall P (2010) Defining probability density for a distribution of random functions. Ann Stat 38: 1171–1193 · Zbl 1183.62061
[8] Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc 39(1): 1–38 · Zbl 0364.62022
[9] Escabias M, Aguilera A, Valderrama M (2005) Modeling environmental data by functional principal component logistic regression. Environmetrics 16: 95–107
[10] Ferraty F, Vieu P (2006) Nonparametric functional data analysis. Springer series in statistics. Springer, New York · Zbl 1271.62085
[11] Frühwirth-Schnatter S, Kaufmann S (2008) Model-based clustering of multiple time series. J Bus Econ Stat 26: 78–89
[12] Hartigan J, Wong M (1979) Algorithm AS 136: a k-means clustering algorithm. Appl Stat 28: 100–108 · Zbl 0447.62062
[13] Jacques J, Bouveyron C, Girard S, Devos O, Duponchel L, Ruckebusch C (2010) Gaussian mixture models for the classification of high-dimensional vibrational spectroscopy data. J Chemom 24: 719–727
[14] James G, Sugar C (2003) Clustering for sparsely sampled functional data. J Am Stat Assoc 98(462): 397–408 · Zbl 1041.62052
[15] Lévéder C, Abraham P, Cornillon E, Matzner-Lober E, Molinari N (2004) Discrimination de courbes de prétrissage. In: Chimiométrie 2004, Paris, pp 37–43
[16] Olszewski R (2001) Generalized feature extraction for structural pattern recognition in time-series data. PhD thesis, Carnegie Mellon University, Pittsburgh, PA
[17] Preda C, Saporta G, Lévéder C (2007) PLS classification of functional data. Comput Stat 22(2): 223–235 · Zbl 1196.62086
[18] Ramsay JO, Silverman BW (2005) Functional data analysis. Springer series in statistics, 2nd edn. Springer, New York
[19] Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6: 461–464 · Zbl 0379.62005
[20] Tarpey T, Kinateder K (2003) Clustering functional data. J Classif 20(1): 93–114 · Zbl 1112.62327
[21] Tipping ME, Bishop C (1999) Mixtures of principal component analyzers. Neural Comput 11(2): 443–482
[22] Wahba G (1990) Spline models for observational data. SIAM, Philadelphia · Zbl 0813.62001
[23] Warren Liao T (2005) Clustering of time series data–a survey. Pattern Recognit 38: 1857–1874 · Zbl 1077.68803
[24] Xi X, Keogh E, Shelton C, Wei L, Ratanamahatana C (2006) Fast time series classification using numerosity reduction. In: 23rd international conference on machine learning (ICML 2006), Pittsburgh, PA, pp 1033–1040
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.