## Model-based clustering and segmentation of time series with changes in regime
*(English)*
Zbl 1274.62427

Summary: Mixture model-based clustering, usually applied to multidimensional data, has become a popular approach in many data analysis problems, owing both to its good statistical properties and to the simplicity of implementing the expectation-maximization (EM) algorithm. Motivated by a railway application, this paper introduces a novel mixture model for time series that are subject to changes in regime. The proposed approach, called ClustSeg, models each cluster by a regression model in which the polynomial coefficients vary according to a discrete hidden process. In particular, logistic functions are used to model the (smooth or abrupt) transitions between regimes. The model parameters are estimated by maximum likelihood, using an EM algorithm. The approach can also be regarded as clustering time series into groups that share common changes in regime; in addition to partitioning the time series, it therefore also segments them. The numbers of clusters and segments are selected with the Bayesian information criterion (BIC). The efficiency of ClustSeg is demonstrated on a variety of simulated time series and on real-world time series of electrical power consumption recorded during rail switching operations.
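The summary does not spell out the model equations. The Python sketch below assumes the formulation usual for hidden logistic process regression: within cluster k, the regime at time t is drawn with multinomial-logistic probabilities π_kr(t) ∝ exp(w_kr0 + w_kr1·t), and given regime r the observation equals a polynomial in t with coefficients β_kr plus Gaussian noise of variance σ²_kr. It evaluates the mixture log-likelihood that an EM algorithm would maximize, and a BIC score for comparing numbers of clusters K and regimes R; the EM updates themselves are omitted. All function names, the parameter layout and the parameter count (which assumes regime-specific variances) are illustrative assumptions, not the authors' implementation.

```python
import math

def regime_probs(t, w):
    """Multinomial-logistic regime probabilities pi_r(t; w) at time t.

    w is a list of (w_r0, w_r1) pairs, one per regime. By assumption the
    last pair is fixed to (0, 0) so the parameters are identifiable.
    """
    scores = [w0 + w1 * t for (w0, w1) in w]
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def poly(beta, t):
    """Polynomial beta_0 + beta_1 t + ... + beta_p t^p."""
    return sum(b * t ** j for j, b in enumerate(beta))

def cluster_loglik(ts, ys, cluster):
    """log p(y | cluster) = sum_t log sum_r pi_r(t) N(y_t; poly(beta_r, t), sigma2_r).

    No underflow guard: a density of exactly 0 would raise in math.log.
    """
    total = 0.0
    for t, y in zip(ts, ys):
        pis = regime_probs(t, cluster["w"])
        dens = sum(p * normal_pdf(y, poly(b, t), s2)
                   for p, b, s2 in zip(pis, cluster["beta"], cluster["sigma2"]))
        total += math.log(dens)
    return total

def mixture_loglik(ts, series, alphas, clusters):
    """Sum over series of log sum_k alpha_k p(y_i | cluster k), via log-sum-exp."""
    total = 0.0
    for ys in series:
        logs = [math.log(a) + cluster_loglik(ts, ys, c) for a, c in zip(alphas, clusters)]
        m = max(logs)
        total += m + math.log(sum(math.exp(v - m) for v in logs))
    return total

def n_free_params(K, R, p):
    # (K - 1) mixing proportions; per cluster: 2(R - 1) logistic parameters,
    # R(p + 1) polynomial coefficients and R variances (assumed regime-specific).
    return (K - 1) + K * (2 * (R - 1) + R * (p + 1) + R)

def bic(loglik, K, R, p, n):
    """Penalized log-likelihood log L - (nu / 2) log n, to be maximized."""
    return loglik - 0.5 * n_free_params(K, R, p) * math.log(n)

if __name__ == "__main__":
    w = [(25.0, -50.0), (0.0, 0.0)]   # regime 0 dominates before t = 0.5, regime 1 after
    for t in (0.1, 0.5, 0.9):
        print(t, [round(p, 4) for p in regime_probs(t, w)])
```

A large slope such as |w_1| = 50 makes the switch between regimes nearly abrupt around the point where the two logistic scores are equal (t = 0.5 here), while a small slope yields a smooth transition; a single logistic parameterization thus covers both kinds of change. To select K and R, one would fit each candidate pair with EM and keep the pair with the largest `bic` value.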

### MSC:

| Code | Subject |
|---|---|
| 62H30 | Classification and discrimination; cluster analysis (statistical aspects) |
| 62M10 | Time series, auto-correlation, regression, etc. in statistics (GARCH) |
| 62F10 | Point estimation |
| 62B15 | Theory of statistical experiments |
| 62P30 | Applications of statistics in engineering and industry; control charts |

### Keywords:

clustering; time series; change in regime; mixture model; regression mixture; hidden logistic process; EM algorithm

*A. Samé* et al., Adv. Data Anal. Classif., ADAC 5, No. 4, 301–321 (2011; Zbl 1274.62427)
