
Clustering time series by linear dependency. (English) Zbl 1430.62191

Summary: We present a new way to find clusters in large vectors of time series by using a measure of similarity between two time series, the generalized cross correlation. This measure compares the determinant of the correlation matrix of the bivariate vector up to some lag \(k\) with those of the two univariate time series. A matrix of similarities among the series based on this measure is used as input to a clustering algorithm. The procedure is automatic, can be applied to large data sets, and is useful for finding groups in dynamic factor models. The clustering method is illustrated with some Monte Carlo experiments and a real data example.
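The summary describes the generalized cross correlation (GCC) only in words. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. It assumes \(\mathrm{GCC}(x,y) = 1 - \left(|R_{xy}| / (|R_x|\,|R_y|)\right)^{1/(k+1)}\), where \(R_x\) and \(R_y\) are the \((k+1)\times(k+1)\) autocorrelation matrices of each series up to lag \(k\), and \(R_{xy}\) is the \(2(k+1)\times 2(k+1)\) correlation matrix of the stacked vector \((x_t,\ldots,x_{t-k},y_t,\ldots,y_{t-k})\). By Fischer's inequality the determinant ratio lies in \([0,1]\), so GCC is 0 when all cross-correlations up to lag \(k\) vanish and reaches 1 when the joint matrix is singular (for example \(y_t = c\,x_t\)). The \(1/(k+1)\) exponent is an assumed normalization. The clustering step (average-linkage agglomerative clustering on \(1-\mathrm{GCC}\) with a fixed number of groups) is likewise a swapped-in choice, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def cross_cov(a: Sequence[float], b: Sequence[float], h: int) -> float:
    """Sample Cov(a_{t+h}, b_t) for any integer lag h, using divisor T.

    The divisor T (not T - |h|) keeps the lagged covariance matrices
    positive semidefinite in exact arithmetic.
    """
    T = len(a)
    ma, mb = sum(a) / T, sum(b) / T
    lo, hi = max(0, -h), min(T, T - h)  # keep both t and t + h inside [0, T)
    return sum((a[t + h] - ma) * (b[t] - mb) for t in range(lo, hi)) / T


def corr_block(a: Sequence[float], b: Sequence[float], k: int) -> Matrix:
    """(k+1) x (k+1) block whose (i, j) entry estimates corr(a_{t-i}, b_{t-j})."""
    scale = math.sqrt(cross_cov(a, a, 0) * cross_cov(b, b, 0))  # assumes non-constant series
    return [[cross_cov(a, b, j - i) / scale for j in range(k + 1)]
            for i in range(k + 1)]


def det(m: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0  # singular matrix
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d  # a row swap flips the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for cc in range(c, n):
                a[r][cc] -= f * a[c][cc]
    return d


def gcc(x: Sequence[float], y: Sequence[float], k: int) -> float:
    """Generalized cross correlation up to lag k (assumed normalization)."""
    n = k + 1
    rx, ry, rxy = corr_block(x, x, k), corr_block(y, y, k), corr_block(x, y, k)
    # Correlation matrix of (x_t..x_{t-k}, y_t..y_{t-k}); its lower-left block is rxy transposed.
    joint = ([rx[i] + rxy[i] for i in range(n)]
             + [[rxy[j][i] for j in range(n)] + ry[i] for i in range(n)])
    ratio = det(joint) / (det(rx) * det(ry))
    ratio = min(max(ratio, 0.0), 1.0)  # clip rounding error back into [0, 1]
    return 1.0 - ratio ** (1.0 / n)


def average_linkage(d: Matrix, n_clusters: int) -> List[List[int]]:
    """Merge the two clusters with the smallest mean pairwise dissimilarity until n_clusters remain."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                s /= len(clusters[a]) * len(clusters[b])
                if best is None or s < best[0]:
                    best = (s, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a is unaffected
    return clusters


def cluster_by_gcc(series: List[Sequence[float]], k: int, n_clusters: int) -> List[List[int]]:
    """Build the 1 - GCC dissimilarity matrix and cluster it."""
    m = len(series)
    d = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            d[i][j] = d[j][i] = 1.0 - gcc(series[i], series[j], k)
    return average_linkage(d, n_clusters)


if __name__ == "__main__":
    rng = random.Random(0)
    T = 300
    f1 = [rng.gauss(0, 1) for _ in range(T)]
    f2 = [rng.gauss(0, 1) for _ in range(T)]
    # Two groups of three series, each group driven by its own common factor plus noise.
    series = [[f + rng.gauss(0, 0.5) for f in fac] for fac in (f1, f1, f1, f2, f2, f2)]
    print(cluster_by_gcc(series, k=2, n_clusters=2))
```

Run as a script, the demo simulates two groups of three series, each sharing one white-noise factor, and should place series 0, 1, 2 and series 3, 4, 5 in separate clusters. In practice the number of groups would be chosen from the data; the Software field lists Silhouettes, one standard criterion for that choice.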

MSC:

62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62H30 Classification and discrimination; cluster analysis (statistical aspects)
65C05 Monte Carlo methods

Software:

Silhouettes; TSclust

References:

[1] Aghabozorgi, S.; Wah, TY, Clustering of large time series data sets, Intell. Data Anal., 18, 793-817, (2014)
[2] Aghabozorgi, S.; Shirkhorshidi, AS; Wah, TY, Time-series clustering—a decade review, Inf. Syst., 53, 16-38, (2015)
[3] Alonso, AM; Berrendero, JR; Hernández, A.; Justel, A., Time series clustering based on forecast densities, Comput. Stat. Data Anal., 51, 762-766, (2006) · Zbl 1157.62484
[4] Anderson, T.W.: An Introduction to Multivariate Statistical Analysis, 2nd edn. Wiley, New York (1984) · Zbl 0651.62041
[5] Ando, T.; Bai, J., Panel data models with grouped factor structure under unknown group membership, J. Appl. Econom., 31, 163-191, (2016)
[6] Ando, T.; Bai, J., Clustering huge number of financial time series: a panel data approach with high-dimensional predictor and factor structures, J. Am. Stat. Assoc., 112, 1182-1198, (2017)
[7] Caiado, J.; Crato, N.; Peña, D., A periodogram-based metric for time series classification, Comput. Stat. Data Anal., 50, 2668-2684, (2006) · Zbl 1445.62222
[8] Caiado, J., Maharaj, E.A., D’Urso, P.: Time Series Clustering. Handbook of Cluster Analysis. Chapman and Hall/CRC, Boca Raton (2015) · Zbl 1396.62196
[9] Corduas, M.; Piccolo, D., Time series clustering and classification by the autoregressive metric, Comput. Stat. Data Anal., 52, 1860-1872, (2008) · Zbl 1452.62624
[10] Davidson, J.: Stochastic Limit Theory. An Introduction for Econometricians. Oxford University Press, London (1994) · Zbl 0904.60002
[11] Douzal-Chouakria, A.; Nagabhushan, PN, Adaptive dissimilarity index for measuring time series proximity, Adv. Data Anal. Classif., 1, 5-21, (2007) · Zbl 1131.62078
[12] D’Urso, P.; Maharaj, EA, Autocorrelation-based fuzzy clustering of time series, Fuzzy Sets Syst., 160, 3565-3589, (2009)
[13] D’Urso, P.; Maharaj, EA; Alonso, AM, Fuzzy clustering of time series using extremes, Fuzzy Sets Syst., 318, 56-79, (2017) · Zbl 1381.62162
[14] Frühwirth-Schnatter, S.; Kaufmann, S., Model-based clustering of multiple time series, J. Bus. Econ. Stat., 26, 78-89, (2008)
[15] García-Martos, C.; Conejo, AJ; Webster, J. (ed.), Price forecasting techniques in power system, (2013), New York
[16] Golay, X.; Kollias, S.; Stoll, G.; Meier, D.; Valavanis, A.; Boesiger, P., A new correlation-based fuzzy logic clustering algorithm for FMRI, Magn. Reson. Med., 40, 249-260, (2005)
[17] Granger, CW; Morris, MJ, Time series modelling and interpretation, J. R. Stat. Soc. A, 139, 246-257, (1976)
[18] Hallin, M.; Lippi, M., Factor models in high-dimensional time series—a time-domain approach, Stoch. Process. Appl., 123, 2678-2695, (2013) · Zbl 1285.62106
[19] Hamilton, J.D.: Time Series Analysis. Princeton University Press, New Jersey (1994) · Zbl 0831.62061
[20] Hannan, E.J.: Multiple Time Series. Wiley, New York (1970) · Zbl 0211.49804
[21] Hennig, C.; Hennig, C. (ed.); Meila, M. (ed.); Murtagh, F. (ed.); Rocci, R. (ed.), Clustering strategy and method selection, 703-730, (2015), Boca Raton
[22] Hubert, L.; Arabie, P., Comparing partitions, J. Classif., 2, 193-218, (1985) · Zbl 0587.62128
[23] Kakizawa, Y.; Shumway, RH; Taniguchi, M., Discrimination and clustering for multivariate time series, J. Am. Stat. Assoc., 93, 328-340, (1998) · Zbl 0906.62060
[24] Knapp, C.; Carter, G., The generalized correlation method for estimation of time delay, IEEE Trans. Acoust. Speech, 24, 320-327, (1976)
[25] Koopman, SJ; Ooms, M.; Carnero, MA, Periodic seasonal Reg-ARFIMA-GARCH models for daily electricity spot prices, J. Am. Stat. Assoc., 102, 16-27, (2007) · Zbl 1284.62786
[26] Kullback, S.: Information Theory and Statistics. Dover, New York (1968) · Zbl 0088.10406
[27] Lafuente-Rego, B.; Vilar, JA, Clustering of time series using quantile autocovariances, Adv. Data Anal. Classif., 10, 391-415, (2015) · Zbl 1414.62372
[28] Lam, C.; Yao, Q., Factor modeling for high-dimensional time series: inference for the number of factors, Ann. Stat., 40, 694-726, (2012) · Zbl 1273.62214
[29] Larsen, B., Aone, C.: Fast and effective text mining using linear time document clustering. In: Proceedings of the Conference on Knowledge Discovery and Data Mining, pp. 16-22 (1999)
[30] Liao, TW, Clustering of time series data—a survey, Pattern Recogn., 38, 1857-1874, (2005) · Zbl 1077.68803
[31] Lütkepohl, H.: Handbook of Matrices. Wiley, New York (1996) · Zbl 0856.15001
[32] Maharaj, EA, Comparison of non-stationary time series in the frequency domain, Comput. Stat. Data Anal., 40, 131-141, (2002) · Zbl 0990.62078
[33] Maharaj, EA; D’Urso, P., Fuzzy clustering of time series in the frequency domain, Inf. Sci., 181, 1187-1211, (2011) · Zbl 1215.62061
[34] Mahdi, E.; McLeod, IA, Improved multivariate portmanteau test, J. Time Ser. Anal., 33, 211-222, (2012) · Zbl 1300.62062
[35] Meilă, M., Comparing clusterings—an information based distance, J. Multivar. Anal., 98, 873-895, (2007) · Zbl 1298.91124
[36] Montero, P.; Vilar, J., TSclust: an R package for time series clustering, J. Stat. Softw., 62, 1-43, (2014)
[37] Pamminger, C.; Frühwirth-Schnatter, S., Model-based clustering of categorical time series, Bayesian Anal., 2, 345-368, (2010) · Zbl 1330.62256
[38] Peña, D.; Box, GEP, Identifying a simplifying structure in time series, J. Am. Stat. Assoc., 82, 836-843, (1987) · Zbl 0623.62081
[39] Peña, D.; Rodríguez, J., A powerful portmanteau test of lack of fit for time series, J. Am. Stat. Assoc., 97, 601-610, (2002) · Zbl 1073.62554
[40] Peña, D.; Rodríguez, J., Descriptive measures of multivariate scatter and linear dependence, J. Multivar. Anal., 85, 361-374, (2003) · Zbl 1023.62057
[41] Pértega, S.; Vilar, JA, Comparing several parametric and nonparametric approaches to time series clustering: a simulation study, J. Classif., 27, 333-362, (2010) · Zbl 1337.62137
[42] Piccolo, D., A distance measure for classifying ARMA models, J. Time Ser. Anal., 2, 153-163, (1990) · Zbl 0691.62083
[43] Robbins, MW; Fisher, TJ, Cross-correlation matrices for tests of independence and causality between two multivariate time series, J. Bus. Econ. Stat., 33, 459-473, (2015)
[44] Rousseeuw, PJ, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math., 20, 53-65, (1987) · Zbl 0636.62059
[45] Sadahiro, Y.; Kobayashi, T., Exploratory analysis of time series data: detection of partial similarities, clustering, and visualization, Comput. Environ. Urban Syst., 45, 24-33, (2014)
[46] Scotto, MG; Barbosa, SM; Alonso, AM, Extreme value and cluster analysis of European daily temperature series, J. Appl. Stat., 38, 2793-2804, (2011)
[47] Tibshirani, R., Regression shrinkage and selection via the lasso, J. R. Stat. Soc. B, 58, 267-288, (1996) · Zbl 0850.62538
[48] Tibshirani, R.; Walther, G.; Hastie, T., Estimating the number of clusters in a data set via the gap statistic, J. R. Stat. Soc. B, 63, 411-423, (2001) · Zbl 0979.62046
[49] Vilar-Fernández, JA; Alonso, AM; Vilar-Fernández, JM, Nonlinear time series clustering based on nonparametric forecast densities, Comput. Stat. Data Anal., 54, 2850-2865, (2010) · Zbl 1284.62575
[50] Vilar, JA; Lafuente-Rego, B.; D’Urso, P., Quantile autocovariances: a powerful tool for hard and soft partitional clustering of time series, Fuzzy Sets Syst., 340, 38-72, (2018) · Zbl 1397.62233
[51] Wang, Y.; Tsay, RS; Ledolter, J.; Shrestha, KM, Forecasting simultaneously high-dimensional time series: a robust model-based clustering approach, J. Forecast., 32, 673-684, (2013) · Zbl 1397.62235
[52] Xiong, Y.; Yeung, D., Time series clustering with ARMA mixtures, Pattern Recogn., 37, 1675-1689, (2004) · Zbl 1117.62488
[53] Zhang, X.; Liu, J.; Du, Y.; Lv, T., A novel clustering method on time series data, Expert Syst. Appl., 38, 11891-11900, (2011)
[54] Zhang, T., Clustering high-dimensional time series based on parallelism, J. Am. Stat. Assoc., 108, 577-588, (2013) · Zbl 06195962
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.