
Clustering of time series using quantile autocovariances. (English) Zbl 1414.62372

Summary: Time series clustering is an active research topic with applications in many fields. Unlike conventional multivariate data, time series evolve over time, so the notion of similarity between objects must take into account the dynamics of the series. In this paper, a distance measure that compares quantile autocovariance functions is proposed for clustering time series. Quantile autocovariances provide information about the serial dependence structure at different pairs of quantile levels, require no moment conditions, and make it possible to identify dependence features that covariance-based methods are unable to detect. Results from an extensive simulation study show that the proposed metric outperforms, or is highly competitive with, a range of dissimilarities reported in the literature, and that it is particularly capable of clustering time series generated from a broad range of dependence models. Estimation of the optimal number of clusters is also addressed. For illustrative purposes, our methodology is applied to a real dataset involving financial time series.
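
The summary does not spell out the estimator, so the following is only a minimal sketch of the idea, not the authors' implementation. It assumes the usual definition of the quantile autocovariance at lag \(l\) for levels \((\tau,\tau')\), namely \(\gamma_l(\tau,\tau') = P(X_t \le q_\tau, X_{t+l} \le q_{\tau'}) - \tau\tau'\), estimated by replacing the probability with an empirical frequency and the quantiles with sample quantiles. It also assumes the distance is the squared Euclidean distance between the resulting feature vectors. The function names `qaf` and `qaf_distance`, the default lags, and the default quantile levels are hypothetical choices made for illustration.

```python
import numpy as np

def qaf(x, lags=(1,), taus=(0.1, 0.5, 0.9)):
    """Sample quantile autocovariances of x, flattened into one feature vector.

    Assumed estimator: empirical frequency of {X_t <= q_tau, X_{t+l} <= q_tau'}
    minus tau * tau', with q_tau taken as the sample quantile of x.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    q = np.quantile(x, taus)                          # sample quantiles, one per level
    ind = (x[None, :] <= q[:, None]).astype(float)    # indicators, shape (len(taus), n)
    feats = []
    for lag in lags:
        a = ind[:, : n - lag]                         # I(X_t <= q_tau)
        b = ind[:, lag:]                              # I(X_{t+lag} <= q_tau')
        joint = a @ b.T / (n - lag)                   # joint frequencies for all level pairs
        feats.append((joint - np.outer(taus, taus)).ravel())
    return np.concatenate(feats)

def qaf_distance(x, y, lags=(1,), taus=(0.1, 0.5, 0.9)):
    """Squared Euclidean distance between the two feature vectors (assumed form)."""
    d = qaf(x, lags, taus) - qaf(y, lags, taus)
    return float(d @ d)
```

A pairwise matrix of `qaf_distance` values could then be passed to any standard clustering procedure that accepts precomputed dissimilarities. Because only indicator functions of the observations are involved, the features exist even for heavy-tailed series, which is consistent with the "no moment condition" remark in the summary.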

MSC:

62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62H30 Classification and discrimination; cluster analysis (statistical aspects)
