
Robust fuzzy clustering based on quantile autocovariances. (English) Zbl 1467.62118

Summary: Robustness to the presence of outliers in time series clustering is addressed. Assuming that the clustering principle is to group realizations of series generated from similar dependence structures, three robust versions of a fuzzy \(C\)-medoids model based on comparing sample quantile autocovariances are proposed by considering, respectively, the so-called metric, noise, and trimmed approaches. Each method achieves its robustness against outliers in a different manner. The metric approach applies a suitable transformation of the distance aimed at smoothing the effect of the outliers, the noise approach gathers the outliers into a separate artificial cluster, and the trimmed approach removes a fraction of the time series. All the proposed approaches take advantage of the high capability of the quantile autocovariances to discriminate between independent realizations from a broad range of stationary processes, including linear, non-linear and conditionally heteroskedastic models. An extensive simulation study involving scenarios with different generating models, contaminated with outliers, is performed. Robustness is evaluated against (i) outliers generated from different generating patterns, and (ii) outliers characterized by isolated, temporary or persistent level changes. The influence of the input parameters required by the different algorithms is analyzed. Regardless of the models considered, the results show that the proposed robust procedures are able to neutralize the effect of the anomalous series, preserving the true clustering structure, and fairly outperform other robust algorithms based on alternative metrics. Two applications to financial data sets illustrate the usefulness of the proposed models.
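As a rough illustration of the dissimilarity that the summary builds on, the following Python sketch computes sample quantile autocovariances and compares two series through them. It is not taken from the paper. It assumes the usual definition, in which the quantile autocovariance at lag \(l\) for probability levels \(\tau, \tau'\) is the covariance between the indicators \(I(X_t \le q_\tau)\) and \(I(X_{t+l} \le q_{\tau'})\), and it assumes a squared Euclidean distance between the resulting feature vectors. The names `empirical_quantile`, `qaf_features` and `qaf_distance`, the default lags and probability levels, and the simple order-statistic quantile rule are all hypothetical choices made for this example.

```python
import math


def empirical_quantile(sorted_x, p):
    """Smallest order statistic whose empirical CDF is at least p (assumed rule)."""
    k = max(1, math.ceil(p * len(sorted_x)))
    return sorted_x[k - 1]


def qaf_features(x, lags=(1,), probs=(0.1, 0.5, 0.9)):
    """Vector of sample quantile autocovariances of the series x.

    For each lag l and each pair of levels (p1, p2), the entry is the
    proportion of times t with x[t] <= q_p1 and x[t + l] <= q_p2,
    minus the product p1 * p2.
    """
    n = len(x)
    xs = sorted(x)
    q = [empirical_quantile(xs, p) for p in probs]
    feats = []
    for lag in lags:
        m = n - lag  # number of available pairs (x[t], x[t + lag])
        for i, p1 in enumerate(probs):
            for j, p2 in enumerate(probs):
                joint = sum(
                    1 for t in range(m) if x[t] <= q[i] and x[t + lag] <= q[j]
                ) / m
                feats.append(joint - p1 * p2)
    return feats


def qaf_distance(x, y, lags=(1,), probs=(0.1, 0.5, 0.9)):
    """Squared Euclidean distance between the two feature vectors."""
    fx = qaf_features(x, lags, probs)
    fy = qaf_features(y, lags, probs)
    return sum((a - b) ** 2 for a, b in zip(fx, fy))


if __name__ == "__main__":
    alternating = [0, 1] * 5
    constant_blocks = [0] * 5 + [1] * 5
    print(qaf_features(alternating, lags=(1, 2), probs=(0.5,)))
    print(qaf_distance(alternating, constant_blocks, lags=(1, 2), probs=(0.5,)))
```

A pairwise matrix of such distances could then be handed to any medoid-based fuzzy clustering routine. The metric, noise and trimmed variants described in the summary would modify how that matrix enters the objective function, which this sketch does not attempt to reproduce.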

MSC:

62H86 Multivariate analysis and fuzziness
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G08 Nonparametric regression and quantile regression
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62P05 Applications of statistics to actuarial sciences and financial mathematics
