
Classification for time series data. An unsupervised approach based on reduction of dimensionality. (English) Zbl 07223607

Summary: In this work we use a novel methodology for classifying time series data through a natural, unsupervised learning process. The strategy applies Multiple Factor Analysis followed by an ascending Hierarchical Classification Analysis. These two exploratory techniques complement each other and allow the series to be clustered on the basis of their time paths and of a reduction of the original dimensionality of the data. The extensive set of graphical and numerical tools available for both methods supports a thorough visual and metric analysis of the different trajectories, including their differences and similarities, which turn out to be responsible for the classes ultimately obtained. An application from finance, used previously in the literature, highlights the versatility and suitability of this approach.
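
The two-step strategy described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices in it are assumptions made for illustration: each row is one series to be classified, each block holds the columns observed in one time period, blocks are balanced by dividing by their largest singular value (the usual Multiple Factor Analysis weighting, up to a constant factor), and Ward linkage stands in for the ascending hierarchical classification. The function names `mfa_scores` and `cluster_series` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def mfa_scores(blocks, n_components=2):
    """Global MFA-style scores for a list of (n_rows, p_k) arrays.

    Assumption: all blocks describe the same rows (series) in the same order,
    and no block is constant (its largest singular value must be non-zero).
    """
    weighted = []
    for block in blocks:
        centered = block - block.mean(axis=0)
        # Largest singular value of the centered block; dividing by it
        # prevents any single block from dominating the global analysis.
        s1 = np.linalg.svd(centered, compute_uv=False)[0]
        weighted.append(centered / s1)
    stacked = np.hstack(weighted)
    # Global PCA of the weighted, concatenated table via SVD.
    u, s, _ = np.linalg.svd(stacked, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def cluster_series(blocks, n_clusters, n_components=2):
    """Ward hierarchical clustering on the reduced MFA-style scores."""
    scores = mfa_scores(blocks, n_components)
    tree = linkage(scores, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic illustration only: 6 series, 3 periods, 4 columns per period.
# The first three series sit near level 0, the last three near level 5.
rng = np.random.default_rng(0)
levels = np.array([0.0, 0.0, 0.0, 5.0, 5.0, 5.0])[:, None]
blocks = [levels + 0.1 * rng.standard_normal((6, 4)) for _ in range(3)]

labels = cluster_series(blocks, n_clusters=2)
print(labels)
```

In this toy setting the two groups of series are well separated, so the cut into two classes should recover them. With real data the number of retained components and the number of classes would normally be chosen by inspecting the eigenvalues and the dendrogram rather than fixed in advance.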

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
