SAZED: parameter-free domain-agnostic season length estimation in time series data. (English) Zbl 1464.62396

Summary: Season length estimation is the task of identifying the number of observations in the dominant repeating pattern of seasonal time series data. It is a common preprocessing task that is crucial for various downstream applications. Inferring season length from a real-world time series is often challenging due to phenomena such as slightly varying period lengths and noise. As a result, practitioners may have to dedicate considerable effort to preprocessing time series data, since existing approaches either require dedicated parameter tuning or perform in a heavily domain-dependent way. To address these challenges, we propose SAZED: spectral and average autocorrelation zero distance density. SAZED is a versatile ensemble of multiple specialized time series season length estimation approaches. The combination of various base methods, selected with respect to domain-agnostic criteria, and a novel seasonality isolation technique allows broad applicability to real-world time series with varied properties. Further, SAZED is theoretically grounded and parameter-free, with a computational complexity of \(\mathcal{O}(n\log n)\), which makes it applicable in practice. In our experiments, SAZED was statistically significantly better than every other method on at least one dataset. The datasets used for the evaluation consist of time series data from various real-world domains, sterile synthetic test cases, and synthetic data designed to be seasonal yet have no finite statistical moments of any order.
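To make the task concrete, the following is a minimal sketch of a single autocorrelation-based season length estimator that runs in \(\mathcal{O}(n\log n)\) via the FFT. It is not SAZED and not any of its base methods as published; the function names `autocorrelation_fft` and `estimate_season_length`, and the heuristic of skipping the lag-0 lobe before taking the highest autocorrelation peak, are assumptions chosen for illustration only.

```python
import numpy as np


def autocorrelation_fft(x):
    """Biased sample autocorrelation computed with the FFT in O(n log n).

    Returns None for a constant series, whose autocorrelation is undefined.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Zero-pad to a power of two >= 2n - 1 so the circular correlation
    # computed by the FFT equals the linear autocorrelation.
    size = 1 << (2 * n - 1).bit_length()
    spectrum = np.fft.rfft(x, size)
    acf = np.fft.irfft(spectrum * np.conj(spectrum), size)[:n]
    if acf[0] == 0:
        return None
    return acf / acf[0]


def estimate_season_length(x):
    """Illustrative heuristic (an assumption, not the SAZED ensemble):
    skip the initial lobe around lag 0, then return the lag with the
    highest autocorrelation among lags below n // 2.
    """
    acf = autocorrelation_fft(x)
    if acf is None:
        return None
    negative = np.flatnonzero(acf < 0)
    half = len(acf) // 2
    if negative.size == 0 or negative[0] >= half:
        return None  # no usable peak beyond the lag-0 lobe
    start = negative[0]
    return int(start + np.argmax(acf[start:half]))


if __name__ == "__main__":
    t = np.arange(240)
    series = np.sin(2 * np.pi * t / 12)  # synthetic series, period 12
    print(estimate_season_length(series))
```

A single estimator like this is sensitive to noise, trends, and slightly varying period lengths, which is the kind of fragility that motivates combining several estimators into an ensemble.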

MSC:

62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62M15 Inference from stochastic processes and spectral analysis
