×

Chebyshev similarity match between uncertain time series. (English) Zbl 1395.62283

Summary: In real application scenarios, the inherent impreciseness of sensor readings, the intentional perturbation of privacy-preserving transformations, and error-prone mining algorithms cause much uncertainty of time series data. The uncertainty brings serious challenges for the similarity measurement of time series. In this paper, we first propose a model of uncertain time series inspired by Chebyshev inequality. It estimates possible sample value range and central tendency range in terms of sample estimation interval and central tendency estimation interval, respectively, at each time slot. In comparison with traditional models adopting repeated measurements and random variable, Chebyshev model reduces overall computational cost and requires no prior knowledge. We convert Chebyshev uncertain time series into certain time series matrix; therefore noise reduction and dimensionality reduction are available for uncertain time series. Secondly, we propose a new similarity matching method based on Chebyshev model. It depends on overlaps between two sample estimation intervals and overlaps between central tendency estimation intervals from different uncertain time series. At the end of this paper, we conduct an extensive experiment and analyze the results by comparing with prior works.

MSC:

62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)

Software:

itsmr
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Orang, M.; Shiri, N., An experimental evaluation of similarity measures for uncertain time series, Proceedings of the 18th International Database Engineering and Applications Symposium (IDEAS ’14), ACM · doi:10.1145/2628194.2628207
[2] Ceriotti, M.; Corrà, M.; D’Orazio, L.; Doriguzzi, R.; Facchin, D.; Gunǎ, Ş. T.; Jesi, G. P.; Cigno, R. L.; Mottola, L.; Murphy, A. L.; Pescalli, M.; Picco, G. P.; Pregnolato, D.; Torghele, C., Is there light at the ends of the tunnel? Wireless sensor networks for adaptive lighting in road tunnels, Proceedings of the 10th ACM/IEEE International Conference on Information Processing in Sensor Networks
[3] Krishnamurthy, L.; Adler, R.; Buonadonna, P.; Chhabra, J.; Flanigan, M.; Kushalnagar, N.; Nachman, L.; Yarvis, M., Design and deployment of industrial sensor networks: experiences from a semiconductor plant and the North Sea, Proceedings of the 3rd ACM International Conference on Embedded Networked Sensor Systems (SenSys ’05), ACM · doi:10.1145/1098918.1098926
[4] Mit, M. S.; Slac, J. B.; Microsoft, D. D., Requirements for science data bases and SciDB, Proceedings of the 4th Biennial Conference on Innovative Data Systems Research (CIDR ’09)
[5] Dan, S.; Howe, B.; Connolly, A., Embracing uncertainty in large-scale computational astrophysics, Proceedings of the 3rd VLDB Workshop on Management of Uncertain Data (MUD ’09)
[6] Tran, T. T. L.; Peng, L.; Li, B.; Diao, Y.; Liu, A., PODS: a new model and processing algorithms for uncertain data streams, Proceedings of the International Conference on Management of Data (SIGMOD ’10) · doi:10.1145/1807167.1807187
[7] Lin, J.; Keogh, E.; Wei, L.; Lonardi, S., Experiencing SAX: a novel symbolic representation of time series, Data Mining and Knowledge Discovery, 15, 2, 107-144, (2007) · doi:10.1007/s10618-007-0064-z
[8] Chen, Q.; Chen, L.; Lian, X., Indexable PLA for efficient similarity search, Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB Endowment
[9] Aggarwal, C. C., Managing and Mining Uncertain Data. Managing and Mining Uncertain Data, Advances in Database Systems, (2011), Springer · Zbl 1185.68271
[10] Zhao, Y.; Aggarwal, C.; Yu, P. S., On wavelet decomposition of uncertain time series data sets, Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM ’10)
[11] Sarangi, S. R.; Murthy, K., DUST: a generalized notion of similarity between uncertain time series, Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’10) · doi:10.1145/1835804.1835854
[12] Yeh, M.-Y.; Wu, K.-L.; Yu, P. S.; Chen, M.-S., PROUD: a probabilistic approach to processing similarity queries over uncertain data streams, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (EDBT ’09), ACM · doi:10.1145/1516360.1516439
[13] Thalassinakis, E. J.; Dialynas, E. N., A Monte-Carlo simulation method for setting the underfrequency load shedding relays and selecting the spinning reserve policy in autonomous power systems, IEEE Transactions on Power Systems, 19, 4, 2044-2052, (2004) · doi:10.1109/tpwrs.2004.835674
[14] Tong, L.-I.; Chen, K. S.; Chen, H. T., Statistical testing for assessing the performance of lifetime index of electronic components with exponential distribution, International Journal of Quality and Reliability Management, 19, 7, 812-824, (2002) · doi:10.1108/02656710210434757
[15] Aßfalg, K.; Kriegel, H.-P.; Kröger, P.; Renz, M., Probabilistic similarity search for uncertain time series, Scientific and Statistical Database Management. Scientific and Statistical Database Management, Lecture Notes in Computer Science, 5566, 435-443, (2009), Berlin, Germany: Springer, Berlin, Germany · doi:10.1007/978-3-642-02279-1_31
[16] Dallachiesa, M.; Nushi, B.; Mirylenka, K.; Palpanas, T., Uncertain time-series similarity: return to the basics, Proceedings of the VLDB Endowment, 5, 11, 1662-1673, (2012) · doi:10.14778/2350229.2350278
[17] Storey, J. D.; Lovric, M., False discovery rates, International Encyclopedia of Statistical Science, 239, (2011), Springer
[18] Weisberg, H. F., Central tendency and variability, Learning and Individual Differences, 21, 5, 549-554, (1992)
[19] Grubbs, F. E., Procedures for detecting outlying observations in samples, Technometrics, 11, 1, 1-21, (2012)
[20] Jones, R. H., Exponential smoothing for multivariate time series, Journal of the Royal Statistical Society Series B: Methodological, 28, 1, 241-251, (1966) · Zbl 0144.19401
[21] Brockwell, P. J.; Davis, R. A., Introduction to Time Series and Forecasting. Introduction to Time Series and Forecasting, Springer Texts in Statistics, (1996), Springer · Zbl 0868.62067
[22] Chatfield, C., The Analysis of Time Series: An Introduction, (2013), CRC Press · Zbl 0732.62088
[23] Keogh, X., The UCR Time Series Classification/Clustering
[24] Bellman, R. E.; Dreyfus, S. E., Applied dynamic programming, Journal of the American Statistical Association, 59, 305, 293, (1964) · doi:10.2307/2282884
[25] Popivanov, I.; Miller, R. J., Similarity search over time-series data using wavelets, Proceedings of the 18th IEEE International Conference on Data Engineering, IEEE Computer Society · doi:10.1109/ICDE.2002.994711
[26] Struzik, Z. R.; Siebes, A., The haar wavelet transform in the time series similarity paradigm, Principles of Data Mining and Knowledge Discovery. Principles of Data Mining and Knowledge Discovery, Lecture Notes in Computer Science, 1704, 12-22, (1999) · doi:10.1007/978-3-540-48247-5_2
[27] Chan, K.-P.; Fu, A. W.-C., Efficient time series matching by wavelets, Proceedings of the 15th International Conference on Data Engineering (ICDE ’99), IEEE · doi:10.1109/ICDE.1999.754915
[28] Daubechies, I., Ten lectures on wavelets, Acoustical Society of America Journal, 93, 3, 1671, (1992) · Zbl 0776.42018
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.