
Anomalous cluster detection in spatiotemporal meteorological fields. (English) Zbl 07260621

Summary: Finding anomalous regions in spatiotemporal climate data is an important problem in which greater accuracy is needed. The collective and contextual nature of anomalies (e.g., heat waves), coupled with the real-valued, seasonal, multimodal, highly correlated, and gridded nature of climate variable observations, poses a multitude of challenges. Existing anomaly detection methods have limitations in the specific setting of real-valued areal spatiotemporal data. In this paper, we develop a method for extreme event detection in meteorological datasets that follows from well-known distribution-based anomaly detection approaches. The method models spatial and temporal correlations explicitly through a piecewise parametric assumption and generalizes the Mahalanobis distance across distributions of different dimensionalities. The result is an effective method to mine contiguous spatiotemporal anomalous regions from meteorological fields, which improves upon the current standard approach in climatology. The proposed method has been evaluated on a real global surface temperature dataset and validated using historical records of extreme events.
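The summary does not spell out how the Mahalanobis distance is generalized across dimensionalities. One common way to make squared Mahalanobis distances comparable between regions of different sizes is to convert each one to a tail probability: under a multivariate Gaussian model, the squared distance of a d-dimensional vector from its mean follows a chi-square distribution with d degrees of freedom. The Python sketch below illustrates only that idea; it is an assumption for illustration, not necessarily the construction used in the paper, and the function names (`mean_and_covariance`, `cholesky`, `mahalanobis_sq`, `chi2_sf`, `region_pvalue`) are hypothetical. It uses only the standard library, fits the Gaussian from historical samples of one region, and assumes the sample covariance is positive definite (more samples than dimensions, no collinear cells).

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_and_covariance(samples: Sequence[Sequence[float]]) -> Tuple[Vector, Matrix]:
    """Sample mean and unbiased sample covariance of equal-length vectors."""
    n, d = len(samples), len(samples[0])
    mu = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = a; a must be symmetric positive definite
    (math.sqrt raises ValueError on a negative pivot otherwise)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L


def mahalanobis_sq(x: Sequence[float], mu: Vector, cov: Matrix) -> float:
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu)."""
    L = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    # Forward substitution solves L y = diff, so the distance equals y^T y.
    y: Vector = []
    for i in range(len(diff)):
        y.append((diff[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return sum(v * v for v in y)


def chi2_sf(x: float, d: int) -> float:
    """Exact upper tail P(X > x) for X ~ chi-square with d >= 1 degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if d % 2 == 0:
        # Even d: exp(-x/2) * sum_{i=0}^{d/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, d // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd d: 2*Q(sqrt x) + 2*phi(sqrt x) * sum_{r=1}^{(d-1)/2} x^(r-1/2) / (1*3*...*(2r-1)),
    # where Q is the standard normal upper tail and phi its density.
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))  # equals 2 * Q(sqrt x)
    phi = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = root, 0.0
    for r in range(1, (d - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2.0 * phi * total


def region_pvalue(x: Sequence[float], history: Sequence[Sequence[float]]) -> float:
    """Hypothetical dimension-free score: chi-square tail probability of the
    squared Mahalanobis distance, assuming a Gaussian fitted to `history`."""
    mu, cov = mean_and_covariance(history)
    return chi2_sf(mahalanobis_sq(x, mu, cov), len(x))
```

For example, with the four historical samples `[[0, 0], [2, 0], [0, 2], [2, 2]]` (mean (1, 1), covariance diag(4/3, 4/3)), the observation `[3, 1]` has squared distance 3 and tail probability exp(-1.5), about 0.223. Raw squared distances grow with the number of grid cells d and cannot be ranked directly across regions of different sizes; a smaller tail probability marks a more unusual region regardless of d, under the Gaussian assumption.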

MSC:

62-XX Statistics
68-XX Computer science

Software:

Orca; GMRFLib

References:

[1] N. Abe, B. Zadrozny, and J. Langford, Outlier detection by active learning, in Proceedings of the 12th ACM SIGKDD International Conference on
[2] E. Acuña and C. Rodriguez, On detection of outliers and their effect in supervised classification, Univ. Puerto Rico at Mayaguez, Mayaguez, Puerto Rico, 2004, available at https://www.researchgate.net/profile/Edgar_Acuna/publication/228965221_On_Detection_Of_Outliers_And_Their_Effect_In_Supervised_Classification/links/00b7d525e85fac237d000000.pdf.
[3] C. C. Aggarwal, Outlier analysis, in Data Mining, Springer, Berlin, Heidelberg, 2015, 237-263.
[4] A. Atkinson, Fast very robust methods for the detection of multiple outliers, J. Am. Stat. Assoc. 89(428) (1994), 1329-1339. · Zbl 0825.62429
[5] S. D. Bay and M. Schwabacher, Mining distance-based outliers in near linear time with randomization and a simple pruning rule, in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Ser. KDD '03, ACM, New York, NY, 2003, 29-38. https://doi.org/10.1145/956750.956758.
[6] P. Berkhin, A survey of clustering data mining techniques, in Grouping Multidimensional Data, J. Kogan, C. Nicholas, and M. Teboulle, Eds., Springer, Berlin, Heidelberg, 2006, 25-71. https://doi.org/10.1007/3-540-28349-8_2. · Zbl 1087.68092
[7] M. M. Breunig et al., LOF: Identifying density-based local outliers, in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Ser. SIGMOD '00, ACM, New York, NY, 2000, 93-104. https://doi.org/10.1145/342009.335388.
[8] V. Chandola, A. Banerjee, and V. Kumar, Anomaly detection: A survey, ACM Comput. Surv. 41(3) (2009), 15:1-15:58. https://doi.org/10.1145/1541880.1541882.
[9] S. Chawla and A. Gionis, k-means: A unified approach to clustering and outlier detection, in SDM, SIAM, Philadelphia, PA, 2013, 189-197. https://doi.org/10.1137/1.9781611972832.21.
[10] M. Das and S. Parthasarathy, Anomaly detection and spatio-temporal analysis of global climate system, in Proceedings of the Third International Workshop on Knowledge Discovery from Sensor Data, ACM, New York, NY, 2009, 142-150, available at http://dl.acm.org/citation.cfm?id=1601989.
[11] M. Ester et al., A density-based algorithm for discovering clusters in large spatial databases with noise, KDD 96(34) (1996), 226-231.
[12] G. Goffredi, Multivariate spatio-temporal anomaly detection in a mobile network, 2015, available at https://www.politesi.polimi.it/handle/10589/107141.
[13] I. Goodfellow et al., Deep learning, Vol 1, MIT Press, Cambridge, 2016. · Zbl 1373.68009
[14] Z. He, X. Xu, and S. Deng, Discovering cluster-based local outliers, Pattern Recogn. Lett. 24(9-10) (2003), 1641-1650. · Zbl 1048.68084
[15] M. F. Jiang, S. S. Tseng, and C. M. Su, Two-phase clustering process for outliers detection, Pattern Recogn. Lett. 22(6-7) (2001), 691-700. · Zbl 1010.68908
[16] Z. Ju and H. Liu, Fuzzy Gaussian mixture models, Pattern Recogn. 45(3) (2012), 1146-1158. · Zbl 1227.62046
[17] E. Kalnay et al., The NCEP/NCAR 40-year reanalysis project, Bull. Am. Meteorol. Soc. 77(3) (1996), 437-471. https://doi.org/10.1175/1520-0477(1996)077
[18] S.-C. Kao, A. R. Ganguly, and K. Steinhaeuser, Motivating complex dependence structures in data mining: A case study with anomaly detection in climate, in IEEE International Conference on Data Mining Workshops, 2009. ICDMW'09, IEEE, New York, NY, 2009, 223-230.
[19] E. M. Knorr and R. T. Ng, Algorithms for mining distance-based outliers in large datasets, in Proceedings of the International Conference on Very Large Data Bases, Citeseer, Morgan Kaufmann, San Francisco, CA, 1998, 392-403, available at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.103.5746&rep=rep1&type=pdf.
[20] M. Kulldorff, A spatial scan statistic, Commun. Stat.: Theory Methods 26(6) (1997), 1481-1496. https://doi.org/10.1080/03610929708831995. · Zbl 0920.62116
[21] P. Legendre, Spatial autocorrelation: Trouble or new paradigm? Ecology 74(6) (1993), 1659-1673, available at http://www.jstor.org/stable/1939924.
[22] P. C. Loikith and A. J. Broccoli, Characteristics of observed atmospheric circulation patterns associated with temperature extremes over North America, J. Clim. 25(20) (2012), 7266-7281. https://doi.org/10.1175/JCLI-D-11-00709.1.
[23] J. Lubchenco and T. R. Karl, Predicting and managing extreme weather events, Physics Today 65(3) (2012), 31-37.
[24] V. Mahadevan et al., Anomaly detection in crowded scenes, in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, New York, NY, 2010, 1975-1981.
[25] K. V. Mardia, Measures of multivariate skewness and kurtosis with applications, Biometrika 57(3) (1970), 519-530. · Zbl 0214.46302
[26] G. J. McLachlan, Mahalanobis distance, Resonance 4(6) (1999), 20-26.
[27] S. Nadarajah, The exponentiated Gumbel distribution with climate application, Environmetrics 17(1) (2006), 13-23.
[28] A. Nanopoulos, Y. Theodoridis, and Y. Manolopoulos, C2P: Clustering based on closest pairs, in Proceedings of the International Conference on Very Large Data Bases, Morgan Kaufmann, San Francisco, CA, 2001, 331-340, available at http://www.vldb.org/conf/2001/P331.pdf.
[29] D. B. Neill, Detection of spatial and spatio-temporal clusters, Ph.D. Dissertation, Univ. South Carolina, 2006.
[30] D. B. Neill and A. W. Moore, Anomalous spatial cluster detection, in Proceedings of the KDD 2005 Workshop on Data Mining Methods for Anomaly Detection, ACM, New York City, NY, 2005, available at http://www.cs.cmu.edu/afs/cs/Web/People/neill/papers/ADKDD-Neill.pdf.
[31] J. T. Overpeck et al., Climate data challenges in the 21st century, Science 331(6018) (2011), 700-702. · Zbl 1355.86009
[32] L. X. Pang et al., A scalable approach for LRT computation in GPGPU environments, in Web Technologies and Applications, Ser. Lecture Notes in Computer Science, Vol 7808, Y. Ishikawa et al., Eds., Springer, Berlin, Heidelberg, 2013, 595-608. https://doi.org/10.1007/978-3-642-37401-2_58.
[33] B. Ramachandra et al., Detecting extreme events in gridded climate data, Procedia Comput. Sci. 80 (2016), 2397-2401.
[34] C. E. Rasmussen, Gaussian processes for machine learning, MIT Press, Cambridge, MA, 2006. · Zbl 1177.68165
[35] P. J. Rousseeuw and K. Van Driessen, A fast algorithm for the minimum covariance determinant estimator, Technometrics 41(3) (1999), 212-223. https://doi.org/10.1080/00401706.1999.10485670.
[36] H. Rue and L. Held, Gaussian Markov random fields: Theory and applications, CRC Press, Boca Raton, FL, 2005. · Zbl 1093.60003
[37] S. S. Shapiro and M. B. Wilk, An analysis of variance test for normality (complete samples), Biometrika 52(3/4) (1965), 591-611. · Zbl 0134.36501
[38] A. Telang et al., Detecting localized homogeneous anomalies over spatio-temporal data, Data Mining Knowl. Discov. 28(5-6) (2014), 1480-1502.
[39] T. G. Van Niel, T. R. McVicar, and B. Datt, On the relationship between training sample size and data dimensionality: Monte Carlo analysis of broadband multi-temporal classification, Remote Sens. Environ. 98(4) (2005), 468-480.
[40] R. Warren, R. F. Smith, and A. K. Cybenko, Use of Mahalanobis distance for detecting outliers and outlier clusters in markedly non-normal data: A vehicular traffic example, DTIC Document, Tech. Rep., 2011, available at http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=ADA545834.
[41] L. Xiong, B. Póczos, and J. G. Schneider, Group anomaly detection using flexible genre models, in Advances in Neural Information Processing Systems, Curran Associates, Red Hook, NY, 2011, 1071-1079, available at http://papers.nips.cc/paper/4299-group-anomaly-detection-using-flexible-genre-models.
[42] R. Xu and D. Wunsch, Survey of clustering algorithms, IEEE Trans. Neural Netw. 16(3) (2005), 645-678.
[43] K.