×

Outlier detection under interval uncertainty: algorithmic solvability and computational complexity. (English) Zbl 1076.65014

The purpose of the present paper is to investigate outlier detection problems under the interval uncertainty approach, to provide an analysis of their computational complexity, to obtain new efficient algorithms that solve some of these problems (that are tractable under reasonable conditions), and to take into consideration related open problems. The traditional engineering approach to outlier detection takes the following steps: (a) The measurement, “normal” values \(x_1, x_2, \dots, x_n\) are collected. (b) For these normal values the sample average \(E = (x_1 + \cdots+ x_n) / n\) and the standard deviation \(\sigma = \sqrt V\) are computed, where \(V = [(x_1 - E)^2 +\cdots + (x_n - E)^2] / n.\) (c) A new measurement result \(x\) is classified as an outlier if \(x\) is outside the \(k_0\sigma\) interval \([L, U]\), where \(L = E - k_0\sigma, U = E + k_0\sigma\), and \(k_0 > 1\) is some preselected parameter (most frequently, \(k_0 = 2, 3\), or 6). In real life, the normal values \(x_1,\dots, x_n\) are situated within certain interval ranges, and for different values of \(x_i\) \((i = 1,\dots,n)\) within its interval, one get different bounds \(L\) and \(U\) of the \(k_0\sigma\) interval. Detecting now the outliers requires to obtain: (1) the possible outliers, defined as located outside of (at least) one of the possible \(k_0\sigma\) intervals \([L, U]\), and (2) the guaranteed outliers, defined as being located outside of all possible \(k_0\sigma\) intervals \([L, U]\). It is thus essential to compute the exact ranges of the interval bounds \(L\) and \(U\).
The main results may now be summarized as follows: (i) Computing the exact ranges of the outlier interval bounds \(L\) and \(U\) is proved to be, in the general case, an NP-hard problem. (ii) The authors propose efficient (quadratic-time) algorithms that compute the ranges of \(L\) and \(U\) under reasonable conditions. (iii) Once a value is identified as an outlier for a fixed parameter value \(k_0\), the paper shows how to find out to what degree this value is an outlier, i.e. what is the largest value \(k_0\) for which this value is an outlier for sure.

MSC:

65C60 Computational problems in statistics (MSC2010)
65Y20 Complexity and performance of numerical algorithms
65G30 Interval and finite arithmetic
62F25 Parametric tolerance and confidence regions
PDFBibTeX XMLCite
Full Text: DOI Link

References:

[1] Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C.: Introduction to Algorithms, MIT Press, Cambridge, and Mc-Graw Hill Co., N.Y., 2001. · Zbl 1047.68161
[2] Devore, J. and Peck, R.: Statistics: the Exploration and Analysis of Data, Duxbury, Pacific Grove, 1999. · Zbl 0861.62002
[3] Ferregut, C., Osegueda, R. A., and Nu?ez, A. (eds): Proceedings of the International Workshop on Intelligent NDE Sciences for Aging and Futuristic Aircraft, El Paso, TX, September 30? October 2, 1997.
[4] Ferson, S., Ginzburg, L., Kreinovich, V., and Aviles, M.: Exact Bounds on Sample Variance of Interval Data, in: Extended Abstracts of the 2002 SIAM Workshop on Validated Computing, Toronto, Canada, May 23-25, 2002, pp. 67-69.
[5] Ferson, S., Ginzburg, L., Kreinovich, V., Longpre, L., and Aviles, M.: Computing Variance for Interval Data Is NP-Hard, ACM SIGACT News 33 (2) (2002), pp. 108-118.
[6] Goodchild, M. and Gopal, S.: Accuracy of Spatial Databases, Taylor & Francis, London, 1989.
[7] Gros, X. E.: NDT Data Fusion, J. Wiley, London, 1997.
[8] Kosheleva, O., Cabrera, S., Osegueda, R., Nazarian, S., George, D. L., George, M. J., Kreinovich, V., and Worden, K.: Case Study of Non-Linear Inverse Problems: Mammography and Non-Destructive Evaluation, in: Mohamad-Djafari, A. (ed.), Bayesian Inference for Inverse Problems, Proceedings of the SPIE/International Society for Optical Engineering 3459, San Diego, 1998, pp. 128-135.
[9] Kreinovich, V., Lakeyev, A., Rohn, J., and Kahl, P.: Computational Complexity and Feasibility of Data Processing and Interval Computations, Kluwer Academic Publishers, Dordrecht, 1997. · Zbl 0945.68077
[10] Kreinovich, V., Longpr?, L., Patangay, P., Ferson, S., and Ginzburg, L.: Outlier Detection under Interval Uncertainty: Algorithmic Solvability and Computational Complexity, in: Lirkov, I., Margenov, S., Wasniewski, J., and Yalamov, P. (eds), Large-Scale Scientific Computing, Proceedings of the 4-th International Conference LSSC?2003, Sozopol, Bulgaria, June 4-8, 2003, Springer Lecture Notes in Computer Science 2907, 2004, pp. 238-245. · Zbl 1151.68588
[11] Kreinovich, V., Patangay, P., Longpr?, L., Starks, S. A., Campos, C., Ferson, S., and Ginzburg, L.: Outlier Detection under Interval and Fuzzy Uncertainty: Algorithmic Solvability and Computational Complexity, in: Proceedings of the 22nd International Conference of the North American Fuzzy Information Processing Society NAFIPS?2003, Chicago, Illinois, July 24-26, 2003, pp. 401-406.
[12] McCain, M. and William, C.: Integrating Quality Assurance into the GIS Project Life Cycle, in: Proceedings of the 1998 ESRI Users Conference, http://www.dogcreek.com/html/documents.html.
[13] Osegueda, R., Kreinovich, V., Potluri, L., and Al?, R.: Non-Destructive Testing of Aerospace Structures: Granularity and Data Mining Approach, in: Proceedings of FUZZ-IEEE?2002, Honolulu, Hawaii, May 12-17, 2002, Vol. 1, pp. 685-689.
[14] Osegueda, R. A., Seelam, S. R., Holguin, A. C., Kreinovich, V., and Tao, C.-W.: Statistical and Dempster-Shafer Techniques in Testing Structural Integrity of Aerospace Structures, International Journal of Uncertainty, Fuzziness, Knowledge-Based Systems (IJUFKS) 9 (6) (2001), pp. 749-758. · Zbl 1113.62354
[15] Rabinovich, S.: Measurement Errors: Theory and Practice, American Institute of Physics, New York, 1993.
[16] Scott, L.: Identification of GIS Attribute Error Using Exploratory Data Analysis, Professional Geographer 46 (3) (1994), pp. 378-386.
[17] Vavasis, S. A.: Nonlinear Optimization: Complexity Issues, Oxford University Press, N.Y., 1991. · Zbl 0785.90091
[18] Wadsworth, H. M., Jr. (ed.): Handbook of Statistical Methods for Engineers and Scientists, McGraw-Hill Publishing, N.Y., 1990.
[19] Wen, Q., Gates, A. Q., Beck, J., Kreinovich, V., and Keller, G. R.: Towards Automatic Detection of Erroneous Measurement Results in a Gravity Database, in: Proceedings of the 2001 IEEE Systems, Man, and Cybernetics Conference, Tucson, Arizona, October 7-10, 2001, pp. 2170? 2175.
[20] Worden, K., Osegueda, R., Ferregut, C., Nazarian, S., George, D. L., George, M. J., Kreinovich, V., Kosheleva, O., and Cabrera, S.: Interval Methods in Non-Destructive Testing of Aerospace Structures and in Mammography, in: International Conference on Interval Methods and Their Application in Global Optimization (INTERVAL?98), April 20-23, 1998, Nanjing, China, Abstracts, pp. 152-154. · Zbl 1045.65043
[21] Worden, K., Osegueda, R., Ferregut, C., Nazarian, S., Rodriguez, E., George, D. L., George, M. J., Kreinovich, V., Kosheleva, O., and Cabrera, S.: Interval Approach to Non-Destructive Testing of Aerospace Structures and to Mammography, in: Alefeld, G. and Trejo, R. A. (eds.), Interval Computations and Its Applications to Reasoning under Uncertainty, Knowledge Representation, and Control Theory. Proceedings of MEXICON?98, Workshop on Interval Computations, 4th World Congress on Expert Systems, M?xico City, M?xico, 1998. · Zbl 1045.65043
[22] Worden, K., Osegueda, R., Ferregut, C., Nazarian, S. George, D. L., George, M. J., Kreinovich, V., Kosheleva, O., and Cabrera, S.: Interval Methods in Non-Destructive Testing of Material Structures, Reliable Computing 7 (4) (2001), pp. 341-352. · Zbl 1045.65043
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.