zbMATH — the first resource for mathematics

Outlier detection and robust covariance estimation using mathematical programming. (English) Zbl 1284.62057
Summary: The outlier detection problem and the robust covariance estimation problem are often interchangeable. Without outliers, the classical method of maximum likelihood estimation (MLE) can be used to estimate the parameters of a known distribution from observational data. When outliers are present, they dominate the log-likelihood function and pull the MLE estimates toward them. Many robust statistical methods have been developed to detect outliers and to produce estimators that are robust against deviations from model assumptions. However, existing methods either become computationally expensive as the problem size increases or give up desirable properties such as affine equivariance. An alternative approach is to design a special mathematical programming model that finds optimal weights for all observations, such that at the optimal solution outliers receive smaller weights and can therefore be detected. The resulting covariance estimator has the following properties. First, it is affine equivariant. Second, it is computationally efficient even for large problem sizes. Third, it is easy to incorporate prior beliefs into the estimator by using semi-definite programming. The accuracy of the method is tested on different contamination models, including recently proposed ones. For high-dimensional data the method is faster than the Fast-MCD method, and it has reasonable accuracy in the tested cases.
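The idea of detecting outliers through observation weights can be illustrated with a small sketch. This is not the authors' mathematical programming model, which the summary does not spell out; it swaps in a classical iterative hard-rejection reweighting: start from the MLE (all weights equal), give weight 0 to points whose squared Mahalanobis distance exceeds the 97.5% chi-square cutoff, and recompute until the weights stop changing. The restriction to 2-D data, the cutoff, the toy data, and all function names are assumptions made for illustration. Because Mahalanobis distances are affine invariant, the weights stay the same under a transformation x -> A x + b, so the weighted estimator is affine equivariant; the last lines check this numerically.

```python
import math


# Hypothetical helpers for 2-D data only; not taken from the paper.
def weighted_mean_cov(points, weights):
    """Weighted mean and covariance (divided by the total weight, as in the MLE)."""
    total = sum(weights)
    mx = sum(w * x for (x, _), w in zip(points, weights)) / total
    my = sum(w * y for (_, y), w in zip(points, weights)) / total
    sxx = sum(w * (x - mx) ** 2 for (x, _), w in zip(points, weights)) / total
    syy = sum(w * (y - my) ** 2 for (_, y), w in zip(points, weights)) / total
    sxy = sum(w * (x - mx) * (y - my) for (x, y), w in zip(points, weights)) / total
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_sq(p, mean, cov):
    """Squared Mahalanobis distance via the explicit inverse of a 2x2 matrix."""
    (sxx, sxy), (_, syy) = cov
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / (sxx * syy - sxy * sxy)


def reweighted_estimate(points, prob=0.975, max_iter=20):
    """Iterative hard-rejection reweighting (a classical stand-in, not the paper's model).

    Assumes most points are clean, so at least three points keep weight 1.
    """
    # For 2 degrees of freedom the chi-square quantile has the closed form -2 ln(1 - p).
    cutoff = -2.0 * math.log(1.0 - prob)
    weights = [1.0] * len(points)  # equal weights: the first pass is the plain MLE
    for _ in range(max_iter):
        mean, cov = weighted_mean_cov(points, weights)
        new = [1.0 if mahalanobis_sq(p, mean, cov) <= cutoff else 0.0 for p in points]
        if new == weights:  # weights are stable, so (mean, cov) is a fixed point
            return mean, cov, weights
        weights = new
    mean, cov = weighted_mean_cov(points, weights)
    return mean, cov, weights


# Toy data: a 5x5 grid of clean points plus two gross outliers.
clean = [(float(i), float(j)) for i in range(-2, 3) for j in range(-2, 3)]
data = clean + [(20.0, 20.0), (21.0, 21.0)]

mle_mean, mle_cov = weighted_mean_cov(data, [1.0] * len(data))
rob_mean, rob_cov, w = reweighted_estimate(data)
print("MLE mean:", mle_mean)  # pulled toward the outliers (about 1.52 in each coordinate)
print("reweighted mean:", rob_mean)
print("flagged:", [p for p, wi in zip(data, w) if wi == 0.0])

# Affine equivariance check: transform x -> A x + shift and re-estimate.
A = ((2.0, 1.0), (0.0, 3.0))
shift = (5.0, -1.0)


def affine(p):
    return (A[0][0] * p[0] + A[0][1] * p[1] + shift[0],
            A[1][0] * p[0] + A[1][1] * p[1] + shift[1])


def sandwich(cov):
    """Return A cov A^T."""
    ac = [[sum(A[i][k] * cov[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return [[sum(ac[i][k] * A[j][k] for k in range(2)) for j in range(2)] for i in range(2)]


t_mean, t_cov, t_w = reweighted_estimate([affine(p) for p in data])
expected_mean, expected_cov = affine(rob_mean), sandwich(rob_cov)
print("same weights:", t_w == w)
print("mean equivariant:", all(math.isclose(u, v) for u, v in zip(t_mean, expected_mean)))
print("cov equivariant:", all(math.isclose(t_cov[i][j], expected_cov[i][j])
                              for i in range(2) for j in range(2)))
```

Starting from the MLE makes this sketch vulnerable to masking when outliers are numerous enough to inflate the initial covariance, and it omits the consistency factor usually applied to a trimmed covariance estimate.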

MSC:
62-07 Data analysis (statistics) (MSC2010)
90-08 Computational methods for problems pertaining to operations research and mathematical programming
Software:
LIBRA; robustbase; SDPT3
References:
[1] Alqallaf F, Van Aelst S, Yohai VJ, Zamar RH (2009) Propagation of outliers in multivariate data. Ann Stat 37(1): 311–331 · Zbl 1155.62043 · doi:10.1214/07-AOS588
[2] Bertsekas DP (1999) Nonlinear programming, 2nd edn. Athena Scientific, Belmont · Zbl 1015.90077
[3] Chakraborty B, Chaudhuri P (2008) On an optimization problem in robust statistics. J Comput Graph Stat 17(3): 683–702 · doi:10.1198/106186008X340751
[4] Chandola V, Banerjee A, Kumar V (2007) Outlier detection: a review. Technical Report, University of Minnesota
[5] Critchley F, Schyns M, Haesbroeck G, Fauconnier C, Lu G, Atkinson RA, Wang DQ (2010) A relaxed approach to combinatorial problems in robustness and diagnostics. Stat Comput 20(1): 99–115 · doi:10.1007/s11222-009-9119-x
[6] Critchley F, Schyns M, Haesbroeck G, Kinns D, Atkinson RA, Lu G (2004) The case sensitivity function approach to diagnostics and robust computation: a relaxation strategy. In: COMPSTAT 2004: Proceedings in Computational Statistics, vol 36, pp 113–125 · Zbl 1170.62321
[7] Huber PJ (2004) Robust statistics. Wiley, New York
[8] Khan J, Van Aelst S, Zamar R (2007) Robust linear model selection based on least angle regression. J Am Stat Assoc 102: 1289–1299 · Zbl 1332.62240 · doi:10.1198/016214507000000950
[9] Kuhn HW, Tucker AW (1951) Nonlinear programming. In: Proceedings of second Berkeley symposium. University of California Press, Berkeley, pp 481–492
[10] Ledoit O, Wolf M (2004) A well-conditioned estimator for large-dimensional covariance matrices. J Multivar Anal 88(2):365–411 · Zbl 1032.62050
[11] Maronna RA, Martin RD, Yohai VJ (2006) Robust statistics: theory and methods. Wiley, New York
[12] Nguyen TD, Welsch R (2009) Outlier detection and least trimmed squares approximation using semi-definite programming. Comput Stat Data Anal (to appear) · Zbl 1284.62430
[13] Rousseeuw PJ (1984) Least median of squares regression. J Am Stat Assoc 79: 871–880 · Zbl 0547.62046 · doi:10.1080/01621459.1984.10477105
[14] Rousseeuw PJ, van Driessen K (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics 41: 212–223 · doi:10.1080/00401706.1999.10485670
[15] Schyns M, Haesbroeck G, Critchley F (2010) RelaxMCD: smooth optimisation for the minimum covariance determinant estimator. Comput Stat Data Anal 54(4): 843–857 · Zbl 05689636 · doi:10.1016/j.csda.2009.11.005
[16] Toh KC, Todd MJ, Tutuncu RH (2006) SDPT3 version 4.0 (beta): a MATLAB software for semidefinite-quadratic-linear programming
[17] Vandenberghe L, Boyd S (1996) Semidefinite programming. SIAM Rev 38(1): 49–95 · Zbl 0845.65023 · doi:10.1137/1038003
[18] Vandenberghe L, Boyd S (1999) Applications of semidefinite programming. Appl Numer Math Trans IMACS 29(3): 283–299 · Zbl 0956.90031 · doi:10.1016/S0168-9274(98)00098-1
[19] Verboven S, Hubert M (2005) Libra: a MATLAB library for robust analysis. Chemom Intell Lab Syst 75(2): 127–136 · doi:10.1016/j.chemolab.2004.06.003