Analysis of the forward search using some new results for martingales and empirical processes. (English) Zbl 1388.62206

Bernoulli 22, No. 2, 1131-1183 (2016); corrigendum ibid. 25, No. 4A, 3201 (2019).
Summary: The forward search is an iterative algorithm for avoiding outliers in a regression analysis suggested by A. S. Hadi and J. S. Simonoff [“Procedures for the identification of multiple outliers in linear models”, J. Am. Stat. Assoc. 88, No. 424, 1264–1272 (1993), http://www.jstor.org/stable/2291266]; see also [A. Atkinson and M. Riani, Robust diagnostic regression analysis. New York, NY: Springer (2000; Zbl 0964.62063)]. The algorithm constructs subsets of “good” observations so that the size of the subsets increases as the algorithm progresses. It results in a sequence of regression estimators and forward residuals. Outliers are detected by monitoring the sequence of forward residuals. We show that the sequences of regression estimators and forward residuals converge to Gaussian processes. The proof involves a new iterated martingale inequality, a theory for a new class of weighted and marked empirical processes, the corresponding quantile process theory, and a fixed point argument to describe the iterative aspect of the procedure.
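
The iteration described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation and not the API of the ForwardSearch R package; the function names `ols_fit` and `forward_search` are hypothetical. It assumes a simple regression with intercept and slope, an initial subset supplied by the caller (in practice this would typically come from a robust estimator), and it takes the forward residual at subset size m to be the (m+1)-th smallest absolute residual divided by a scale estimate computed from the current subset.

```python
def ols_fit(xs, ys):
    """Closed-form least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def forward_search(xs, ys, initial_subset):
    """Return a list of (m, (a, b), forward_residual) for m = m0, ..., n-1.

    Assumption: the initial subset has at least 3 points with distinct x.
    """
    n = len(xs)
    subset = sorted(initial_subset)
    path = []
    while len(subset) < n:
        m = len(subset)
        # Fit on the current subset of presumed "good" observations.
        a, b = ols_fit([xs[i] for i in subset], [ys[i] for i in subset])
        # Absolute residuals of ALL observations under that fit.
        abs_res = [abs(ys[i] - a - b * xs[i]) for i in range(n)]
        # Scale estimate from the subset (two fitted parameters).
        rss = sum(abs_res[i] ** 2 for i in subset)
        sigma = (rss / (m - 2)) ** 0.5
        order = sorted(range(n), key=lambda i: abs_res[i])
        # Forward residual: (m+1)-th smallest absolute residual, scaled.
        path.append((m, (a, b), abs_res[order[m]] / sigma))
        # Next subset: the m+1 observations with the smallest residuals.
        subset = sorted(order[: m + 1])
    return path


# Illustrative data (made up): y = 1 + 2x plus small deterministic noise,
# with one gross outlier planted at index 10.
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x + 0.1 * ((i * 7) % 5 - 2) for i, x in enumerate(xs)]
ys[10] += 10.0

path = forward_search(xs, ys, initial_subset=range(5))
for m, (a, b), z in path:
    print(f"m={m:2d}  a={a:6.3f}  b={b:6.3f}  forward residual={z:7.2f}")
```

In this toy run the planted outlier is the last observation to enter the subset, so the final forward residual is far larger than the preceding ones; monitoring the sequence for such a jump is the detection idea the summary refers to.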

MSC:

62J05 Linear regression; mixed models
62H12 Estimation in multivariate analysis
62F12 Asymptotic properties of parametric estimators
62F35 Robustness and adaptive procedures (parametric inference)

Citations:

Zbl 0964.62063

Software:

R; ForwardSearch

References:

[1] Aroian, L. (1941). A study of R.A. Fisher’s \(z\) distribution and the related \(F\) distribution. Ann. Math. Statist. 12 429-448. · Zbl 0063.00121 · doi:10.1214/aoms/1177731681
[2] Atkinson, A. and Riani, M. (2000). Robust Diagnostic Regression Analysis. New York: Springer. · Zbl 0964.62063 · doi:10.1007/978-1-4612-1160-0
[3] Atkinson, A.C. (1994). Fast very robust methods for detection of multiple outliers. J. Amer. Statist. Assoc. 89 1329-1339. · Zbl 0825.62429 · doi:10.2307/2290995
[4] Atkinson, A.C. and Riani, M. (2006). Distribution theory and simulations for tests of outliers in regression. J. Comput. Graph. Statist. 15 460-476. · doi:10.1198/106186006X113593
[5] Atkinson, A.C., Riani, M. and Cerioli, A. (2010). The Forward Search: Theory and data analysis (with discussion). J. Korean Statist. Soc. 39 117-134. · Zbl 1294.62149 · doi:10.1016/j.jkss.2010.02.007
[6] Atkinson, A.C., Riani, M. and Cerioli, A. (2010). Rejoinder: The Forward Search: Theory and data analysis. J. Korean Statist. Soc. 39 161-163. · Zbl 1294.62150 · doi:10.1016/j.jkss.2010.02.008
[7] Bahadur, R.R. (1966). A note on quantiles in large samples. Ann. Math. Statist. 37 577-580. · Zbl 0147.18805 · doi:10.1214/aoms/1177699450
[8] Bellini, T. (2015). The forward search interactive outlier detection in cointegrated VAR analysis. Adv. Data Anal. Classif. · Zbl 1284.62194 · doi:10.1007/s11634-010-0072-5
[9] Bercu, B. and Touati, A. (2008). Exponential inequalities for self-normalized martingales with applications. Ann. Appl. Probab. 18 1848-1869. · Zbl 1152.60309 · doi:10.1214/07-AAP506
[10] Bickel, P.J. (1975). One-step Huber estimates in the linear model. J. Amer. Statist. Assoc. 70 428-434. · Zbl 0322.62038 · doi:10.2307/2285834
[11] Billingsley, P. (1999). Convergence of Probability Measures, 2nd ed. New York: Wiley. · Zbl 0944.60003
[12] Cavaliere, G. and Georgiev, I. (2013). Exploiting infinite variance through dummy variables in nonstationary autoregressions. Econometric Theory 29 1162-1195. · Zbl 1290.62070 · doi:10.1017/S0266466613000030
[13] Cerioli, A., Farcomeni, A. and Riani, M. (2014). Strong consistency and robustness of the Forward Search estimator of multivariate location and scatter. J. Multivariate Anal. 126 167-183. · Zbl 1281.62135 · doi:10.1016/j.jmva.2013.12.010
[14] Csörgő, M. (1983). Quantile Processes with Statistical Applications. CBMS-NSF Regional Conference Series in Applied Mathematics 42. Philadelphia, PA: SIAM.
[15] Dollinger, M.B. and Staudte, R.G. (1991). Influence functions of iteratively reweighted least squares estimators. J. Amer. Statist. Assoc. 86 709-716. · Zbl 0739.62024 · doi:10.2307/2290402
[16] Engler, E. and Nielsen, B. (2009). The empirical process of autoregressive residuals. Econom. J. 12 367-381. · Zbl 1206.62147 · doi:10.1111/j.1368-423X.2009.00282.x
[17] Guenther, W.C. (1977). An easy method for obtaining percentage points of order statistics. Technometrics 19 319-321. · Zbl 0371.62069 · doi:10.2307/1267702
[18] Hadi, A.S. (1992). Identifying multiple outliers in multivariate data. J. R. Stat. Soc. Ser. B. Stat. Methodol. 54 761-771.
[19] Hadi, A.S. and Simonoff, J.S. (1993). Procedures for the identification of multiple outliers in linear models. J. Amer. Statist. Assoc. 88 1264-1272. · doi:10.1080/01621459.1993.10476407
[20] Hawkins, D.M. and Olive, D.J. (2002). Inconsistency of resampling algorithms for high-breakdown regression estimators and a new algorithm. J. Amer. Statist. Assoc. 97 136-159. · Zbl 1073.62546 · doi:10.1198/016214502753479293
[21] Helland, I.S. (1982). Central limit theorems for martingales with discrete or continuous time. Scand. J. Stat. 9 79-94. · Zbl 0486.60023
[22] Johansen, S. and Nielsen, B. (2009). An analysis of the indicator saturation estimator as a robust regression estimator. In The Methodology and Practice of Econometrics (J.L. Castle and N. Shephard, eds.) 1-36. Oxford: Oxford Univ. Press. · Zbl 1384.62232 · doi:10.1093/acprof:oso/9780199237197.003.0001
[23] Johansen, S. and Nielsen, B. (2010). Discussion: The Forward Search: Theory and data analysis. J. Korean Statist. Soc. 39 137-145. · Zbl 1294.62155 · doi:10.1016/j.jkss.2010.02.003
[24] Johansen, S. and Nielsen, B. (2013). Outlier detection in regression using an iterated one-step approximation to the Huber-skip estimator. Econometrics 1 53-70.
[25] Johansen, S. and Nielsen, B. (2015). Asymptotic theory of M-estimators in linear time series regression models. Discussion paper, Univ. Copenhagen.
[26] Johansen, S. and Nielsen, B. (2015). Asymptotic theory of outlier detection algorithms for linear time series regression models. Scand. J. Stat.
[27] Kiefer, J. (1967). On Bahadur’s representation of sample quantiles. Ann. Math. Statist. 38 1323-1342. · Zbl 0158.37005 · doi:10.1214/aoms/1177698690
[28] Koul, H.L. and Ossiander, M. (1994). Weak convergence of randomly weighted dependent residual empiricals with applications to autoregression. Ann. Statist. 22 540-562. · Zbl 0836.62063 · doi:10.1214/aos/1176325383
[29] Lee, S. and Wei, C.-Z. (1999). On residual empirical processes of stochastic regression models with applications to time series. Ann. Statist. 27 237-261. · Zbl 0943.62092 · doi:10.1214/aos/1018031109
[30] Nielsen, B. (2014). ForwardSearch. R package version 1. Available at .
[31] R Development Core Team (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Available at .
[32] Revuz, D. and Yor, M. (1998). Continuous Martingales and Brownian Motion, 3rd ed. Berlin: Springer. · Zbl 1087.60040
[33] Riani, M. and Atkinson, A.C. (2007). Fast calibrations of the Forward Search for testing multiple outliers in regression. Adv. Data Anal. Classif. 1 123-141. · Zbl 1301.62069 · doi:10.1007/s11634-007-0007-y
[34] Riani, M., Atkinson, A.C. and Cerioli, A. (2009). Finding an unknown number of multivariate outliers. J. R. Stat. Soc. Ser. B. Stat. Methodol. 71 447-466. · Zbl 1248.62091 · doi:10.1111/j.1467-9868.2008.00692.x
[35] Rousseeuw, P.J. (1984). Least median of squares regression. J. Amer. Statist. Assoc. 79 871-880. · Zbl 0547.62046 · doi:10.2307/2288718
[36] Rousseeuw, P.J. and Leroy, A.M. (1987). Robust Regression and Outlier Detection. New York: Wiley. · Zbl 0711.62030
[37] Ruppert, D. and Carroll, R.J. (1980). Trimmed least squares estimation in the linear model. J. Amer. Statist. Assoc. 75 828-838. · Zbl 0459.62055 · doi:10.2307/2287169
[38] Sampford, M.R. (1953). Some inequalities on Mill’s ratio and related functions. Ann. Math. Statist. 24 130-132. · Zbl 0050.13503 · doi:10.1214/aoms/1177729093
[39] Shorack, G.R. (1979). Weak convergence of empirical and quantile processes in sup-norm metrics via KMT-constructions. Stochastic Process. Appl. 9 95-98. · Zbl 0405.60006 · doi:10.1016/0304-4149(79)90042-5
[40] Simpson, D.G., Ruppert, D. and Carroll, R.J. (1992). On one-step GM estimates and stability of inferences in linear regression. J. Amer. Statist. Assoc. 87 439-450. · Zbl 0781.62104 · doi:10.2307/2290275
[41] Soms, A.P. (1976). An asymptotic expansion for the tail area of the \(t\)-distribution. J. Amer. Statist. Assoc. 71 728-730. · Zbl 0362.62021 · doi:10.2307/2285610
[42] Víšek, J.Á. (2006). The least trimmed squares. Part I: Consistency. Kybernetika (Prague) 42 1-36. · Zbl 1248.62033
[43] Víšek, J.Á. (2006). The least trimmed squares. Part II: \(\sqrt{n}\)-consistency. Kybernetika (Prague) 42 181-202. · Zbl 1248.62034
[44] Víšek, J.Á. (2006). The least trimmed squares. Part III: Asymptotic normality. Kybernetika (Prague) 42 203-224. · Zbl 1248.62035
[45] Welsh, A.H. and Ronchetti, E. (2002). A journey in single steps: Robust one-step \(M\)-estimation in linear regression. J. Statist. Plann. Inference 103 287-310. · Zbl 0988.62040 · doi:10.1016/S0378-3758(01)00228-2