
On general notions of depth for regression. (English) Zbl 07368224

Summary: Depth notions in location have attracted tremendous attention in the literature. In fact, data depth and its applications have remained one of the most active research topics in statistics over the last three decades. The most favored notions of depth in location include the half-space depth (HD) of J. W. Tukey [“Mathematics and the picturing of data”, in: Proceedings of the international congress of mathematicians, Vancouver, B.C., August 21–29, 1974. USA: Canadian Mathematical Congress. 523–531 (1975)], the simplicial depth of R. Y. Liu [Ann. Stat. 18, No. 1, 405–414 (1990; Zbl 0701.62063)], and projection depth (PD) (W. A. Stahel [Robuste Schätzungen: Infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen. Eidgenössische Technische Hochschule Zürich. 162 pp. (1981; Zbl 0531.62036)], D. L. Donoho [Breakdown properties of multivariate location estimators. Boston: Harvard University (Ph.D. qualifying paper) (1982)], R. Y. Liu [“Data depth and multivariate rank tests”, in: \(L_1\)-statistical analysis and related methods. Amsterdam: North-Holland. 279–294 (1992)], Y. Zuo and R. Serfling [Ann. Stat. 28, No. 2, 461–482 (2000; Zbl 1106.62334)] (ZS00), and Y. Zuo [ibid. 31, No. 5, 1460–1490 (2003; Zbl 1046.62056)]), among others. Depth notions in regression have also been proposed, albeit only sporadically. The regression depth (RD) of P. J. Rousseeuw and M. Hubert [J. Am. Stat. Assoc. 94, No. 446, 388–433 (1999; Zbl 1007.62060)] (RH99), the best known of these, is a direct extension of Tukey's HD to regression. Other notions include the depth of E. Carrizosa [J. Multivariate Anal. 58, No. 1, 21–26 (1996; Zbl 0865.62036)] and the notions proposed in this article, obtained by modifying a functional of R. A. Maronna and V. J. Yohai [Ann. Stat. 21, No. 2, 965–990 (1993; Zbl 0787.62037)] (MY93). Is there any relationship between Carrizosa depth and the RD of RH99? Do these depth notions possess desirable properties, and what are those properties? Can existing notions serve well as depth notions in regression? These questions remain open.
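To make the phrase "direct extension of Tukey's HD to regression" concrete, the following is a minimal sketch (Python standard library only) of sample versions of both depths, each returned as a count (divide by n for the normalized version). Assumptions, not taken from the article: HD is approximated over a finite grid of directions in the plane, which yields an upper bound on the exact depth; RD is restricted to simple regression (intercept plus one covariate) and follows our reading of the counting characterization in RH99, namely the minimum number of observations a fit must pass when tilted about a point on the line until it becomes vertical. The function names are hypothetical and do not come from the paper or from robustbase.

```python
import math


def halfspace_depth_2d(point, data, n_dirs=720):
    """Approximate Tukey halfspace depth (as a count) of `point` in bivariate `data`.

    For each unit direction u on a grid, count the observations lying in the
    closed halfplane {z : u'(z - point) >= 0}, whose boundary passes through
    `point`. The depth is the smallest count; a finite grid of directions gives
    an upper bound on the exact value.
    """
    px, py = point
    best = len(data)
    for k in range(n_dirs):
        a = 2 * math.pi * k / n_dirs
        ux, uy = math.cos(a), math.sin(a)
        count = sum(1 for x, y in data if ux * (x - px) + uy * (y - py) >= 0)
        best = min(best, count)
    return best


def regression_depth_simple(theta, xs, ys):
    """Regression depth (as a count) of the line y = theta[0] + theta[1] * x.

    Residuals are r_j = y_j - theta[0] - theta[1] * x_j. For every split point t
    among the observed x values, count the observations the line must pass when
    tilted about x = t towards vertical: L+(t) + R-(t) or L-(t) + R+(t).
    The depth is the smallest such count.
    """
    a, b = theta
    res = [y - a - b * x for x, y in zip(xs, ys)]
    best = len(xs)
    for t in xs:
        l_pos = sum(1 for x, r in zip(xs, res) if x <= t and r >= 0)
        l_neg = sum(1 for x, r in zip(xs, res) if x <= t and r <= 0)
        r_pos = sum(1 for x, r in zip(xs, res) if x > t and r >= 0)
        r_neg = sum(1 for x, r in zip(xs, res) if x > t and r <= 0)
        best = min(best, l_pos + r_neg, l_neg + r_pos)
    return best


if __name__ == "__main__":
    corners = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    print(halfspace_depth_2d((0, 0), corners))    # 2: centre of the square
    print(halfspace_depth_2d((10, 10), corners))  # 0: far outside the data
    xs, ys = [1, 2, 3, 4], [2, 1, 4, 3]
    print(regression_depth_simple((0, 1), xs, ys))    # 1: the line y = x
    print(regression_depth_simple((100, 0), xs, ys))  # 0: a line above all points
```

Both functions count observations on one side of a hyperplane and minimize over a family of such hyperplanes (halfplanes through the point for HD, tilts of the candidate fit for RD); this shared structure is what the summary means by "direct extension". Carrizosa depth and the MY93-based notions are not sketched here.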
The major objectives of the article are (i) to reveal the connection between Carrizosa depth and the RD of RH99; (ii) to extend the criteria of ZS00 for evaluating location depth notions to regression depth notions; (iii) to examine existing regression depth notions with respect to these criteria; and (iv) to propose the regression counterpart of the eminent location projection depth.
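For orientation on objective (iv), the location projection depth that the article carries over to regression can be sketched as PD(x) = 1/(1 + O(x)), with outlyingness O(x) = sup over unit vectors u of |u'x − Med(u'X)| / MAD(u'X), using the common median/MAD choice of location and scale. The sketch below is illustrative only and is not the regression version proposed in the article. Assumptions: bivariate data, the supremum approximated over a finite grid of directions, and a hypothetical convention of skipping directions whose MAD is zero; the function name is likewise hypothetical.

```python
import math
from statistics import median


def projection_depth_2d(point, data, n_dirs=720):
    """Approximate location projection depth PD(x) = 1 / (1 + O(x)) in the plane.

    Outlyingness O(x) = sup_u |u'x - Med(u'X)| / MAD(u'X), with the sup taken
    over a finite grid of unit directions. The ratio is unchanged when u is
    replaced by -u, so directions on a half circle suffice.
    """
    px, py = point
    worst = 0.0
    for k in range(n_dirs):
        a = math.pi * k / n_dirs
        ux, uy = math.cos(a), math.sin(a)
        proj = [ux * x + uy * y for x, y in data]
        med = median(proj)
        mad = median(abs(p - med) for p in proj)
        if mad == 0:
            continue  # assumed convention: ignore directions with zero spread
        worst = max(worst, abs(ux * px + uy * py - med) / mad)
    return 1.0 / (1.0 + worst)


if __name__ == "__main__":
    sample = [(2, 1), (-1, 2), (-2, -1), (1, -2)]  # a rotated square
    print(projection_depth_2d((0, 0), sample))  # 1.0: the centre of symmetry
    print(projection_depth_2d((3, 3), sample))  # much smaller: far from the bulk
```

The sample points are chosen so that no direction has zero MAD; with three collinear points, directions perpendicular to that line give MAD close to zero and the approximation becomes numerically unstable.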

MSC:

62-XX Statistics

Software:

robustbase; StochaTR
Full Text: DOI arXiv

References:

[1] Adrover, J. G., Maronna, R. A. and Yohai, V. J. (2002). Relationships between maximum depth and projection regression estimates. J. Statist. Plann. Inference 105 363-375. · Zbl 1026.62027 · doi:10.1016/S0378-3758(01)00264-6
[2] Agostinelli, C. and Romanazzi, M. (2011). Local depth. J. Statist. Plann. Inference 141 817-830. · Zbl 1353.62019 · doi:10.1016/j.jspi.2010.08.001
[3] Bai, Z.-D. and He, X. (1999). Asymptotic distributions of the maximal depth estimators for regression and multivariate location. Ann. Statist. 27 1616-1637. · Zbl 1007.62009 · doi:10.1214/aos/1017939144
[4] Beaton, A. E. and Tukey, J. W. (1974). The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data. Technometrics 16 147-185. · Zbl 0282.62057 · doi:10.1080/00401706.1974.10489171
[5] Caplin, A. and Nalebuff, B. (1988). On 64%-majority rule. Econometrica 56 787-814. · Zbl 0644.90006 · doi:10.2307/1912699
[6] Caplin, A. and Nalebuff, B. (1991a). Aggregation and social choice: A mean voter theorem. Econometrica 59 1-23. · Zbl 0743.90006 · doi:10.2307/2938238
[7] Caplin, A. and Nalebuff, B. (1991b). Aggregation and imperfect competition: On the existence of equilibrium. Econometrica 59 25-59. · Zbl 0738.90012 · doi:10.2307/2938239
[8] Cascos, I. and Molchanov, I. (2007). Multivariate risks and depth-trimmed regions. Finance and Stochastics 11 373-397. · Zbl 1164.91027 · doi:10.1007/s00780-007-0043-7
[9] Carrizosa, E. (1996). A characterization of halfspace depth. J. Multivariate Anal. 58 21-26. · Zbl 0865.62036 · doi:10.1006/jmva.1996.0037
[10] Chakraborty, A. and Chaudhuri, P. (2014). The spatial distribution in infinite dimensional spaces and related quantiles and depths. Ann. Statist. 42 1203-1231. · Zbl 1305.62141 · doi:10.1214/14-AOS1226
[11] Chebana, F. and Ouarda, T. B. M. J. (2008). Depth and homogeneity in regional flood frequency analysis. Water Resour. Res. 44 W11422.
[12] Chebana, F. and Ouarda, T. B. M. J. (2011). Depth-based multivariate descriptive statistics with hydrological applications. J. Geophys. Res. 116 D10120.
[13] Chen, M., Gao, C. and Ren, Z. (2018). Robust covariance and scatter matrix estimation under Huber’s contamination model. Ann. Statist. 46 1932-1960. · Zbl 1408.62104 · doi:10.1214/17-AOS1607
[14] Chernozhukov, V., Galichon, A., Hallin, M. and Henry, M. (2017). Monge-Kantorovich depth, quantiles, ranks and signs. Ann. Statist. 45 223-256. · Zbl 1426.62163 · doi:10.1214/16-AOS1450
[15] Claeskens, G., Hubert, M., Slaets, L. and Vakili, K. (2014). Multivariate functional halfspace depth. J. Amer. Statist. Assoc. 109 411-423. · Zbl 1367.62162 · doi:10.1080/01621459.2013.856795
[16] Cui, X., Lin, L. and Yang, G. (2008). An extended projection data depth and its applications to discrimination. Comm. Statist. Theory Methods 37 2276-2290. · Zbl 1143.62037 · doi:10.1080/03610920701858396
[17] Dang, X. and Serfling, R. (2010). Nonparametric depth-based multivariate outlier identifiers, and masking robustness properties. J. Statist. Plann. Inference 140 198-213. · Zbl 1191.62084 · doi:10.1016/j.jspi.2009.07.004
[18] Donoho, D. L. (1982). Breakdown properties of multivariate location estimators. Ph.D. qualifying paper, Harvard Univ.
[19] Donoho, D. L. and Gasko, M. (1992). Breakdown properties of location estimates based on halfspace depth and projected outlyingness. Ann. Statist. 20 1803-1827. · Zbl 0776.62031 · doi:10.1214/aos/1176348890
[20] Dyckerhoff, R. (2004). Data depths satisfying the projection property. Allg. Stat. Arch. 88 163-190. · Zbl 1294.62112 · doi:10.1007/s101820400167
[21] Eisenhauer, J. G. (2003). Regression through the origin. Teach. Stat. 25 76-80.
[22] Febrero, M., Galeano, P. and González-Manteiga, W. (2008). Outlier detection in functional data by depth measures, with application to identify abnormal \(\text{NO}_x\) levels. Environmetrics 19 331-345.
[23] Ghosh, A. K. and Chaudhuri, P. (2005). On data depth and distribution-free discriminant analysis using separating surfaces. Bernoulli 11 1-27. · Zbl 1059.62064 · doi:10.3150/bj/1110228239
[24] Gijbels, I. and Nagy, S. (2017). On a general definition of depth for functional data. Statist. Sci. 32 630-639. · Zbl 1381.62098 · doi:10.1214/17-STS625
[25] Hallin, M., Paindaveine, D. and Šiman, M. (2010). Multivariate quantiles and multiple-output regression quantiles: From \(L_1\) optimization to halfspace depth. Ann. Statist. 38 635-669. · Zbl 1183.62088 · doi:10.1214/09-AOS723
[26] Hampel, F. R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69 383-393. · Zbl 0305.62031 · doi:10.1080/01621459.1974.10482962
[27] Hoberg, R. (2000). Cluster analysis based on data depth. In Data Analysis, Classification and Related Methods (H. Kiers, J. P. Rasson, P. Groenen and M. Schader, eds.) 17-22. Springer, Berlin.
[28] Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Stat. 35 73-101. · Zbl 0136.39805 · doi:10.1214/aoms/1177703732
[29] Huber, P. J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. Ann. Statist. 1 799-821. · Zbl 0289.62033 · doi:10.1214/aos/1176342503
[30] Hubert, M., Rousseeuw, P. J. and Segaert, P. (2015). Multivariate functional outlier detection. Stat. Methods Appl. 24 177-202. · Zbl 1441.62124 · doi:10.1007/s10260-015-0297-8
[31] Hubert, M., Rousseeuw, P. J. and Van Aelst, S. (2001). Similarities between location depth and regression depth. In Statistics in Genetics and in the Environmental Sciences (Ascona, 1999) (L. Fernholz, S. Morgenthaler and W. Stahel, eds.). Trends Math. 159-172. Birkhäuser, Basel.
[32] Jaeckel, L. A. (1972). Estimating regression coefficients by minimizing the dispersion of the residuals. Ann. Math. Stat. 43 1449-1458. · Zbl 0277.62049 · doi:10.1214/aoms/1177692377
[33] Jörnsten, R. (2004). Clustering and classification based on the \(L_1\) data depth. J. Multivariate Anal. 90 67-89. · Zbl 1047.62064
[34] Jurečková, J. (1971). Nonparametric estimate of regression coefficients. Ann. Math. Stat. 42 1328-1338. · Zbl 0225.62052
[35] Koenker, R. and Bassett, G. Jr. (1978). Regression quantiles. Econometrica 46 33-50. · Zbl 0373.62038 · doi:10.2307/1913643
[36] Koshevoy, G. and Mosler, K. (1997). Zonoid trimming for multivariate distributions. Ann. Statist. 25 1998-2017. · Zbl 0881.62059 · doi:10.1214/aos/1069362382
[37] Koul, H. L. (1970). Some convergence theorems for ranks and weighted empirical cumulatives. Ann. Math. Stat. 41 1768-1773. · Zbl 0232.62020 · doi:10.1214/aoms/1177696824
[38] Koul, H. L. (1971). Asymptotic behavior of a class of confidence regions based on ranks in regression. Ann. Math. Stat. 42 466-476. · Zbl 0215.54204 · doi:10.1214/aoms/1177693398
[39] Lange, T., Mosler, K. and Mozharovskyi, P. (2014). Fast nonparametric classification based on data depth. Statist. Papers 55 49-69. · Zbl 1283.62128 · doi:10.1007/s00362-012-0488-4
[40] Li, J., Cuesta-Albertos, J. A. and Liu, R. Y. (2012). \(DD\)-classifier: Nonparametric classification procedure based on \(DD\)-plot. J. Amer. Statist. Assoc. 107 737-753. · Zbl 1261.62058 · doi:10.1080/01621459.2012.688462
[41] Li, J. and Liu, R. Y. (2004). New nonparametric tests of multivariate locations and scales using data depth. Statist. Sci. 19 686-696. · Zbl 1100.62564 · doi:10.1214/088342304000000594
[42] Liu, R. Y. (1990). On a notion of data depth based on random simplices. Ann. Statist. 18 405-414. · Zbl 0701.62063 · doi:10.1214/aos/1176347507
[43] Liu, R. Y. (1992). Data depth and multivariate rank tests. In \(L_1 \)-Statistical Analysis and Related Methods (Neuchâtel, 1992) (Y. Dodge, ed.) 279-294. North-Holland, Amsterdam.
[44] Liu, R. Y. (1995). Control charts for multivariate processes. J. Amer. Statist. Assoc. 90 1380-1387. · Zbl 0868.62075 · doi:10.1080/01621459.1995.10476643
[45] Liu, R. Y., Parelius, J. M. and Singh, K. (1999). Multivariate analysis by data depth: Descriptive statistics, graphics and inference. Ann. Statist. 27 783-858. · Zbl 0984.62037
[46] Liu, R. Y. and Singh, K. (1993). A quality index based on data depth and multivariate rank tests. J. Amer. Statist. Assoc. 88 252-260. · Zbl 0772.62031
[47] López-Pintado, S. and Romo, J. (2009). On the concept of depth for functional data. J. Amer. Statist. Assoc. 104 718-734. · Zbl 1388.62139 · doi:10.1198/jasa.2009.0108
[48] Majumdar, S. and Chatterjee, S. (2018). Non-convex penalized multitask regression using data depth-based penalties. Stat 7 e174.
[49] Maronna, R. A., Martin, R. D. and Yohai, V. J. (2006). Robust Statistics: Theory and Methods. Wiley Series in Probability and Statistics. Wiley, Chichester. · Zbl 1094.62040
[50] Maronna, R. A. and Yohai, V. J. (1993). Bias-robust estimates of regression based on projections. Ann. Statist. 21 965-990. · Zbl 0787.62037 · doi:10.1214/aos/1176349160
[51] Mizera, I. (2002). On depth and deep points: A calculus. Ann. Statist. 30 1681-1736. · Zbl 1039.62046 · doi:10.1214/aos/1043351254
[52] Mizera, I. and Müller, C. H. (2004). Location-scale depth. J. Amer. Statist. Assoc. 99 949-989. · Zbl 1071.62032 · doi:10.1198/016214504000001312
[53] Mosler, K. (2002). Multivariate Dispersion, Central Regions and Depth: The Lift Zonoid Approach. Lecture Notes in Statistics 165. Springer, Berlin. · Zbl 1027.62033
[54] Mosler, K. and Bazovkin, P. (2014). Stochastic linear programming with a distortion risk constraint. OR Spectrum 36 949-969. · Zbl 1305.90321 · doi:10.1007/s00291-014-0372-9
[55] Mosler, K. and Hoberg, R. (2006). Data analysis and classification with the zonoid depth. In Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications (R. Liu, R. Serfling and D. Souvaine, eds.). DIMACS Ser. Discrete Math. Theoret. Comput. Sci. 72 49-59. Amer. Math. Soc., Providence, RI.
[56] Nieto-Reyes, A. and Battey, H. (2016). A topologically valid definition of depth for functional data. Statist. Sci. 31 61-79. · Zbl 1436.62720 · doi:10.1214/15-STS532
[57] Paindaveine, D. and Van Bever, G. (2013). From depth to local depth: A focus on centrality. J. Amer. Statist. Assoc. 108 1105-1119. · Zbl 06224990 · doi:10.1080/01621459.2013.813390
[58] Paindaveine, D. and Van Bever, G. (2015). Nonparametrically consistent depth-based classifiers. Bernoulli 21 62-82. · Zbl 1359.62258 · doi:10.3150/13-BEJ561
[59] Paindaveine, D. and Van Bever, G. (2018). Halfspace depths for scatter, concentration and shape matrices. Ann. Statist. 46 3276-3307. · Zbl 1408.62100 · doi:10.1214/17-AOS1658
[60] Pollard, D. (1984). Convergence of Stochastic Processes. Springer Series in Statistics. Springer, New York. · Zbl 0544.60045
[61] Portnoy, S. (2003). Censored regression quantiles. J. Amer. Statist. Assoc. 98 1001-1012. · Zbl 1045.62099 · doi:10.1198/016214503000000954
[62] Portnoy, S. (2012). Nearly root-\(n\) approximation for regression quantile processes. Ann. Statist. 40 1714-1736. · Zbl 1284.62291 · doi:10.1214/12-AOS1021
[63] Rousseeuw, P. J. (1984). Least median of squares regression. J. Amer. Statist. Assoc. 79 871-880. · Zbl 0547.62046 · doi:10.1080/01621459.1984.10477105
[64] Rousseeuw, P. J. and Hubert, M. (1999). Regression depth. J. Amer. Statist. Assoc. 94 388-433. · Zbl 1007.62060 · doi:10.1080/01621459.1999.10474129
[65] Rousseeuw, P. J. and Leroy, A. M. (1987). Robust Regression and Outlier Detection. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Wiley, New York. · Zbl 0711.62030
[66] Rousseeuw, P. J. and Struyf, A. (2004). Characterizing angular symmetry and regression symmetry. J. Statist. Plann. Inference 122 161-173. · Zbl 1040.62041 · doi:10.1016/j.jspi.2003.06.015
[67] Ruppert, D. and Carroll, R. J. (1980). Trimmed least squares estimation in the linear model. J. Amer. Statist. Assoc. 75 828-838. · Zbl 0459.62055 · doi:10.2307/2287169
[68] Serfling, R. (2006). Depth functions in nonparametric multivariate inference. In Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications. DIMACS Ser. Discrete Math. Theoret. Comput. Sci. 72 1-16. Amer. Math. Soc., Providence, RI.
[69] Serfling, R. (2019). Perspectives on depth functions on general data spaces, with consideration of the Tukey, projection, spatial, ‘density’, ‘local’, and ‘contour’ depths. Preprint.
[70] Serfling, R. and Wang, S. (2014). General foundations for studying masking and swamping robustness of outlier identifiers. Stat. Methodol. 20 79-90. · Zbl 1486.62143 · doi:10.1016/j.stamet.2013.08.004
[71] Stahel, W. A. (1981). Robuste Schätzungen: Infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen. Ph.D. dissertation, ETH Zürich. · Zbl 0531.62036
[72] Tukey, J. W. (1975). Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians (Vancouver, B.C., 1974), Vol. 2 (R. D. James, ed.) 523-531. · Zbl 0347.62002
[73] Van Aelst, S. and Rousseeuw, P. J. (2000). Robustness of deepest regression. J. Multivariate Anal. 73 82-106. · Zbl 0955.62029 · doi:10.1006/jmva.1999.1870
[74] Vardi, Y. and Zhang, C.-H. (2000). The multivariate \(L_1\)-median and associated data depth. Proc. Natl. Acad. Sci. USA 97 1423-1426. · Zbl 1054.62067 · doi:10.1073/pnas.97.4.1423
[75] Velasco-Forero, S. and Angulo, J. (2011). Mathematical morphology for vector images using statistical depth. In Mathematical Morphology and Its Applications to Image and Signal Processing (P. Soille, M. Pesaresi and G. K. Ouzounis, eds.) 355-366. Springer, Berlin. · Zbl 1339.68296
[76] Velasco-Forero, S. and Angulo, J. (2012). Random projection depth for multivariate mathematical morphology. IEEE J. Sel. Top. Signal Process. 6 753-763.
[77] Wang, S. and Serfling, R. (2015). On masking and swamping robustness of leading nonparametric outlier identifiers for univariate data. J. Statist. Plann. Inference 162 62-74. · Zbl 1320.62112 · doi:10.1016/j.jspi.2015.02.002
[78] Wang, S. and Serfling, R. (2018). On masking and swamping robustness of leading nonparametric outlier identifiers for multivariate data. J. Multivariate Anal. 166 32-49. · Zbl 1394.62060 · doi:10.1016/j.jmva.2018.02.003
[79] Wu, M. and Zuo, Y. (2008). Trimmed and Winsorized standard deviations based on a scaled deviation. J. Nonparametr. Stat. 20 319-335. · Zbl 1142.62012 · doi:10.1080/10485250802036909
[80] Wu, M. and Zuo, Y. (2009). Trimmed and Winsorized means based on a scaled deviation. J. Statist. Plann. Inference 139 350-365. · Zbl 1149.62047 · doi:10.1016/j.jspi.2008.03.039
[81] Yeh, A. B. and Singh, K. (1997). Balanced confidence regions based on Tukey’s depth and the bootstrap. J. Roy. Statist. Soc. Ser. B 59 639-652. · Zbl 1090.62539 · doi:10.1111/1467-9868.00088
[82] Zuo, Y. (1998). Contributions to the Theory and Applications of Statistical Depth Functions. ProQuest LLC, Ann Arbor, MI. Ph.D. thesis, Univ. Texas at Dallas.
[83] Zuo, Y. (2003). Projection-based depth functions and associated medians. Ann. Statist. 31 1460-1490. · Zbl 1046.62056 · doi:10.1214/aos/1065705115
[84] Zuo, Y. (2004). Robustness of weighted \(L^p\)-depth and \(L^p\)-median. Allg. Stat. Arch. 88 215-234. · Zbl 1294.62116 · doi:10.1007/s101820400169
[85] Zuo, Y. (2009). Data depth trimming procedure outperforms the classical \(t\) (or \(T^2\)) one. J. Probab. Stat.
[86] Zuo, Y. (2010). Is the \(t\) confidence interval \(\overline{X}\pm t_{\alpha}(n-1)s/\sqrt{n}\) optimal? Amer. Statist. 64 170-173.
[87] Zuo, Y. (2018). Robustness of deepest projection regression depth functional. Statist. Papers.
[88] Zuo, Y. (2019a). Asymptotics for the maximum regression depth estimator. Available at arXiv:1809.09896.
[89] Zuo, Y. (2019b). Computation of projection regression depth and its induced median. Available at arXiv:1905.11846. · Zbl 1510.62233
[90] Zuo, Y. and Serfling, R. (2000). General notions of statistical depth function. Ann. Statist. 28 461-482. · Zbl 1106.62334 · doi:10.1214/aos/1016218226