Density estimation of a unimodal continuous distribution in the presence of outliers. (English) Zbl 1397.62138

Summary: The “forward search” (FS) is a powerful general method for identifying outliers and assessing their effect on a fitted model. The present study investigates the use of the FS method to identify outliers in nonparametric univariate density estimation, where the training sample comes from a unimodal continuous distribution. The performance of the procedure is illustrated with simulation studies and real-data examples. It is shown that outliers can lead to an unsuitable estimate of the density function. When the number of outliers is small relative to the total number of observations, identifying and removing the outliers allows a reasonable density function to be fitted to the dataset. Finally, it is shown that the simple ordering system proposed in this article can also be used in some other frameworks.
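The summary does not spell out the ordering system, so the following is only a rough sketch of how a forward-search-style ordering might be combined with a kernel density estimate. Everything in it is an assumption made for illustration, not the authors' procedure: the Gaussian kernel, Silverman's rule-of-thumb bandwidth, the initial subset taken as the points closest to the median, and the hypothetical names `silverman_bandwidth`, `kde_at` and `forward_order`.

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel."""
    x = np.asarray(x, dtype=float)
    sd = x.std(ddof=1)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    scale = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * scale * x.size ** (-0.2)

def kde_at(points, sample, h):
    """Gaussian kernel density estimate built from `sample`, evaluated at `points`."""
    points = np.asarray(points, dtype=float)
    sample = np.asarray(sample, dtype=float)
    z = (points[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (sample.size * h * np.sqrt(2 * np.pi))

def forward_order(x, m0=None):
    """Order observations from most central to most outlying (illustrative only).

    Assumed scheme: start from the m0 points closest to the median, then at
    each step add the outside point with the highest density under a KDE
    fitted to the current subset. Returns the entry order (indices into x)
    and the density each added point had at the moment it entered.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    m0 = m0 or max(5, n // 4)
    inside = [int(i) for i in np.argsort(np.abs(x - np.median(x)))[:m0]]
    outside = set(range(n)) - set(inside)
    entry_density = []
    while outside:
        subset = x[inside]
        h = silverman_bandwidth(subset)
        candidates = np.array(sorted(outside))
        dens = kde_at(x[candidates], subset, h)
        best = int(np.argmax(dens))
        entry_density.append(dens[best])
        inside.append(int(candidates[best]))
        outside.remove(int(candidates[best]))
    return np.array(inside), np.array(entry_density)

# Usage: a unimodal sample contaminated by three gross outliers.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(size=100), [10.0, 11.0, 12.0]])
order, entry = forward_order(data)
print("last observations to enter:", data[order[-3:]])
print("their entry densities:", entry[-3:])
```

In this sketch, a sharp drop in the entry density near the end of the search flags candidate outliers; the density can then be re-estimated from the observations that entered before the drop. A formal threshold for that drop is not given in the summary and is left open here.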

MSC:

62G07 Density estimation
62G10 Nonparametric hypothesis testing
62J20 Diagnostics, and linear inference and regression
