
Challenging the empirical mean and empirical variance: a deviation study. (English. French summary) Zbl 1282.62070

Summary: We present new M-estimators of the mean and variance of real-valued random variables, based on PAC-Bayes bounds. We analyze the non-asymptotic minimax properties of the deviations of these estimators for sample distributions having either a bounded variance, or a bounded variance and a bounded kurtosis. Under these weak hypotheses, which allow for heavy-tailed distributions, we show that the worst-case deviations of the empirical mean are suboptimal. Indeed, we prove that for any confidence level there is some M-estimator whose deviations are of the same order as those of the empirical mean of a Gaussian sample, even when the sample is instead heavy-tailed. Experiments reveal that these new estimators perform even better than predicted by our bounds, with deviation quantile functions uniformly lower at all probability levels than those of the empirical mean, for non-Gaussian sample distributions as simple as a mixture of two Gaussian measures.
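
The idea of an M-estimator of the mean can be illustrated with a minimal Python sketch. It is not taken from the summary, which gives no formulas. The influence function `psi`, the scale `alpha`, the parameter names `variance_bound` and `epsilon`, and the bisection solver are all assumptions made for illustration: `psi` is an odd, increasing function with logarithmic growth, and the estimate is the root in theta of the sum of `psi(alpha * (y_i - theta))` over the sample. The caller is assumed to know an upper bound on the variance, in the spirit of the bounded-variance hypothesis.

```python
import math


def psi(x):
    # Assumed influence function: odd, increasing, grows only logarithmically,
    # so a single extreme observation has limited pull on the estimate.
    return math.copysign(math.log1p(abs(x) + 0.5 * x * x), x)


def m_estimate_mean(sample, variance_bound, epsilon=0.05, iterations=200):
    """Illustrative M-estimate of the mean (hypothetical helper).

    variance_bound: assumed known upper bound on the variance.
    epsilon: target failure probability (1 - confidence level).
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    # Simplified scale, chosen for this sketch only.
    alpha = math.sqrt(2.0 * math.log(1.0 / epsilon) / (n * variance_bound))

    def score(theta):
        return sum(psi(alpha * (y - theta)) for y in sample)

    # score is decreasing in theta, non-negative at min(sample) and
    # non-positive at max(sample), so bisection brackets the root.
    lo, hi = min(sample), max(sample)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    data = [0.0] * 9 + [100.0]  # one gross outlier
    print("empirical mean:", sum(data) / len(data))
    print("M-estimate:    ", m_estimate_mean(data, variance_bound=1.0))
```

In this toy run the single outlier drags the empirical mean far from the bulk of the data, while the M-estimate moves much less, because `psi` damps large residuals instead of weighting them linearly.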

MSC:

62G05 Nonparametric estimation
62G35 Nonparametric robustness
