
zbMATH — the first resource for mathematics

Models as approximations. II. A model-free theory of parametric regression. (English) Zbl 1440.62021
Summary: We develop a model-free theory of general types of parametric regression for i.i.d. observations. The theory replaces the parameters of parametric models with statistical functionals, to be called “regression functionals”, defined on large nonparametric classes of joint \(x\)-\(y\) distributions, without assuming a correct model. Parametric models are reduced to heuristics to suggest plausible objective functions. An example of a regression functional is the vector of slopes of linear equations fitted by OLS to largely arbitrary \(x\)-\(y\) distributions, without assuming a linear model (see Part I). More generally, regression functionals can be defined by minimizing objective functions, solving estimating equations, or with ad hoc constructions. In this framework, it is possible to achieve the following: (1) define a notion of “well-specification” for regression functionals that replaces the notion of correct specification of models, (2) propose a well-specification diagnostic for regression functionals based on reweighting distributions and data, (3) decompose sampling variability of regression functionals into two sources, one due to the conditional response distribution and another due to the regressor distribution interacting with misspecification, both of order \(N^{-1/2}\), (4) exhibit plug-in/sandwich estimators of standard error as limit cases of \(x\)-\(y\) bootstrap estimators, and (5) provide theoretical heuristics to indicate that \(x\)-\(y\) bootstrap standard errors may generally be preferred over sandwich estimators.
For Part I, see [Zbl 1440.62020].
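
Items (4) and (5) of the summary can be illustrated with a minimal sketch, which is not the authors' code. It treats the OLS coefficients as a regression functional of a simulated \(x\)-\(y\) distribution in which the linear fit is deliberately misspecified, then computes a sandwich standard error and an \(x\)-\(y\) (pairs) bootstrap standard error for comparison. The function names, the simulated data, the sample size, the number of bootstrap replicates, and the choice of the plain (HC0-type) sandwich form are all assumptions made for this illustration.

```python
import numpy as np

def ols_coefs(X, y):
    """OLS coefficients (intercept first); X holds the regressors without an intercept column."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

def sandwich_se(X, y):
    """Plain sandwich standard errors: (X'X)^{-1} X' diag(r^2) X (X'X)^{-1}."""
    Xd = np.column_stack([np.ones(len(y)), X])
    r = y - Xd @ ols_coefs(X, y)              # residuals from the fitted linear equation
    bread = np.linalg.inv(Xd.T @ Xd)
    meat = Xd.T @ (Xd * r[:, None] ** 2)
    return np.sqrt(np.diag(bread @ meat @ bread))

def xy_bootstrap_se(X, y, n_boot=1000, seed=None):
    """x-y bootstrap: resample (x_i, y_i) pairs jointly, refit, take the std of the refits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1] + 1))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # draw row indices with replacement
        draws[b] = ols_coefs(X[idx], y[idx])
    return draws.std(axis=0, ddof=1)

# Hypothetical data: the conditional mean is quadratic, so a linear model is not
# assumed correct; the OLS slope is still a well-defined functional of the distribution.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 3.0, size=n)
y = x ** 2 + rng.normal(scale=0.5, size=n)
X = x[:, None]

beta = ols_coefs(X, y)
se_sand = sandwich_se(X, y)
se_boot = xy_bootstrap_se(X, y, n_boot=1000, seed=1)
print("coefficients:", beta)
print("sandwich SE: ", se_sand)
print("bootstrap SE:", se_boot)
```

In this setup the two standard errors should come out close to each other, in line with the summary's description of sandwich estimators as limit cases of \(x\)-\(y\) bootstrap estimators. The sketch does not attempt to reproduce the paper's argument for preferring the bootstrap.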

MSC:
62A01 Foundations and philosophical topics in statistics
62J05 Linear regression; mixed models
62F35 Robustness and adaptive procedures (parametric inference)
62P20 Applications of statistics to economics
Software:
bootstrap; R
References:
[1] Berk, R., Kriegler, B. and Ylvisaker, D. (2008). Counting the homeless in Los Angeles County. In Probability and Statistics: Essays in Honor of David A. Freedman (D. Nolan and S. Speed, eds.). Inst. Math. Stat. (IMS) Collect. 2 127-141. IMS, Beachwood, OH. · Zbl 1166.62381
[2] Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2013). Valid post-selection inference. Ann. Statist. 41 802-837. · Zbl 1267.62080
[3] Bickel, P. J., Götze, F. and van Zwet, W. R. (1997). Resampling fewer than \(n\) observations: Gains, losses, and remedies for losses. Statist. Sinica 7 1-31. · Zbl 0927.62043
[4] Breiman, L. (1996). Bagging predictors. Mach. Learn. 24 123-140. · Zbl 0858.68080
[5] Buja, A. and Stuetzle, W. (2001, 2016). Smoothing effects of bagging: Von Mises expansions of bagged statistical functionals. Available at arXiv:1612.02528.
[6] Buja, A., Brown, L., Kuchibhotla, A. K., Berk, R., George, E. and Zhao, L. (2019). Supplement to “Models as Approximations II: A Model-Free Theory of Parametric Regression.” 10.1214/18-STS694SUPP.
[7] Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability 57. CRC Press, New York. · Zbl 0835.62038
[8] Gelman, A. and Park, D. K. (2009). Splitting a predictor at the upper quarter or third and the lower quarter or third. Amer. Statist. 63 1-8.
[9] Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer Series in Statistics. Springer, New York. · Zbl 0744.62026
[10] Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. Wiley, New York. · Zbl 0593.62027
[11] Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50 1029-1054. · Zbl 0502.62098
[12] Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Monographs on Statistics and Applied Probability 43. CRC Press, London.
[13] Hausman, J. A. (1978). Specification tests in econometrics. Econometrica 46 1251-1271. · Zbl 0397.62043
[14] Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Stat. 35 73-101. · Zbl 0136.39805
[15] Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66), Vol. I: Statistics 221-233. Univ. California Press, Berkeley, CA.
[16] Kuchibhotla, A. K., Brown, L. D. and Buja, A. (2018). Model-free study of ordinary least squares linear regression. Available at arXiv:1809.10538.
[17] Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73 13-22. · Zbl 0595.62110
[18] Pearl, J. (2009). Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge Univ. Press, Cambridge. · Zbl 1188.68291
[19] Peters, J., Bühlmann, P. and Meinshausen, N. (2016). Causal inference by using invariant prediction: Identification and confidence intervals. J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 947-1012. · Zbl 1414.62297
[20] Politis, D. N. and Romano, J. P. (1994). Large sample confidence regions based on subsamples under minimal assumptions. Ann. Statist. 22 2031-2050. · Zbl 0828.62044
[21] R Development Core Team (2008). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Available at http://www.R-project.org.
[22] Rieder, H. (1994). Robust Asymptotic Statistics. Springer Series in Statistics. Springer, New York. · Zbl 0927.62050
[23] Tukey, J. W. (1962). The future of data analysis. Ann. Math. Stat. 33 1-67. · Zbl 0107.36401
[24] White, H. (1980). Using least squares to approximate unknown regression functions. Internat. Econom. Rev. 21 149-170. · Zbl 0444.62119
[25] White, H. · Zbl 0478.62088
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.