Data features. (English) Zbl 0831.62001

This article attempts to provide a formal framework for data-based inference which explicitly and consistently recognizes the approximate nature of probability models. It is based on the idea that a stochastic model is adequate if samples generated under the model are very much like the sample actually obtained. The formalization is based on the concept of a data feature. Examples are given of applying the ideas to different areas of statistics, including location-scale models, densities, nonparametric regression, interlaboratory tests, autoregressive processes and the analysis of variance.
The four cornerstones of the approach are direct comparison, approximation, weak topologies and parsimony. The approach is contrasted with that of much of conventional statistics, many of whose concepts are pathologically discontinuous with respect to the topology of data analysis and common sense.
Contents: Section 1 elaborates the basic aims of the article. Section 2 contains a light-hearted but seriously meant account of the problems which derive from the contradictions between the strong topology of inference and the weak topology of common sense. Section 3 is concerned with the relationship between models, data and the world. Section 4 presents a formalization of the idea of data features and of the concept of the adequacy of a probability model. Section 5 contains 14 examples analysed from the point of view of data features.
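The adequacy idea summarized above (a model is adequate if samples generated under it look like the observed sample) can be illustrated with a small simulation. The sketch below is not taken from the article. The chosen feature (largest deviation from the median in units of the MAD), the robustly fitted normal model, the central 95% band, and all function names are assumptions made purely for illustration.

```python
import random
import statistics


def max_outlyingness(sample):
    """Illustrative data feature: largest |x - median| in units of the MAD."""
    med = statistics.median(sample)
    mad = statistics.median([abs(x - med) for x in sample])
    return max(abs(x - med) for x in sample) / mad


def is_adequate(observed, simulate, feature, n_sim=1000, level=0.95, seed=0):
    """Return True if feature(observed) lies in the central `level` band of
    the feature values computed on samples simulated under the model."""
    rng = random.Random(seed)
    n = len(observed)
    sims = sorted(feature(simulate(rng, n)) for _ in range(n_sim))
    tail = (1.0 - level) / 2.0
    lower = sims[int(tail * n_sim)]
    upper = sims[int((1.0 - tail) * n_sim) - 1]
    return lower <= feature(observed) <= upper


def normal_model(observed):
    """Hypothetical model: normal with median as location, scaled MAD as scale."""
    mu = statistics.median(observed)
    sigma = 1.4826 * statistics.median([abs(x - mu) for x in observed])
    return lambda rng, n: [rng.gauss(mu, sigma) for _ in range(n)]


# Deterministic "clean" sample: 50 standard normal quantiles.
clean = [statistics.NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
# The same sample with one gross outlier appended.
contaminated = clean + [100.0]

clean_ok = is_adequate(clean, normal_model(clean), max_outlyingness)
contaminated_ok = is_adequate(
    contaminated, normal_model(contaminated), max_outlyingness
)
print(clean_ok, contaminated_ok)
```

In this sketch a single feature decides adequacy. The article's framework, as described in the review, works with data features more generally, so several features would in practice be checked together.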


62A01 Foundations and philosophical topics in statistics
62-07 Data analysis (statistics) (MSC2010)



