On the asymptotic behaviour of the variance estimator of a \(U\)-statistic. (English) Zbl 1441.62111

Summary: \(U\)-statistics enjoy good properties such as asymptotic normality, unbiasedness and minimal variance among unbiased estimators. The estimation of their variance is often of interest, for instance to derive asymptotic tests. It is well known that an unbiased estimator of the variance of a \(U\)-statistic can be formulated explicitly as a \(U\)-statistic itself, but specific dependencies on the sample size make asymptotic statements difficult. Here, we solve the issue by decomposing the variance estimator into a linear combination of \(U\)-statistics with fixed kernel size, consequently obtaining a straightforward statement on the asymptotic distribution. We subsequently demonstrate a central limit theorem for the studentized estimator. We show that it leads to a hypothesis test which compares the error estimates of two prediction algorithms and permits construction of an asymptotically exact confidence interval for the true difference of errors. The test is illustrated by a real data application and a simulation study.
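To make the objects in the summary concrete, the following is a minimal brute-force sketch in Python. It is not the decomposition developed in the paper. It only illustrates the well-known fact mentioned above: since \(\operatorname{Var}(U) = E[U^2] - \theta^2\), and since the product of the kernel over two disjoint subsamples has expectation \(\theta^2\), the quantity \(U^2\) minus the average of such products is an unbiased estimator of \(\operatorname{Var}(U)\). The function names `u_statistic` and `unbiased_variance_estimate`, the example data and the choice of kernel are assumptions made for illustration, and the enumeration is only feasible for very small samples.

```python
from itertools import combinations
from statistics import mean, variance


def u_statistic(sample, kernel, m):
    """Average of a symmetric kernel over all size-m subsets of the sample."""
    n = len(sample)
    values = [kernel(*(sample[i] for i in idx))
              for idx in combinations(range(n), m)]
    return mean(values)


def unbiased_variance_estimate(sample, kernel, m):
    """Brute-force unbiased estimate of Var(U) as U^2 minus an unbiased
    estimate of theta^2 (illustrative sketch, requires n >= 2m)."""
    n = len(sample)
    if n < 2 * m:
        raise ValueError("need at least 2*m observations")
    u = u_statistic(sample, kernel, m)
    products = []
    for s1 in combinations(range(n), m):
        h1 = kernel(*(sample[i] for i in s1))
        rest = [i for i in range(n) if i not in s1]
        # Only subsets disjoint from s1 are used, so each product
        # has expectation theta^2.
        for s2 in combinations(rest, m):
            products.append(h1 * kernel(*(sample[i] for i in s2)))
    theta_sq_hat = mean(products)
    return u * u - theta_sq_hat


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # made-up example values

    def var_kernel(x, y):
        # Kernel of size 2 whose U-statistic is the sample variance.
        return (x - y) ** 2 / 2

    print("U-statistic:", u_statistic(data, var_kernel, 2))
    print("sample variance:", variance(data))
    print("estimated Var(U):", unbiased_variance_estimate(data, var_kernel, 2))
```

Two sanity checks follow from standard identities: with the kernel \((x-y)^2/2\) the \(U\)-statistic coincides with the usual sample variance, and with the identity kernel of size 1 the variance estimate reduces to \(s^2/n\). The number of disjoint subset pairs grows with the sample size, which is one way to see why the sample-size dependence noted in the summary complicates asymptotic arguments.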

MSC:

62G30 Order statistics; empirical distribution functions
62G20 Asymptotic properties of nonparametric inference
62G15 Nonparametric tolerance and confidence regions
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

boot; UCI-ml
