×

Concentration inequalities for cross-validation in scattered data approximation. (English) Zbl 07501641

Summary: Choosing models from a hypothesis space is a frequent task in approximation theory and inverse problems. Cross-validation is a classical tool in the learner’s repertoire to compare the goodness of fit for different reconstruction models. Much work has been dedicated to computing this quantity in a fast manner but tackling its theoretical properties occurs to be difficult. So far, most optimality results are stated in an asymptotic fashion. In this paper we propose a concentration inequality on the difference of cross-validation score and the risk functional with respect to the squared error. This gives a pre-asymptotic bound which holds with high probability. For the assumptions we rely on bounds on the uniform error of the model which allow for a broadly applicable framework.
We support our claims by applying this machinery to Shepard’s model, where we are able to determine precise constants of the concentration inequality. Numerical experiments in combination with fast algorithms indicate the applicability of our results.

MSC:

62G08 Nonparametric regression and quantile regression
41A63 Multidimensional problems
62H15 Hypothesis testing in multivariate analysis
62G05 Nonparametric estimation
41A15 Spline approximation

Software:

NFFT; Matlab; gss
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Bakushinskiĭ, A. B., Remarks on the choice of regularization parameter from quasioptimality and relation tests, Zh. Vychisl. Mat. I Mat. Fiz., 24, 8, 1258-1259 (1984) · Zbl 0575.65059
[2] Bartel, F.; Hielscher, R.; Potts, D., Fast cross-validation in harmonic approximation, Appl. Comput. Harmon. Anal., 49, 2, 415-437 (2020) · Zbl 07216378
[3] Bates, S.; Hastie, T.; Tibshirani, R., Cross-validation: what does it estimate and how well does it do it? ArXiv e-prints (2021)
[4] Bauer, F.; Reiß, M., Regularization independent of the noise level: an analysis of quasi-optimality, Inverse Problems, 24, 5, Article 055009 pp. (2008), 16 · Zbl 1147.49024
[5] Becker, S. M.A., Regularization of statistical inverse problems and the Bakushinskiĭ veto, Inverse Problems, 27, 11, Article 115010 pp. (2011), 22 · Zbl 1401.62055
[6] Belytschko, T.; Lu, Y. Y.; Gu, L., Element-free Galerkin methods, Internat. J. Numer. Methods Engrg., 37, 2, 229-256 (1994) · Zbl 0796.73077
[7] Blockeel, H.; Struyf, J., Efficient algorithms for decision tree cross-validation, J. Mach. Learn. Res., 3, 01, 621-650 (2002) · Zbl 1112.68437
[8] Combes, R., An extension of mcdiarmid’s inequality (2015), ArXiv e-prints, abs/1511.05240
[9] De Vito, E.; Pereverzyev, S.; Rosasco, L., Adaptive kernel methods using the balancing principle, Found. Comput. Math., 10, 4, 455-479 (2010) · Zbl 1204.68154
[10] Deshpande, L. N.; Girard, D., Fast computation of cross-validated robust splines and other non-linear smoothing splines, (Curves and Surfaces (1991)), 143-148 · Zbl 0735.41010
[11] Devroye, L., Laws of the iterated logarithm for order statistics of uniform spacings, Ann. Probab., 9, 5 (1981) · Zbl 0465.60038
[12] Fasshauer, G. E., Meshfree Approximation Methods with MATLAB (2007), World Scientific Publishers · Zbl 1123.65001
[13] Fasshauer, G. E.; Zhang, J., Recent results for moving least squares approximation, (Lucian, L.; Neamtu, M., Geometric Modeling and Computing (2003), Nashboro Press: Nashboro Press Brentwood), 163-176 · Zbl 1060.65523
[14] Golub, G. H.; Heath, M.; Wahba, G., Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics, 21, 2, 215-223 (1979) · Zbl 0461.62059
[15] Gordon, Y.; Litvak, A. E.; Pajor, A.; Tomczak-Jaegermann, N., Random \(\varepsilon \)-nets and embeddings in \(l_\infty^N\), Studia Math., 178, 1, 91-98 (2007) · Zbl 1126.46006
[16] Gu, C., Smoothing spline ANOVA models, (volume 297 of Springer Series in Statistics (2013), Springer: Springer New York) · Zbl 1269.62040
[17] Györfi, L.; Kohler, M.; Krzyzak, A.; Walk, H., A distribution-free theory of nonparametric regression, (Springer Series in Statistics (2002), Springer-Verlag: Springer-Verlag New York) · Zbl 1021.62024
[18] Holst, L., On the lengths of the pieces of a stick broken at random, J. Appl. Probab., 17, 3, 623-634 (1980) · Zbl 0435.60016
[19] S. Kale, R. Kumar, S. Vassilvitskii, Cross-validation and mean-square stability, in: Second Symposium on Innovations in Computer Science, ICS2011, 2011, pp. 487-495.
[20] J. Keiner, S. Kunis, D. Potts, NFFT 3.5, C subroutine library. http://www.tu-chemnitz.de/potts/nfft. Contributors: F. Bartel, M. Fenn, T. Görner, M. Kircheis, T. Knopp, M. Quellmalz, M. Schmischke, T. Volkmer, A. Vollrath.
[21] Kindermann, S.; Neubauer, A., On the convergence of the quasioptimality criterion for (iterated) Tikhonov regularization, Inverse Probl. Imaging, 2, 2, 291-299 (2008) · Zbl 1165.47009
[22] Kindermann, S.; Pereverzyev, S.; Pilipenko, A., The quasi-optimality criterion in the linear functional strategy, Inverse Problems, 34, 7, Article 075001 pp. (2018), 24 · Zbl 1415.65131
[23] Krieg, D.; Novak, E.; Sonnleitner, M., Recovery of sobolev functions restricted to iid sampling (2021), ArXiv e-prints
[24] R. Kumar, D. Lokshtanov, S. Vassilvitskii, A. Vattani, Near-optimal bounds for cross-validation via loss stability, in: 30th International Conference on Machine Learning, Vol. 01, ICML 2013, 2013, pp. 27-35.
[25] Kunsch, R. J., Breaking the curse for uniform approximation in hilbert spaces via monte carlo methods, J. Complexity, 48, 15-35 (2018) · Zbl 1473.65009
[26] Lederer, A.; Umlauft, J.; Hirche, S., Uniform error bounds for gaussian process regression with application to safe control (2019), ArXiv e-prints
[27] Li, K.-C., Asymptotic optimality of \(C_L\) and generalized cross-validation in ridge regression with application to spline smoothing, Ann. Statist., 14, 3, 1101-1112 (1986) · Zbl 0629.62043
[28] Lukas, M. A., Robust generalized cross-validation for choosing the regularization parameter, Inverse Problems, 22, 5, 1883-1902 (2006) · Zbl 1104.62032
[29] Lukas, M. A.; de Hoog, F. R.; Anderssen, R. S., Efficient algorithms for robust generalized cross-validation spline smoothing, J. Comput. Appl. Math., 235, 102-107 (2010) · Zbl 1201.65024
[30] McDiarmid, C., On the method of bounded differences, (Volume 141 of London Math. Soc. Lecture Note Ser. Volume 141 of London Math. Soc. Lecture Note Ser, Surveys in combinatorics, 1989 (Norwich, 1989) (1989), Cambridge Univ. Press: Cambridge Univ. Press Cambridge), 148-188 · Zbl 0712.05012
[31] M. Mullin, R. Sukthankar, Complete cross-validation for nearest neighbor classifiers, in: 17th International Conference on Machine Learning, (ICML), 2000.
[32] Nadaraya, E. A., On estimating regression, Theory of Probab. Appl., 9, 141-142 (1964) · Zbl 0136.40902
[33] Nayroles, B.; Touzot, G.; Villon, P., Generalizing the finite element method: diffuse approximation and diffuse elements, Comput. Mech., 10, 5, 307-318 (1992) · Zbl 0764.65068
[34] Pozharska, K.; Ullrich, T., A note on sampling recovery of multivariate functions in the uniform norm (2021), ArXiv e-prints
[35] Rosset, S., Bi-level path following for cross validated solution of kernel quantile regression, J. Mach. Learn. Res., 10, 2473-2505 (2009) · Zbl 1235.68184
[36] Schaefer, S.; McPhail, T.; Warren, J., Image deformation using moving least squares, ACM Trans. Graph., 25, 3, 533-540 (2006)
[37] Shepard, D., A two-dimensional interpolation function for irregularly-spaced data, (Proceedings of the 1968 23rd ACM National Conference. Proceedings of the 1968 23rd ACM National Conference, ACM ’68 (1968), Association for Computing Machinery: Association for Computing Machinery New York, NY, USA), 517-524
[38] Sidje, R. B.; Williams, A. B.; Burrage, K., Fast generalized cross validation using Krylov subspace methods, Numer. Algorithms, 47, 109-131 (2008) · Zbl 1157.65012
[39] Sober, B.; Levin, D., Manifold approximation by moving least-squares projection (MMLS), Constr. Approx., 52, 3, 433-478 (2020) · Zbl 1455.65042
[40] Steinwart, I.; Christmann, A., Support Vector Machines (2008), Springer Publishing Company, Incorporated · Zbl 1203.68171
[41] Tasche, M.; Weyrich, N., Smoothing inversion of Fourier series using generalized cross-validation, Results Math., 29, 1-2, 183-195 (1996) · Zbl 0853.65145
[42] G.S. Watson, Smooth regression analysis, Sankhy = a Ser. A. · Zbl 0137.13002
[43] Weinert, H. L., Efficient computation for Whittaker-Henderson smoothing, Comput. Statist. Data Anal., 52, 959-974 (2007) · Zbl 1452.62139
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.