×

Exponential inequalities for nonstationary Markov chains. (English) Zbl 1434.60171

Summary: Exponential inequalities are main tools in machine learning theory. To prove exponential inequalities for non i.i.d random variables allows to extend many learning techniques to these variables. Indeed, much work has been done both on inequalities and learning theory for time series, in the past 15 years. However, for the non independent case, almost all the results concern stationary time series. This excludes many important applications: for example any series with a periodic behaviour is nonstationary. In this paper, we extend the basic tools of [J. Dedecker and X. Fan, Stochastic Processes Appl. 125, No. 1, 60–90 (2015; Zbl 1301.60055)] to nonstationary Markov chains. As an application, we provide a Bernsteintype inequality, and we deduce risk bounds for the prediction of periodic autoregressive processes with an unknown period.

MSC:

60J05 Discrete-time Markov processes on general state spaces
60E15 Inequalities; stochastic orderings
62M05 Markov processes: estimation; hidden Markov models
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62M20 Inference from stochastic processes and prediction
68Q32 Computational learning theory

Citations:

Zbl 1301.60055

Software:

CAPUSHE
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Adamczak, R. (2008). A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electron. J. Probab. 13(34), 1000-1034. · Zbl 1190.60010
[2] Alquier, P. and B. Guedj (2018). Simpler PAC-Bayesian bounds for hostile data. Mach. Learn. 107(5), 887-902. · Zbl 1464.62238
[3] Alquier, P., X. Li, and O. Wintenberger (2013). Prediction of time series by statistical learning: general losses and fast rates. Depend. Model. 1, 65-93. · Zbl 06297673
[4] Alquier, P. and O. Wintenberger (2012). Model selection for weakly dependent time series forecasting. Bernoulli 18(3), 883-913. · Zbl 1243.62117
[5] Bardet, J.-M. and P. Doukhan (2018). Non-parametric estimation of time varying AR(1) processes with local stationarity and periodicity. Electron. J. Stat. 12(2), 2323-2354. · Zbl 1403.62155
[6] Baudry, J.-P., C.Maugis, and B. Michel (2012). Slope heuristics: overview and implementation. Stat. Comput. 22(2), 455-470. · Zbl 1322.62007
[7] Bercu, B., B. Delyon, and E. Rio (2015). Concentration Inequalities for Sums and Martingales. Springer, Cham. · Zbl 1337.60002
[8] Bertail, P. and G. Ciołek (2019). New Bernstein and Hoeffding type inequalities for regenerative Markov chains. ALEA Lat. Am. J. Probab. Math. Stat. 16(1), 259-277. · Zbl 1411.62112
[9] Bertail, P. and S. Clémençon (2010). Sharp bounds for the tails of functionals of Markov chains. Theory Probab. Appl. 54(3), 505-515. · Zbl 1211.60028
[10] Bertail, P. and F. Portier (2019). Rademacher complexity forMarkov chains: Applications to kernel smoothing and Metropolis- Hasting. Bernoulli, to appear. Available at http://www.bernoulli-society.org/index.php/publications/bernoulli-journal/bernoulli-journal-papers. · Zbl 1431.60080
[11] Birgé, L. and P. Massart (2007). Minimal penalties for Gaussian model selection. Probab. Theory Related Fields 138(1-2), 33-73. · Zbl 1112.62082
[12] Blanchard, G. and O. Zadorozhnyi (2017). Concentration of weakly dependent Banach-valued sums and applications to kernel learning methods. Available at https://arxiv.org/abs/1712.01934v1. · Zbl 1428.62185
[13] Boucheron, S., G. Lugosi, and P. Massart (2013). Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press. · Zbl 1279.60005
[14] Catoni, O. (2003). Laplace transform estimates and deviation inequalities. Ann. Inst. Henri Poincaré Probab. Stat. 39(1), 1-26. · Zbl 1012.60023
[15] Collet, P., S. Martinez, and B. Schmitt (2002). Exponential inequalities for dynamical measures of expanding maps of the interval. Probab. Theory Related Fields 123(3), 301-322. · Zbl 1087.37028
[16] Dahlhaus, R. (1996). On the Kullback-Leibler information divergence of locally stationary processes. Stochastic Process. Appl. 62(1), 139-168. · Zbl 0849.60032
[17] de la Peña, V. H. (1999). A general class of exponential inequalities formartingales and ratios. Ann. Probab. 27(1), 537-564. · Zbl 0942.60004
[18] Dedecker, J., P. Doukhan, G. Lang, J. R. León, S. Louhichi, and C. Prieur (2007). Weak Dependence: With Examples and Applications. Springer, New York. · Zbl 1165.62001
[19] Dedecker, J. and X. Fan (2015). Deviation inequalities for separately Lipschitz functionals of iterated random functions. Stochastic Process. Appl. 125(1), 60-90. · Zbl 1301.60055
[20] Diaconis, P. and D. Freedman (1999). Iterated random functions. SIAM Rev. 41(1), 45-76. · Zbl 0926.60056
[21] Doukhan, P. (2018). Stochastic Models for Time Series. Springer, Cham. · Zbl 1401.62007
[22] Doukhan, P. and M. H. Neumann (2007). Probability and moment inequalities for sums of weakly dependent random variables, with applications. Stochastic Process. Appl. 117(7), 878-903. · Zbl 1117.60018
[23] Fan, J., B. Jiang, and Q. Sun (2018). Hoeffding’s lemma forMarkov chains and its applications to statistical learning. Available at https://arxiv.org/abs/1802.00211.
[24] Hang, H. and I. Steinwart (2014). Fast learning from α-mixing observations. J. Multivariate Anal. 127, 184-199. · Zbl 1359.62242
[25] Hang, H. and I. Steinwart (2017). A Bernstein-type inequality for some mixing processes and dynamical systems with an application to learning. Ann. Statist. 45(2), 708-743. · Zbl 1388.60060
[26] Joulin, A. and Y. Ollivier (2010). Curvature, concentration and error estimates for Markov chain Monte Carlo. Ann. Probab. 38(6), 2418-2442. · Zbl 1207.65006
[27] Kontorovich, L. A. and K. Ramanan (2008). Concentration inequalities for dependent random variables via the martingale method. Ann. Probab. 36(6), 2126-2158. · Zbl 1154.60310
[28] Kuznetsov, V. and M. Mohri (2015). Learning theory and algorithms for forecasting non-stationary time series. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.), Advances in Neural Information Processing Systems, pp. 541-549.
[29] Lerasle, M. (2011). Optimal model selection for density estimation of stationary data under various mixing conditions. Ann. Statist. 39(4), 1852-1877. · Zbl 1227.62018
[30] London, B., B. Huang, B. Taskar, and L. Getoor (2014). PAC-Bayesian collective stability. In S. Kaski and J. Corander (Eds.), Artificial Intelligence and Statistics, pp. 585-594.
[31] Massart, P. (2007). Concentration Inequalities and Model Selection. Springer, Berlin. · Zbl 1170.60006
[32] McDonald, D. J., C. R. Shalizi, and M. Schervish (2017). Nonparametric risk bounds for time-series forecasting. J. Mach. Learn. Res. 18, no. 32, 40 pp. · Zbl 1437.62337
[33] Meir, R. (2000). Nonparametric time series prediction through adaptive model selection. Mach. Learn. 39(1), 5-34. · Zbl 0954.68124
[34] Merlevède, F., M. Peligrad, and E. Rio (2009). Bernstein inequality and moderate deviations under strong mixing conditions. In C. Houdré, V. Koltchinskii, D. M. Mason, and M. Peligrad (Eds.), High Dimensional Probability V: The Luminy Volume, pp. 273-292. Institute of Mathematical Statistics, Beachwood OH. · Zbl 1243.60019
[35] Merlevède, F., M. Peligrad, and E. Rio (2011). A Bernstein type inequality and moderate deviations for weakly dependent sequences. Probab. Theory Related Fields 151(3-4), 435-474. · Zbl 1242.60020
[36] Modha, D. S. and E. Masry (1996). Minimum complexity regression estimation with weakly dependent observations. IEEE Trans. Inform. Theory 42(6), 2133-2145. · Zbl 0868.62015
[37] Paulin, D. (2015). Concentration inequalities for Markov chains by Marton couplings and spectral methods. Electron. J. Probab. 20, no. 79, 32 pp. · Zbl 1342.60121
[38] Rio, E. (2013a). Extensions of the Hoeffding-Azuma inequalities. Electron. Commun. Probab. 18, no. 54, 6 pp. · Zbl 1300.60036
[39] Rio, E. (2013b). On McDiarmid’s concentration inequality. Electron. Commun. Probab. 18, no. 44, 11 pp. · Zbl 1348.60042
[40] Rio, E. (2017). Asymptotic Theory of Weakly Dependent Random Processes. Springer, Berlin. · Zbl 1378.60003
[41] Samson, P.-M. (2000). Concentration of measure inequalities for Markov chains and ϕ-mixing processes. Ann. Probab. 28(1), 416-461. · Zbl 1044.60061
[42] Sanchez-Perez, A. (2015). Time series prediction via aggregation: an oracle bound including numerical cost. In A. Antoniadis, J.-M. Poggi, and X. Brossat (Eds.), Modeling and Stochastic Learning for Forecasting in High Dimensions, pp. 243-265. Springer, Cham.
[43] Seldin, Y., F. Laviolette, N. Cesa-Bianchi, J. Shawe-Taylor, and P. Auer (2012). Pac-Bayesian inequalities formartingales. IEEE Trans. Inform. Theory 58(12), 7086-7093. · Zbl 1364.60030
[44] Shalizi, C. and A. Kontorovich (2013). Predictive PAC learning and process decompositions. In C. J. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems, pp. 1619- 1627.
[45] Steinwart, I. and A. Christmann (2009). Fast learning from non-i.i.d. observations. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta (Eds.), Advances in Neural Information Processing Systems, pp. 1768-1776.
[46] Steinwart, I., D. Hush, and C. Scovel (2009). Learning from dependent observations. J. Multivariate Anal. 100(1), 175-194. · Zbl 1158.68040
[47] van de Geer, S. (1995). Exponential inequalities for martingales, with application to maximum likelihood estimation for counting processes. Ann. Statist. 23(5), 1779-1801. · Zbl 0852.60019
[48] Vapnik, V. N. (1998). Statistical Learning Theory. John Wiley Sons, New York. · Zbl 0935.62007
[49] Wintenberger, O. (2010). Deviation inequalities for sums of weakly dependent time series. Electron. Commun. Probab. 15, 489-503. · Zbl 1225.60034
[50] Wintenberger, O. (2017). Exponential inequalities for unbounded functions of geometrically ergodic Markov chains: applications to quantitative error bounds for regenerative Metropolis algorithms. Statistics 51(1), 222-234. · Zbl 1370.60037
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.