Efficient semiparametric estimation and model selection for multidimensional mixtures. (English) Zbl 1473.62106

Summary: In this paper, we consider nonparametric multidimensional finite mixture models and study the semiparametric estimation of the population weights. The i.i.d. observations are assumed to have at least three components that are independent given the population. We approximate the semiparametric model by projecting the conditional distributions onto step functions associated with some partition. Our first main result is that if the partition is refined slowly enough, the associated sequence of maximum likelihood estimators of the weights is asymptotically efficient, and the posterior distribution of the weights, when using a Bayesian procedure, satisfies a semiparametric Bernstein-von Mises theorem. We then propose a cross-validation-like method to select the partition at a finite horizon. Our second main result is that the proposed procedure satisfies an oracle inequality. Numerical experiments on simulated data illustrate our theoretical results.
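
The projection step described in the summary can be illustrated with a small sketch. It is not the authors' implementation. It assumes two populations, three coordinates taking values in [0, 1), a regular partition into equal-width bins, and Beta-distributed conditional laws chosen only for the simulation. The maximum likelihood estimator of the discretized model is approximated with a plain EM algorithm, which is a substitution made here for convenience. The function names `simulate`, `discretize` and `em_weights` are hypothetical, and the cross-validation-like choice of the partition is not covered.

```python
import random


def simulate(n_obs, weights, seed=0):
    """Draw observations with 3 coordinates, independent given the population.

    Assumption for illustration only: two populations whose coordinates
    follow Beta(2, 5) and Beta(5, 2) respectively.
    """
    rng = random.Random(seed)
    shapes = [(2.0, 5.0), (5.0, 2.0)]
    sample = []
    for _ in range(n_obs):
        j = 0 if rng.random() < weights[0] else 1
        a, b = shapes[j]
        sample.append([rng.betavariate(a, b) for _ in range(3)])
    return sample


def discretize(sample, n_bins):
    """Replace each coordinate by the index of its cell in a regular partition of [0, 1)."""
    return [[min(int(y * n_bins), n_bins - 1) for y in obs] for obs in sample]


def em_weights(binned, n_pop, n_bins, n_iter=200, seed=0):
    """Run EM on the step-function (histogram) model and return (weights, cell probabilities)."""
    rng = random.Random(seed)
    n_obs = len(binned)
    n_coord = len(binned[0])
    weights = [1.0 / n_pop] * n_pop
    # Random positive starting values so that the populations are not identical.
    probs = []
    for _ in range(n_pop):
        per_coord = []
        for _ in range(n_coord):
            raw = [rng.random() + 0.5 for _ in range(n_bins)]
            total = sum(raw)
            per_coord.append([r / total for r in raw])
        probs.append(per_coord)

    for _ in range(n_iter):
        new_w = [0.0] * n_pop
        # A tiny pseudo-count keeps every cell probability strictly positive.
        counts = [[[1e-9] * n_bins for _ in range(n_coord)] for _ in range(n_pop)]
        # E-step: posterior probability of each population for each observation.
        for obs in binned:
            joint = []
            for j in range(n_pop):
                lik = weights[j]
                for c, b in enumerate(obs):
                    lik *= probs[j][c][b]
                joint.append(lik)
            total = sum(joint)
            for j in range(n_pop):
                resp = joint[j] / total
                new_w[j] += resp
                for c, b in enumerate(obs):
                    counts[j][c][b] += resp
        # M-step: update the weights and the cell probabilities.
        weights = [w / n_obs for w in new_w]
        probs = []
        for j in range(n_pop):
            per_coord = []
            for row in counts[j]:
                row_sum = sum(row)
                per_coord.append([x / row_sum for x in row])
            probs.append(per_coord)
    return weights, probs


if __name__ == "__main__":
    data = simulate(1000, [0.3, 0.7], seed=1)
    est_weights, _ = em_weights(discretize(data, 4), n_pop=2, n_bins=4)
    # Populations are identified only up to relabelling, so sort before comparing.
    print(sorted(est_weights))
```

The number of bins is fixed at 4 here purely for the example. In the setting of the summary, the partition is refined as the sample size grows and is chosen by the proposed selection procedure.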

MSC:

62G05 Nonparametric estimation
62G20 Asymptotic properties of nonparametric inference

Software:

CAPUSHE