Posterior contraction of the population polytope in finite admixture models. (English) Zbl 1368.62288

Summary: We study the posterior contraction behavior of the latent population structure that arises in admixture models as the amount of data increases. We adopt the geometric view of admixture models – alternatively known as topic models – as a data-generating mechanism for points randomly sampled from the interior of a (convex) population polytope, whose extreme points correspond to the population structure variables of interest. Rates of posterior contraction are established with respect to the Hausdorff metric and a minimum-matching Euclidean metric defined on polytopes. Tools developed include posterior asymptotics of hierarchical models and arguments from convex geometry.
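The geometric view in the summary can be illustrated with a minimal sketch, which is not the paper's own code. It assumes Dirichlet-distributed mixing weights (as in latent Dirichlet allocation) and reads the minimum-matching Euclidean metric as the Hausdorff-type distance between the two sets of extreme points. The function names `sample_polytope_point` and `min_matching_distance`, and the example vertices, are hypothetical.

```python
import math
import random


def sample_polytope_point(vertices, alpha, rng):
    """Draw one point as a random convex combination of the extreme points."""
    # Assumption: mixing weights are Dirichlet(alpha), built from Gamma draws.
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    dim = len(vertices[0])
    return [sum(w * v[k] for w, v in zip(weights, vertices)) for k in range(dim)]


def min_matching_distance(verts_a, verts_b):
    """Largest Euclidean distance from any vertex to its nearest vertex in the other set."""
    def one_sided(src, dst):
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(one_sided(verts_a, verts_b), one_sided(verts_b, verts_a))


# Hypothetical example: three extreme points on the probability simplex in R^3.
true_vertices = [(0.8, 0.1, 0.1), (0.1, 0.8, 0.1), (0.1, 0.1, 0.8)]
estimate = [(0.7, 0.2, 0.1), (0.1, 0.8, 0.1), (0.1, 0.1, 0.8)]

rng = random.Random(0)
point = sample_polytope_point(true_vertices, alpha=[1.0, 1.0, 1.0], rng=rng)
dist = min_matching_distance(true_vertices, estimate)
```

Under this reading, every point of one polytope lies within `dist` of the other, so the Hausdorff distance between the two convex hulls is at most the minimum-matching distance between their vertex sets.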

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
92D10 Genetics and epigenetics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
52A20 Convex sets in \(n\) dimensions (including convex hypersurfaces)

Software:

MixMoGenD; STRUCTURE
