Bayesian analysis for mixtures of discrete distributions with a non-parametric component. (English) Zbl 1514.62393

Summary: Bayesian finite mixture modelling is a flexible parametric modelling approach for classification and density fitting. Many areas of application require distinguishing a signal from a noise component. In practice, it is often difficult to justify a specific distribution for the signal component; therefore, the signal distribution is usually further modelled via a mixture of distributions. However, modelling the signal as a mixture of distributions is computationally non-trivial, both because it is difficult to justify the exact number of components to use and because of the label switching problem. This paper proposes the use of a non-parametric distribution to model the signal component. We consider the case of discrete data and show how this new methodology leads to more accurate parameter estimation and a smaller false non-discovery rate. Moreover, it does not incur the label switching problem. We show an application of the method to data generated by ChIP-sequencing experiments.
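
To make the two-component structure described in the summary concrete, here is a minimal Python sketch of a mixture of a parametric noise component and an unrestricted discrete signal distribution. It is an illustration under stated assumptions, not the method of the paper: the Poisson form of the noise, the fixed parameter values, the 0.5 threshold, and the names `poisson_pmf`, `signal_posterior` and `estimated_fndr` are all hypothetical. The false non-discovery rate is estimated here as the average posterior signal probability among the observations classified as noise, which is one common Bayesian estimate and not necessarily the one used by the authors.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass at count k (assumed noise model)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def signal_posterior(y, pi_noise, lam, signal_pmf):
    """Posterior probability that count y comes from the signal component.

    signal_pmf is a dict mapping counts to probabilities; it plays the
    role of the non-parametric signal distribution (no parametric family
    is imposed on it). Counts missing from the dict have probability 0.
    """
    noise = pi_noise * poisson_pmf(y, lam)
    signal = (1.0 - pi_noise) * signal_pmf.get(y, 0.0)
    total = noise + signal
    return signal / total if total > 0 else 0.0


def estimated_fndr(counts, pi_noise, lam, signal_pmf, threshold=0.5):
    """Average posterior signal probability among counts declared noise."""
    post = [signal_posterior(y, pi_noise, lam, signal_pmf) for y in counts]
    declared_noise = [p for p in post if p <= threshold]
    if not declared_noise:
        return 0.0
    return sum(declared_noise) / len(declared_noise)


# Hypothetical values: 90% noise with Poisson mean 1, signal on {3, 4, 5}.
example_signal = {3: 0.2, 4: 0.3, 5: 0.5}
example_counts = [0, 1, 2, 3, 5]
example_fndr = estimated_fndr(example_counts, 0.9, 1.0, example_signal)
```

In a full Bayesian treatment the mixing proportion, the noise parameter and the signal probabilities would be sampled from their posterior rather than fixed as above. Because the signal is a single unrestricted distribution and not a mixture of exchangeable components, there are no interchangeable signal labels to permute.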

MSC:

62-XX Statistics

Software:

BayesPeak; HPeak

References:

[1] C.E. Antoniak, Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems, Ann. Stat. 2(6) (1974), pp. 1152-1174. doi: 10.1214/aos/1176342871 · Zbl 0335.60034
[2] Y. Bao, V. Vinciotti, E. Wit, and P.A.C. ’t Hoen, Accounting for immunoprecipitation efficiencies in the statistical analysis of ChIP-seq data, BMC Bioinform. 14 (2013), Article Number 169. doi: 10.1186/1471-2105-14-169
[3] Y. Bao, V. Vinciotti, E. Wit, and P.A.C. ’t Hoen, Joint modelling of ChIP-seq data via a Markov random field model, Biostatistics 15(2) (2014), pp. 296-310. doi: 10.1093/biostatistics/kxt047
[4] G. Celeux, Bayesian inference for mixture: The label switching problem, in COMPSTAT 98, R. Payne and P.J. Green, eds., Physica, Heidelberg, 1998, pp. 227-232. · Zbl 0951.62018
[5] G. Celeux, M. Hurn, and C.P. Robert, Computational and inferential difficulties with mixture posterior distributions, J. Amer. Statist. Assoc. 95(451) (2000), pp. 957-970. doi: 10.1080/01621459.2000.10474285 · Zbl 0999.62020
[6] J. Diebolt and C.P. Robert, Estimation of finite mixture distributions through Bayesian sampling, J. R. Stat. Soc. Ser. B 56 (1994), pp. 363-375. · Zbl 0796.62028
[7] J. Ernst and M. Kellis, Discovery and characterization of chromatin states for systematic annotation of the human genome, Nature Biotechnol. 28(8) (2010), pp. 817-825. doi: 10.1038/nbt.1662
[8] M.D. Escobar and M. West, Bayesian density estimation and inference using mixtures, J. Amer. Statist. Assoc. 90(430) (1995), pp. 577-588. doi: 10.1080/01621459.1995.10476550 · Zbl 0826.62021
[9] T.S. Ferguson, A Bayesian analysis of some nonparametric problems, Ann. Statist. 1 (1973), pp. 209-230. doi: 10.1214/aos/1176342360 · Zbl 0255.62037
[10] P.J. Green, Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika 82(4) (1995), pp. 711-732. doi: 10.1093/biomet/82.4.711 · Zbl 0861.62023
[11] A. Jasra, C.C. Holmes, and D.A. Stephens, Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling, Stat. Sci. 20 (2005), pp. 50-67. doi: 10.1214/088342305000000016 · Zbl 1100.62032
[12] P.F. Kuan, D. Chung, G. Pan, J.A. Thomson, R. Stewart, and S. Keleş, A statistical framework for the analysis of ChIP-seq data, J. Amer. Stat. Assoc. 106(495) (2011), pp. 891-903. doi: 10.1198/jasa.2011.ap09706 · Zbl 1229.62146
[13] G. McLachlan and D. Peel, Finite Mixture Models, Wiley, New York, 2004. doi: 10.1002/047172842X · Zbl 1140.92010
[14] A. Nobile and A.T. Fearnside, Bayesian finite mixtures with an unknown number of components: The allocation sampler, Stat. Comput. 17(2) (2007), pp. 147-162. doi: 10.1007/s11222-006-9014-7
[15] Z.S. Qin, J. Yu, J. Shen, C.A. Maher, M. Hu, S. Kalyana-Sundaram, J. Yu, and A.M. Chinnaiyan, HPeak: An HMM-based algorithm for defining read-enriched regions in ChIP-seq data, BMC Bioinform. 11(1) (2010), p. 369. doi: 10.1186/1471-2105-11-369
[16] Y.F.M. Ramos, M.S. Hestand, M. Verlaan, E. Krabbendam, Y. Ariyurek, M. van Galen, H. van Dam, G.-J.B. van Ommen, J.T. den Dunnen, A. Zantema, and P.A.C. ’t Hoen, Genome-wide assessment of differential roles for p300 and CBP in transcription regulation, Nucleic Acids Res. 39(16) (2010), pp. 5396-5408. doi: 10.1093/nar/gkq184
[17] S. Richardson and P.J. Green, On Bayesian analysis of mixtures with an unknown number of components (with discussion), J. R. Stat. Soc. Ser. B 59(4) (1997), pp. 731-792. doi: 10.1111/1467-9868.00095 · Zbl 0891.62020
[18] C.E. Rodriguez and S.G. Walker, Label switching in Bayesian mixture models: Deterministic relabeling strategies, J. Comput. Graph. Statist. 23 (2014), pp. 25-45. doi: 10.1080/10618600.2012.735624
[19] S. Song, D.L. Nicolae, and J. Song, Estimating the mixing proportion in a semiparametric mixture model, Comput. Statist. Data Anal. 54 (2010), pp. 2276-2283. doi: 10.1016/j.csda.2010.04.007 · Zbl 1284.62395
[20] M. Sperrin, T. Jaki, and E. Wit, Probabilistic relabelling strategies for the label switching problem in Bayesian mixture models, Stat. Comput. 20 (2010), pp. 357-366. doi: 10.1007/s11222-009-9129-8
[21] C. Spyrou, R. Stark, A.G. Lynch, and S. Tavaré, BayesPeak: Bayesian analysis of ChIP-seq data, BMC Bioinform. 10(1) (2009), p. 299. doi: 10.1186/1471-2105-10-299
[22] M. Stephens, Bayesian analysis of mixture models with an unknown number of components, an alternative to reversible jump methods, Ann. Statist. 28 (2000), pp. 40-74. doi: 10.1214/aos/1016120364 · Zbl 1106.62316
[23] M. Stephens, Dealing with label switching in mixture models, J. R. Stat. Soc. Ser. B 62(4) (2000), pp. 795-809. doi: 10.1111/1467-9868.00265 · Zbl 0957.62020
[24] M. West, Hierarchical mixture models in neurological transmission analysis, J. Amer. Statist. Assoc. 92(438) (1997), pp. 587-606. doi: 10.1080/01621459.1997.10474011 · Zbl 0889.62095
[25] S. Xiang, W. Yao, and J. Wu, Minimum profile Hellinger distance estimation for a semiparametric mixture model, Canad. J. Statist. 42 (2014), pp. 246-267. doi: 10.1002/cjs.11211 · Zbl 1349.62108
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.