×

Fixed-size determinantal point processes sampling for species phylogeny. (English) Zbl 1464.62479

Summary: Determinantal point processes (DPPs) are popular tools that supply useful information for repulsiveness. They provide coherent probabilistic models when negative correlations arise and also represent new algorithms for inference problems like sampling, marginalization and conditioning. Recently, DPPs have played an increasingly important role in machine learning and statistics, since they are used for diverse subset selection problems. In this paper we use \(k\)-DPP, a conditional DPP that models only sets of cardinality \(k\), to sample a diverse subset of species from a large phylogenetic tree. The tree sampling task is important in many studies in modern bioinformatics. The results show a fast mixing sampler for \(k\)-DPP, for which a polynomial bound on the mixing time is given. This approach is applied to a real-world dataset of species, and we observe that leaves joined by a higher subtree are more likely to appear.

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
92D15 Problems related to evolution
62H22 Probabilistic graphical models
60G55 Point processes (e.g., Poisson, Cox, Hawkes processes)

Software:

UniProt
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Anari, N.; Gharan, S. O.; Rezaei, A., Monte Carlo Markov chain algorithms for sampling strongly Rayleigh distributions and determinantal point processes, In COLT (2016)
[2] Borcea, J.; Branden, P.; Liggett, T. M., Negative dependence and the geometry of polynomials, J. Am. Math. Soc., 22, 521-567 (2009) · Zbl 1206.62096 · doi:10.1090/S0894-0347-08-00618-8
[3] Çivril, A.; Magdon-Ismail, M., On selecting a maximum volume sub-matrix of a matrix and related problems, Theor. Comput. Sci., 4801-4811 (2009) · Zbl 1181.15002 · doi:10.1016/j.tcs.2009.06.018
[4] Cvetkovic, D. M.; Doob, M.; Sachs, H., Spectra of Graphs (1980), Academic Press Inc. · Zbl 0458.05042
[5] Deshpande, A.; Rademacher, L., Efficient volume sampling for row/column subset selection, FOCS, 1-3, 329-338 (2010)
[6] Doyle, P. G.; Snell, J. L., Random Walks and Electric Networks (1984), Mathematical Association of America · Zbl 0583.60065
[7] Ginibre, J., Statistical ensembles of complex, quaternion,and real matrices, J. Math. Phys., 6, 440-449 (1965) · Zbl 0127.39304 · doi:10.1063/1.1704292
[8] Gobel, F.; Jagers, A., Random walks on graphs., Stochastic Processes Appl., 2, 311-336 (1974) · Zbl 0296.60046 · doi:10.1016/0304-4149(74)90001-5
[9] Kang, B., Fast determinantal point process sampling with application to clustering, NIPS, 2319-2327 (2013)
[10] Kannan, R.; Vempala, S., Spectral algorithms, Foundations and Trends in Theoretical Computer Science, 4, 157-288 (2009) · Zbl 1191.68852 · doi:10.1561/0400000025
[11] Kondor, R. I.; Lafferty, J., Diffusion kernels on graphs and other discrete structures, Proceedings of the 19th International Conference On Machine Learning, 315-322 (2002), Omnipress
[12] Kulesza, A.; Taskar, B., Proceedings of the 28th International Conference on Machine Learning, kDPPs: fixed size determinantal point processes, 1193-1200 (2011), Omnipress
[13] Kulesza, A.; Taskar, B., Determinantal point processes for machine learning, Foundations and Trends in Machine Learning, 5, 2-3, 123-286 (2012) · Zbl 1278.68240 · doi:10.1561/2200000044
[14] Lovász, L., Random walks on graphs: A survey, Combinatorics, 2, 1-46 (1993)
[15] Macchi, O., The Coincidence approach to stochastic point processes, Adv. Appl. Probab., 7, 1, 83-122 (1975) · Zbl 0366.60081 · doi:10.2307/1425855
[16] Mehta, M. L.; Gaudin, M., On the density of eigenvalues of a random matrix, Nuclear Physics, 18, 420-427 (1960) · Zbl 0107.35702 · doi:10.1016/0029-5582(60)90414-4
[17] Merris, R., Laplacian matrices of graphs: A survey, Linear Algebra Appl., 197-198, 143-176 (1994) · Zbl 0802.05053 · doi:10.1016/0024-3795(94)90486-3
[18] Pundir, S.; Martin, M. J.; O’Donovan, C., UniProt Protein Knowledgebase, Methods Mol. Biol, 1558, 41-55 (2017) · doi:10.1007/978-1-4939-6783-4_2
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.