Bayesian analysis of population structure based on linked molecular information. (English) Zbl 1107.62115

Summary: The Bayesian model-based approach to inferring hidden genetic population structures using multilocus molecular markers has become a popular tool within certain branches of biology. In particular, it has been shown that heterogeneous data arising from genetically dissimilar latent groups of individuals can be effectively modelled using an unsupervised classification formulation. However, most currently employed models ignore potential linkage within the employed molecular information, and can therefore lead to biased inferences under certain circumstances.
Utilizing the general theory of graphical models, we develop a framework that accounts for dependences within both linked molecular marker loci and DNA sequence data. Due to a high level of sequence conservation among eukaryotic species, the latter aspect is particularly relevant for analyzing rapidly evolving microbial species. The advantages of incorporating the dependence due to linkage in the classification models are illustrated by analyses of both simulated data and real samples of Bacillus cereus.


62P10 Applications of statistics to biology and medical sciences; meta analysis
62F15 Bayesian inference
92C40 Biochemistry, molecular biology



