Machine learning. A probabilistic perspective.

*(English)* Zbl 1295.68003
Cambridge, MA: MIT Press (ISBN 978-0-262-01802-9/hbk; 978-0-262-30616-4/ebook). xxix, 1067 p. (2012).

Until the publication of this book, there were two main textbooks on statistical machine learning: the book by T. Hastie et al. [The elements of statistical learning. Data mining, inference, and prediction. New York, NY: Springer (2001; Zbl 0973.62007)] and the book by C. M. Bishop [Pattern recognition and machine learning. New York, NY: Springer (2006; Zbl 1107.68072)]. The book by Hastie et al. takes a primarily frequentist approach to learning, whereas Bishop’s book provides a more Bayesian view. By contrast, this new book by Kevin Murphy gives a much more balanced view of machine learning, treating both Bayesian and frequentist approaches. It is also the most comprehensive text on statistical learning available to date, which makes it a very valuable resource for researchers and practitioners in the field. The mathematical level ramps up slowly compared to alternative books, so Murphy’s book is also well suited for teaching at both the undergraduate and the graduate level. To aid its use in teaching, the book provides a number of exercises at the end of each chapter. Moreover, it comes with a comprehensive Matlab software library that students can use to experiment with the models presented in the book.

The book starts with relatively basic topics such as the rules of probability and the fitting of Gaussian models to data. It then gives in-depth coverage of both Bayesian and frequentist statistics, which is relatively unbiased (although the author makes it clear that his preference lies with Bayesian statistics). Next, it presents linear models, followed by directed graphical models, mixture models, and the EM algorithm. From there, it continues to linear models with continuous latent variables such as factor analysis and PCA, before turning to sparse models and the lasso. The book then explains kernel-based approaches, Gaussian processes, decision trees, neural nets, and ensemble learning, and goes on to introduce (hidden) Markov models, state space models, and (conditional) Markov random fields. After that, it provides a more fundamental explanation of exact inference for graphical models, covering algorithms such as belief propagation and the junction-tree algorithm. Having covered exact inference, the book moves on to techniques for approximate inference, covering variational inference in considerable detail as well as Monte Carlo inference. The final chapters of the book cover clustering, structure learning in graphical models, latent variable models for discrete data (topic models and restricted Boltzmann machines), and recent developments in deep networks.

The book is set up in a way that makes it useful for a broad audience. It covers all the standard methods in detail, which is useful for readers who have just started learning about machine learning (or who want to look up the details of a particular method). It also covers many recent developments from machine-learning research that appear in few other machine-learning books, which makes it interesting for readers who are already acquainted with the basics as well. Examples of advanced topics covered in the book include conditional random fields and structured SVMs, deep learning, affinity propagation, graphical lasso, sparse coding, and graphical model structure learning. All in all, I highly recommend this book to anyone with an interest in statistical machine learning.

Reviewer: Laurens van der Maaten (Delft)

##### MSC:

| Code | Description |
| --- | --- |
| 68-01 | Introductory exposition (textbooks, tutorial papers, etc.) pertaining to computer science |
| 68T05 | Learning and adaptive systems in artificial intelligence |
| 68T10 | Pattern recognition, speech recognition |
| 68T01 | General topics in artificial intelligence |