Analyzing compositional data with R.

*(English)* Zbl 1276.62011
Use R!. Berlin: Springer (ISBN 978-3-642-36808-0/pbk; 978-3-642-36809-7/ebook). xv, 258 p. (2013).

Compositional data are characterised by a simple principle: the proportions of all components sum to 100%. Such data are often hidden in real-life experiments, when measurements are presented in different units or when some components are overlooked or ignored on the basis of various criteria. This book offers not only the theoretical background needed to analyse and interpret compositional data, but also R support and guidance for the compositions package. The book is organised in 7 chapters.

The first chapter introduces the reader to the characteristics of compositional data, such as perceiving compositions as portions of a total and understanding that the total sum of a composition is irrelevant. The authors also include a brief history of compositional data analysis and review existing software for this type of analysis. The chapter concludes with basic R commands that provide the minimal background required by users with little or no experience with R. In the second chapter the authors review the fundamental concepts of compositional data analysis. The basic definitions are followed by the presentation of subcompositions and the closure operation, and of perturbation and amalgamation. The subsection concludes with a few introductory words on missing values and outliers. Next, the principles of compositional analysis are enumerated, and the graphics that can reveal information about the data are described. The chapter finishes with information on multivariate scales and an in-depth description of the Aitchison simplex. The third chapter reviews the distributions of random compositions, focussing on models for continuous distributions and count compositions. In the fourth chapter the descriptive analysis of compositional data is presented. Descriptive statistics, such as the compositional mean and variance matrices, are followed by methods to explore marginals and projections. The fifth chapter focuses on linear models for compositions and consists of theory and hands-on R examples for compositions modelled as independent or dependent variables. For each type of model the authors present the visualization of the dependence, the model itself and the estimation of regression parameters, the model checks and the analysis of residuals. Chapter 6 presents the multivariate statistics aspects of compositional analysis. The analysis of co-dependence is presented using principal components analysis, cluster analysis and discriminant analysis. Other multivariate techniques such as canonical correlation, logistic regression and geostatistics are also presented. The book concludes with a chapter on how to handle zeros, missing values and outliers, in which the different types of values and their interpretations are discussed in detail. The book is written in an accessible manner for undergraduates and postgraduates alike and offers an all-in-one overview of the analysis of compositional data in R.
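
As a rough illustration of the closure and perturbation operations mentioned above, and of why the total of a composition is irrelevant, the following minimal sketch may help. It is written in plain Python rather than with the compositions package discussed in the book, and the function names (`closure`, `perturb`, `clr`) and the sample values are hypothetical, chosen only for this example. The centred log-ratio transform is included as one standard log-ratio representation; the review itself does not say which transforms the book emphasises.

```python
import math

def closure(parts, total=1.0):
    """Rescale strictly positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

def perturb(x, y):
    """Perturbation: component-wise product of two compositions, then closure."""
    return closure([a * b for a, b in zip(x, y)])

def clr(parts):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical sample: the same material reported in grams and in percent.
grams = [2.0, 6.0, 12.0]
percent = [10.0, 30.0, 60.0]

# After closure both describe the same composition, so the total carries no information.
closed_grams = closure(grams)
closed_percent = closure(percent)

# The clr coordinates are likewise unchanged by the unit of measurement.
clr_grams = clr(grams)
clr_percent = clr(percent)

# Perturbing by the composition with equal parts leaves a composition unchanged.
neutral = [1.0, 1.0, 1.0]
unchanged = perturb(grams, neutral)
```

Under these assumptions, `closed_grams` and `closed_percent` agree up to floating-point rounding, and the entries of `clr_grams` sum to zero.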

Reviewer: Irina Ioana Mohorianu (Norwich)

##### MSC:

62-07 | Data analysis (statistics) (MSC2010) |

62-04 | Software, source code, etc. for problems pertaining to statistics |

62-01 | Introductory exposition (textbooks, tutorial papers, etc.) pertaining to statistics |

62J12 | Generalized linear models (logistic models) |

62H25 | Factor analysis and principal components; correspondence analysis |

62J05 | Linear regression; mixed models |

62H30 | Classification and discrimination; cluster analysis (statistical aspects) |