an:07106567
Zbl 1428.62220
Egozcue, Juan José; Pawlowsky-Glahn, Vera
Compositional data: the sample space and its structure
EN
Test 28, No. 3, 599-638 (2019).
00438494
2019
j
62H12 62H25 62P20 60F05 62G30
simplex; equivalence class; isometric log-ratio coordinates; Euclidean space; Aitchison geometry; principal balances; dendrogram; principal components; biplot; household income; normal distribution on the simplex; logistic-normal; compositional data (CoDa); central limit theorem
Summary: The log-ratio approach to compositional data (CoDa) analysis has now entered a mature phase. The principles and statistical tools introduced by \textit{J. Aitchison} [The statistical analysis of compositional data. Boca Raton, FL: CRC Press (1986; Zbl 0688.62004)] have proven successful in solving a number of applied problems. The algebraic-geometric structure of the sample space, tailored to those principles, was developed at the beginning of the millennium. Two main ideas completed J. Aitchison's seminal work: the conception of compositions as equivalence classes of proportional vectors, and their representation in the simplex endowed with an interpretable Euclidean structure. These achievements allowed the representation of compositions in meaningful coordinates (preferably Cartesian), as well as orthogonal projections compatible with the Aitchison distance introduced two decades before. These ideas and concepts are reviewed up to the normal distribution on the simplex and the associated central limit theorem. Exploratory tools, specifically designed for CoDa, are also reviewed. To illustrate the adequacy and interpretability of the sample space structure, a new inequality index, based on the Aitchison norm, is proposed. Most concepts are illustrated with an example of mean household gross income per capita in Spain.
Zbl 0688.62004