
Distributional equivalence and subcompositional coherence in the analysis of compositional data, contingency tables and ratio-scale measurements. (English) Zbl 1276.62037
Summary: We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyze the ratios of the data values. A common approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property and can be applied to a wider class of methods. This weighted log-ratio analysis is theoretically equivalent to “spectral mapping”, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modeling. The weighted log-ratio methodology is used here to visualize frequency data in linguistics and chemical compositional data in archeology.
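The weighted log-ratio analysis described in the summary can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation or the API of the ca package: the function names `weighted_lra` and `row_distances` and the toy table are hypothetical, and the row and column weights are assumed to be the table margins. The sketch log-transforms the table, double-centres it using the weights, and takes a weighted singular value decomposition. It then checks distributional equivalence numerically by merging two proportional columns.

```python
import numpy as np

def weighted_lra(N):
    """Sketch of weighted log-ratio analysis for a strictly positive table N.

    Assumption: row and column weights are the relative margins of N.
    Returns row principal coordinates and singular values.
    """
    N = np.asarray(N, dtype=float)
    if np.any(N <= 0):
        raise ValueError("all entries must be strictly positive")
    P = N / N.sum()
    r = P.sum(axis=1)                 # row weights (masses)
    c = P.sum(axis=0)                 # column weights (masses)
    L = np.log(N)
    # Weighted double-centring: remove weighted row and column means.
    row_means = L @ c                 # shape (I,)
    col_means = r @ L                 # shape (J,)
    grand_mean = r @ L @ c            # scalar
    Y = L - row_means[:, None] - col_means[None, :] + grand_mean
    # Weighted SVD via the rescaled matrix D_r^{1/2} Y D_c^{1/2}.
    S = np.sqrt(r)[:, None] * Y * np.sqrt(c)[None, :]
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    return row_coords, sv

def row_distances(F):
    """Euclidean distances between all pairs of rows of F."""
    diff = F[:, None, :] - F[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Toy table (made up): column 1 is exactly twice column 0.
table = np.array([[10.0, 20.0, 5.0, 8.0],
                  [4.0, 8.0, 9.0, 3.0],
                  [7.0, 14.0, 2.0, 6.0]])
# Merge the two proportional columns by summing them.
merged = np.column_stack([table[:, 0] + table[:, 1], table[:, 2:]])

F_full, sv_full = weighted_lra(table)
F_merged, sv_merged = weighted_lra(merged)

# Distributional equivalence: row geometry and total variance are unchanged.
print(np.allclose(row_distances(F_full), row_distances(F_merged)))
print(np.isclose((sv_full ** 2).sum(), (sv_merged ** 2).sum()))
```

With uniform weights in place of the margins, the same merge would in general change the row distances, which is the failure of distributional equivalence that the summary attributes to unweighted log-ratio PCA.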

62H25 Factor analysis and principal components; correspondence analysis
62H17 Contingency tables
ca; chemCal; LEM; R
[1] AITCHISON, J. (1990), ”Relative Variation Diagrams for Describing Patterns of Variability in Compositional Data,” Mathematical Geology, 22, 487–512. · doi:10.1007/BF00890330
[2] AITCHISON, J. (1983), ”Principal Component Analysis of Compositional Data,” Biometrika, 70, 57–65. · Zbl 0515.62057 · doi:10.1093/biomet/70.1.57
[3] AITCHISON, J. (1986), The Statistical Analysis of Compositional Data, London: Chapman & Hall, reprinted in 2003 by Blackburn Press. · Zbl 0688.62004
[4] AITCHISON, J. (1992), ”On Criteria for Measures of Compositional Difference,” Mathematical Geology, 24, 365–80. · Zbl 0970.86531 · doi:10.1007/BF00891269
[5] AITCHISON, J., BARCELÓ-VIDAL, C., MARTIN-FERNÁNDEZ, J.A., and PAWLOWSKY-GLAHN, V. (2000), ”Logratio Analysis and Compositional Distance,” Mathematical Geology, 32, 271–275. · Zbl 1101.86309 · doi:10.1023/A:1007529726302
[6] AITCHISON, J., and EGOZCUE, J.J. (2005), ”Compositional Data Analysis: Where Are We and Where Should We Be Heading?”, Mathematical Geology, 37, 829–850. · Zbl 1177.86017 · doi:10.1007/s11004-005-7383-7
[7] AITCHISON, J., and GREENACRE, M.J. (2002), ”Biplots of Compositional Data,” Applied Statistics, 51, 375–392. · Zbl 1111.62300
[8] BAXTER, M.J., COOL, H.E.M., and HEYWORTH, M.P. (1990), ”Principal Component and Correspondence Analysis of Compositional Data: Some Similarities,” Journal of Applied Statistics, 17, 229–235. · doi:10.1080/757582834
[9] BAVAUD, F. (2002), ”Quotient Dissimilarities, Euclidean Embeddability, and Huygens’ Weak Principle,” in Classification, Clustering and Data Analysis, eds. K. Jajuga, A. Sokolowski and H.-H. Bock, New York: Springer, pp. 195–202. · Zbl 1033.62055
[10] BAVAUD, F. (2004), ”Generalized Factor Analyses for Contingency Tables,” in Classification, Clustering, and Data Mining Applications, eds. D. Banks, L. House, F.R. McMorris, P. Arabie and W. Gaul, New York: Springer, pp. 597–606.
[11] BEARDAH, C.C., BAXTER, M.J., COOL, H.E.M., and JACKSON, C.M. (2003), ”Compositional Data Analysis of Archaeological Glass: Problems and Possible Solutions,” in: Proceedings of the First Compositional Data Analysis Workshop, Girona, Spain, http://ima.udg.edu/Activitats/CoDaWork03/paper_baxter_Beardah2.pdf
[12] BENZÉCRI, J.-P. (1973), L’Analyse des Données, Tome I: La Classification, Tome II: L’Analyse des Correspondances, Paris: Dunod. · Zbl 0297.62038
[13] CUADRAS, C., CUADRAS, D., and GREENACRE, M.J. (2006), ”A Comparison of Methods for Analyzing Contingency Tables,” Communications in Statistics Simulation and Computation, 35, 447–459. · Zbl 1093.62061 · doi:10.1080/03610910600591875
[14] CUADRAS, C., and FORTIANA, J. (1998), ”Visualizing Categorical Data with Related Metric Scaling,” in Visualization of Categorical Data, eds. J. Blasius and M.J. Greenacre, San Diego: Academic Press, pp. 112–129.
[15] EGOZCUE, J.J., and PAWLOWSKY-GLAHN, V. (2005), ”Groups of Parts and Their Balances in Compositional Data Analysis,” Mathematical Geology, 37, 795–828. · Zbl 1177.86018 · doi:10.1007/s11004-005-7381-9
[16] ESCOFIER, B. (1978), ”Analyse factorielle et distances répondant au principe d’équivalence distributionnelle,” Revue de Statistique Appliquée, 26, 29–37.
[17] GABRIEL, K.R. (1971), ”The Biplot-graphical Display with Applications to Principal Component Analysis,” Biometrika, 58, 453–467. · Zbl 0228.62034 · doi:10.1093/biomet/58.3.453
[18] GABRIEL, K.R. (1972), ”Analysis of Meteorological Data by Means of Canonical Decomposition and Biplots,” Journal of Applied Meteorology, 11, 1071–1077. · doi:10.1175/1520-0450(1972)011<1071:AOMDBM>2.0.CO;2
[19] GABRIEL, K. R. (2002), ”Goodness of Fit of Biplots and Correspondence Analysis,” Biometrika, 89, 423–436. · Zbl 1019.62059 · doi:10.1093/biomet/89.2.423
[20] GOODMAN, L.A. (1968), ”The Analysis of Cross-classified Data: Independence, Quasiindependence, and Interactions in Contingency Tables, With or Without Missing Entries,” Journal of the American Statistical Association, 63, 1091–1131. · Zbl 0177.46901 · doi:10.2307/2285873
[21] GOODMAN, L.A. (1985), ”The Analysis of Cross-classified Data Having Ordered and/or Unordered Categories: Association Models, Correlation Models, and Asymmetry Models for Contingency Tables With or Without Missing Entries,” The Annals of Statistics, 13, 10–98. · Zbl 0613.62070 · doi:10.1214/aos/1176346576
[22] GREENACRE, M.J. (1984), Theory and Applications of Correspondence Analysis, London: Academic Press. · Zbl 0555.62005
[23] GREENACRE, M.J. (1993), ”Biplots in Correspondence Analysis,” Journal of Applied Statistics, 20, 251–269. · doi:10.1080/02664769300000021
[24] GREENACRE, M.J. (2006), ”Tying Up the Loose Ends in Simple, Multiple and Joint Correspondence Analysis,” Keynote Address, COMPSTAT 2006, in Proceedings in Computational Statistics, eds. A. Rizzi and M. Vichi, Berlin: Springer-Verlag, pp. 163–186.
[25] GREENACRE, M.J. (2007), Correspondence Analysis in Practice (2nd Ed.), London: Chapman & Hall / CRC. · Zbl 1198.62061
[26] GREENACRE, M.J. (2008), ”Power Transformations in Correspondence Analysis,” accepted for publication in Computational Statistics and Data Analysis, downloadable at http://www.econ.upf.edu/en/research/onepaper.php?id=1044
[27] GREENACRE, M.J., and BLASIUS, J. (eds) (1994), Correspondence Analysis in the Social Sciences, London: Academic Press.
[28] GREENACRE, M.J., and PARDO, R. (2006), ”Subset Correspondence Analysis: Visualizing Relationships Among a Selected Set of Response Categories from a Questionnaire Survey,” Sociological Methods and Research, 35, 193–218. · doi:10.1177/0049124106290316
[29] KAZMIERCZAK, J.B. (1985), ”Analyse logarithmique: deux exemples d’application,” Revue de Statistique Appliquée, 33, 13–24.
[30] LEBART, L., MORINEAU, A., and WARWICK, K. (1984), Multivariate Descriptive Statistical Analysis, New York: Wiley. · Zbl 0658.62069
[31] LEWI, P.J. (1976), ”Spectral Mapping, A Technique for Classifying Biological Activity Profiles of Chemical Compounds,” Arzneimittel Forschung, 26, 1295–1300.
[32] LEWI, P.J. (1980), ”Multivariate Data Analysis in APL,” in Proceedings of APL-80 Conference, ed. G.A. van der Linden, Amsterdam: North-Holland, pp. 267–271.
[33] LEWI, P.J. (1998), ”Analysis of Contingency Tables,” in Handbook of Chemometrics and Qualimetrics: Part B, eds. B.G.M. Vandeginste, D.L. Massart, L.M.C. Buydens, S. de Jong, P.J. Lewi, and J. Smeyers-Verbeke, Amsterdam: Elsevier, pp. 161–206.
[34] MARTÍN-FERNÁNDEZ, J.A., BARCELÓ-VIDAL, C., and PAWLOWSKY-GLAHN, V. (2003), ”Dealing with Zeros and Missing Values in Compositional Data Sets,” Mathematical Geology, 35, 253–278. · Zbl 1302.86027 · doi:10.1023/A:1023866030544
[35] NENADIĆ, O., and GREENACRE, M.J. (2007), ”Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca Package,” Journal of Statistical Software 20(3), http://www.jstatsoft.org/v20/i03/ .
[36] R DEVELOPMENT CORE TEAM (2007), ”R: A Language and Environment for Statistical Computing,” R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, http://www.R-project.org .
[37] S-PLUS, VERSION 7 (2007). Insightful Corporation, Seattle, USA, http://www.insightful.com .
[38] TER BRAAK, C.J.F. (1985), ”Correspondence Analysis of Incidence and Abundance Data: Properties in Terms of a Unimodal Response Model,” Biometrics, 41, 859–873. · doi:10.2307/2530959
[39] VERMUNT, J.K. (1997), ”LEM: A General Program for the Analysis of Categorical Data,” The Netherlands: Department of Methodology and Statistics, Tilburg University.
[40] WOUTERS, L., GÖHLMANN, H.W., BIJNENS, L., KASS, S.U., MOLENBERGHS, G., and LEWI, P.J. (2003), ”Graphical Exploration of Gene Expression Data: A Comparative Study of Three Multivariate Methods,” Biometrics, 59, 1131–1139. · Zbl 1274.62904 · doi:10.1111/j.0006-341X.2003.00130.x
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.