zbMATH — the first resource for mathematics

Spatial regression modeling for compositional data with many zeros. (English) Zbl 1303.62085
Summary: Compositional data analysis considers vectors of nonnegative-valued variables subject to a unit-sum constraint. Our interest lies in spatial compositional data, in particular, land use/land cover (LULC) data in the northeastern United States. Here, the observations are vectors providing the proportions of LULC types observed in each 3 km \(\times\) 3 km grid cell, yielding order \(10^{4}\) cells. On the same grid cells, we have an additional compositional dataset supplying forest fragmentation proportions. Potentially useful and available covariates include elevation range, road length, population, median household income, and housing levels.
We propose a spatial regression model that is also able to capture flexible dependence among the components of the observation vectors at each location as well as spatial dependence across the locations of the simplex-restricted measurements. A key issue is the high incidence of observed zero proportions for the LULC dataset, requiring incorporation of local point masses at 0. We build a hierarchical model prescribing a power scaling first stage and using latent variables at the second stage with spatial structure for these variables supplied through a multivariate CAR specification. Analyses for the LULC and forest fragmentation data illustrate the interpretation of the regression coefficients and the benefit of incorporating spatial smoothing.
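As a rough illustration of how latent variables can place point masses at 0 in a composition, here is a minimal Python sketch. It assumes a construction in which negative latent values are truncated to zero, raised to a power, and normalized against a baseline component. The function name `latent_to_composition`, the exponent `gamma`, and the baseline convention are hypothetical choices made for this example and are not taken from the record. The paper's actual first-stage specification and its multivariate CAR spatial structure may differ and are not reproduced here.

```python
import numpy as np


def latent_to_composition(z, gamma=1.0):
    """Map D-1 latent real values to a D-part composition (illustrative only).

    Non-positive latent values become exact zeros, the positive ones are
    power-scaled by `gamma`, and a final baseline component absorbs the
    remainder so that the parts sum to one.
    """
    z = np.asarray(z, dtype=float)
    w = np.maximum(z, 0.0) ** gamma        # truncate at zero, then power-scale
    denom = 1.0 + w.sum()                  # the 1.0 stands for the baseline part
    return np.append(w / denom, 1.0 / denom)


# Hypothetical latent values for one grid cell with four LULC types.
y = latent_to_composition([-0.5, 2.0, 0.0], gamma=2.0)
```

In this sketch every component whose latent value is non-positive is exactly zero, while the vector as a whole still satisfies the unit-sum constraint.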

MSC:
62P12 Applications of statistics to environmental and related topics
62M30 Inference from spatial processes
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
Software:
CODA; compositions; LFT; R; spBayes