A mixture model approach for compositional data: inferring land-use influence on point-referenced water quality measurements. (English) Zbl 1428.62489

Summary: The assessment of water quality across space and time is of considerable interest for both agricultural and public health reasons. The standard method to assess the water quality of a sub-catchment, or a group of sub-catchments, usually involves collecting point measurements of water quality together with additional information such as the date and time of measurements, rainfall amounts, the land use and soil type of the catchment, and the elevation. Some of this auxiliary information is point-referenced data, measured at the exact location, whereas other information, such as land use, is areal data, often recorded in a compositional format at the catchment or sub-catchment level. The spatial change of support arising from this data collection process breaks the natural link between the response variable and the predictors. In this paper, we present an approach to reconstruct this link by using a categorical latent variable that identifies the land use that most likely influences water quality in each sub-catchment. This constitutes the spatial clustering layer of the model. Each cluster is associated with an estimated temporal variability common to water quality measurements. The strength of this approach lies in the temporal variation identifying each cluster, allowing decision makers to make informed decisions regarding land use and its influence on water quality. We demonstrate the potential of this approach with data from a water quality research study in the Mount Lofty Ranges, South Australia.

MSC:

62P12 Applications of statistics to environmental and related topics
62F15 Bayesian inference
62H30 Classification and discrimination; cluster analysis (statistical aspects)
