Datta, Abhirup; Banerjee, Sudipto; Hodges, James S.; Gao, Leiwen
Spatial disease mapping using directed acyclic graph auto-regressive (DAGAR) models. (English) Zbl 1435.62319
Bayesian Anal. 14, No. 4, 1221-1244 (2019).

Summary: Hierarchical models for regionally aggregated disease incidence data commonly involve region-specific latent random effects that are modeled jointly as having a multivariate Gaussian distribution. The covariance or precision matrix incorporates the spatial dependence between the regions. Common choices for the precision matrix include the widely used ICAR model, which is singular, and its nonsingular extension, which lacks interpretability. We propose a new parametric model for the precision matrix based on a directed acyclic graph (DAG) representation of the spatial dependence. Our model guarantees positive definiteness and, hence, in addition to being a valid prior for regional spatially correlated random effects, can also directly model the outcome from dependent data like images and networks. Theoretical results establish a link between the parameters in our model and the variance and covariances of the random effects. Simulation studies demonstrate that the improved interpretability of our model reaps benefits in terms of accurately recovering the latent spatial random effects as well as for inference on the spatial covariance parameters. Under modest spatial correlation, our model far outperforms the CAR models, while the performances are similar when the spatial correlation is strong. We also assess sensitivity to the choice of the ordering in the DAG construction using theoretical and empirical results which testify to the robustness of our model. We also present a large-scale public health application demonstrating the competitive performance of the model.

Cited in 6 Documents

MSC: 62M10 Time series, auto-correlation, regression, etc.
in statistics (GARCH)
62P10 Applications of statistics to biology and medical sciences; meta analysis
62H11 Directional data; spatial statistics
05C90 Applications of graph theory
62F15 Bayesian inference

Keywords: areal data; Bayesian inference; directed acyclic graphs; disease mapping; spatial autoregression

Software: spBayes; ngspatial; glasso
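Illustration: the summary describes a precision matrix built from a DAG over the regions that is positive definite by construction, but it does not state the formulas. The following is a minimal sketch under stated assumptions, not the authors' code. It assumes that each region's parents are its neighbours that come earlier in a chosen ordering, that each parent gets weight rho / (1 + (n_i - 1) rho^2), and that the conditional precision is (1 + (n_i - 1) rho^2) / (1 - rho^2), where n_i is the number of parents. The function name `dagar_precision` and the example graph are hypothetical and should be checked against the paper before use.

```python
import numpy as np


def dagar_precision(adjacency, rho):
    """Sketch of a DAGAR-style precision matrix Q = (I - B)^T F (I - B).

    adjacency: symmetric 0/1 matrix; the row/column order is taken as the
               DAG ordering (an assumption of this sketch).
    rho:       correlation-like parameter, assumed to lie in [0, 1).
    """
    adjacency = np.asarray(adjacency)
    k = adjacency.shape[0]
    B = np.zeros((k, k))   # strictly lower-triangular autoregression weights
    tau = np.empty(k)      # conditional precisions (diagonal of F)
    for i in range(k):
        # Parents of region i: its neighbours that precede it in the ordering.
        parents = np.flatnonzero(adjacency[i, :i])
        n_i = parents.size
        denom = 1.0 + (n_i - 1) * rho**2
        B[i, parents] = rho / denom
        tau[i] = denom / (1.0 - rho**2)
    L = np.eye(k) - B      # unit lower-triangular, hence invertible
    return L.T @ (tau[:, None] * L)


# Example: four regions on a path graph 0 - 1 - 2 - 3.
path = np.diag(np.ones(3), 1)
path = path + path.T
Q = dagar_precision(path, rho=0.5)
print(np.linalg.eigvalsh(Q).min() > 0)  # checks positive definiteness
```

Because I - B is unit lower triangular and every conditional precision is positive for rho in [0, 1), the product is positive definite for any graph and any ordering. On a path graph the construction reduces to a stationary AR(1) model, so the implied correlation between regions i and j is rho^|i - j|.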