A modified generalized Lasso algorithm to detect local spatial clusters for count data. (English) Zbl 1421.62092

Summary: Detecting local spatial clusters in count data is an important task in spatial epidemiology. Two broad approaches, moving-window methods and disease-mapping methods, have been proposed in the literature for finding such clusters. However, existing methods rely on somewhat arbitrarily chosen tuning parameters, and the local clustering results are sensitive to those choices. In this paper, we propose a penalized likelihood method that overcomes these limitations for count data. We start from a Poisson regression model, which can accommodate any type of covariate, and formulate the clustering problem as a penalized likelihood estimation problem that finds change points of the intercepts in two-dimensional space. The cost of developing a new algorithm is minimized by modifying an existing least absolute shrinkage and selection operator (Lasso) algorithm. The computational details of the modifications are given, and the proposed method is illustrated with Seoul tuberculosis data.
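
The kind of objective described in the summary can be illustrated with a minimal sketch. This record does not give the paper's exact formulation, so everything here is an assumption: region-specific intercepts `alpha`, an absolute-difference (fused, generalized-Lasso-type) penalty over pairs of neighbouring regions, and the hypothetical names `penalized_poisson_nll`, `edges`, `offset` and `lam`. The sketch only evaluates the objective and does not implement the authors' modified algorithm.

```python
import numpy as np

def penalized_poisson_nll(alpha, beta, X, y, offset, edges, lam):
    """Poisson negative log-likelihood plus a fused penalty on intercepts.

    Assumed setup (not taken from the paper):
      alpha  : (n,) region-specific intercepts
      beta   : (p,) covariate coefficients
      X      : (n, p) covariate matrix
      y      : (n,) observed counts
      offset : (n,) log expected counts or log population
      edges  : iterable of (i, j) index pairs for neighbouring regions
      lam    : non-negative tuning parameter
    """
    eta = offset + alpha + X @ beta          # linear predictor on the log scale
    nll = np.sum(np.exp(eta) - y * eta)      # log(y!) is constant and dropped
    # Penalizing differences between neighbouring intercepts encourages
    # piecewise-constant intercepts, i.e. spatial change points.
    penalty = sum(abs(alpha[i] - alpha[j]) for i, j in edges)
    return nll + lam * penalty

# Toy example: four regions on a line, one intercept jump between regions 1 and 2.
alpha = np.array([0.0, 0.0, 1.0, 1.0])
beta = np.array([0.0])
X = np.zeros((4, 1))
y = np.array([1.0, 2.0, 3.0, 4.0])
offset = np.zeros(4)
edges = [(0, 1), (1, 2), (2, 3)]

unpenalized = penalized_poisson_nll(alpha, beta, X, y, offset, edges, lam=0.0)
penalized = penalized_poisson_nll(alpha, beta, X, y, offset, edges, lam=2.0)
```

In this toy example only the single jump between regions 1 and 2 contributes to the penalty, so the two values differ by `lam` times one unit jump. Regions whose fitted intercepts are fused to a common value would be read as belonging to the same cluster.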

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P10 Applications of statistics to biology and medical sciences; meta analysis
62H11 Directional data; spatial statistics

Software:

glmnet
