Separable factor analysis with applications to mortality data. (English) Zbl 1454.62185

Summary: Human mortality data sets can be expressed as multiway data arrays, the dimensions of which correspond to categories by which mortality rates are reported, such as age, sex, country and year. Regression models for such data typically assume an independent error distribution or an error model that allows for dependence along at most one or two dimensions of the data array. However, failing to account for other dependencies can lead to inefficient estimates of regression parameters, inaccurate standard errors and poor predictions. An alternative to assuming independent errors is to allow for dependence along each dimension of the array using a separable covariance model. However, the number of parameters in this model increases rapidly with the dimensions of the array and, for many arrays, maximum likelihood estimates of the covariance parameters do not exist. In this paper, we propose a submodel of the separable covariance model in which the covariance matrix along each dimension has factor analytic structure. This model can be viewed as an extension of factor analysis to array-valued data, as it uses a factor model to estimate the covariance along each dimension of the array. We discuss properties of this model as they relate to ordinary factor analysis, describe maximum likelihood and Bayesian estimation methods, and provide a likelihood ratio testing procedure for selecting the factor model ranks. We apply this methodology to the analysis of data from the Human Mortality Database, and show in a cross-validation experiment that it outperforms simpler methods. Additionally, we use this model to impute mortality rates for countries that have no mortality data for several years. Unlike other approaches, our methodology is able to estimate similarities between the mortality rates of countries, time periods and sexes, and use this information to assist with the imputations.
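The summary describes the model only in words. The Python sketch below illustrates it under stated assumptions and is not the authors' implementation. It assumes the standard factor-analytic form Σ_k = Λ_k Λ_kᵀ + Ψ_k for each mode (Λ_k a p_k × r_k loading matrix, Ψ_k diagonal) and the usual Kronecker form cov(vec Y) = Σ_K ⊗ ⋯ ⊗ Σ_1 for a separable covariance, with the first array index varying fastest in vec Y. All function names, the array dimensions (20 × 10 × 15) and the ranks (2, 1, 2) are hypothetical and are not taken from the paper or from the Human Mortality Database. The parameter counts use the conventional adjustments for the K − 1 scale ambiguities between modes and the r_k(r_k − 1)/2 rotational indeterminacy of each loading matrix.

```python
from functools import reduce
from math import prod


def fa_cov(loadings, uniquenesses):
    """Factor-analytic covariance Lambda Lambda^T + diag(psi) for one mode.

    loadings: p x r nested list (hypothetical Lambda_k);
    uniquenesses: length-p list of nonnegative variances (diagonal of Psi_k).
    """
    p = len(loadings)
    return [
        [
            sum(a * b for a, b in zip(loadings[i], loadings[j]))
            + (uniquenesses[i] if i == j else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]


def kron(a, b):
    """Kronecker product of two square matrices stored as nested lists."""
    n, m = len(a), len(b)
    return [
        [a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
        for i in range(n * m)
    ]


def separable_cov(mode_covs):
    """Sigma_K (x) ... (x) Sigma_1, matching vec(Y) with the first index fastest."""
    return reduce(lambda acc, s: kron(s, acc), mode_covs)


def n_params_unstructured(dims):
    """One unrestricted covariance for vec(Y): P(P+1)/2 with P = prod(dims)."""
    P = prod(dims)
    return P * (P + 1) // 2


def n_params_separable(dims):
    """Unrestricted Sigma_k per mode, minus K-1 for the scale ambiguity
    (c * Sigma_1) (x) (Sigma_2 / c) = Sigma_1 (x) Sigma_2."""
    return sum(p * (p + 1) // 2 for p in dims) - (len(dims) - 1)


def n_params_separable_fa(dims, ranks):
    """Per mode: p*r loadings + p uniquenesses - r(r-1)/2 rotations; minus K-1 scales."""
    per_mode = sum(p * r + p - r * (r - 1) // 2 for p, r in zip(dims, ranks))
    return per_mode - (len(dims) - 1)


# Hypothetical age x country x year array with hypothetical factor ranks.
dims, ranks = (20, 10, 15), (2, 1, 2)
print(n_params_unstructured(dims))         # 4501500
print(n_params_separable(dims))            # 383
print(n_params_separable_fa(dims, ranks))  # 121

# Tiny two-mode example: 2 age groups x 3 years, rank-1 factor model per mode.
s_age = fa_cov([[1.0], [0.5]], [0.1, 0.2])
s_year = fa_cov([[0.3], [0.6], [0.9]], [0.05, 0.05, 0.05])
sigma = separable_cov([s_age, s_year])  # 6 x 6 covariance of vec(Y)
```

The counts illustrate the point made in the summary: an unrestricted covariance for the 3000 cells would need millions of parameters, the separable model a few hundred, and the factor-analytic submodel fewer still. Forming the full Kronecker product is only practical for tiny arrays; with real data one would work with the mode-wise matrices directly.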

MSC:

62H25 Factor analysis and principal components; correspondence analysis
62F15 Bayesian inference
62-08 Computational methods for problems pertaining to statistics
62P05 Applications of statistics to actuarial sciences and financial mathematics

Software:

Human Mortality
