Model selection for generalized linear models with factor-augmented predictors.

*(English)* Zbl 1223.62129

The authors consider generalized linear models in data-rich environments, in particular the case where the sample size \(T\) and the number of explanatory variables \(N\) are of similar size. In such cases, dimension reduction is necessary for statistical inference. For dimension reduction the authors adopt the idea of principal components regression and assume that a small number of common factors of the explanatory variables suffice to describe the information relevant to the response variable. The common factors are latent and must be constructed from the observable explanatory variables. The study addresses two issues for generalized linear models with factor-augmented regressors. The first is the selection of the number of factors that best explain the response variable. The second is the selection of a distributional assumption for the response variable. Taking into account the effect of estimated regressors, the authors develop an information-theoretic criterion that allows for misspecification of both the distributional and the structural assumptions. Under the conditions \(T^{5/8}/N\to0\) and \(\sqrt{N}/T\to0\), it is shown that the proposed estimate of the bias term is \(\min\{N,T\sqrt{T}\}\)-consistent. The proposed criterion is a natural extension of the Akaike information criterion. Simulations and an empirical data analysis demonstrate that the proposed criterion outperforms the Akaike information criterion and the Bayesian information criterion.
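The two-step procedure described in the review (principal-components factors, then a generalized linear model on those factors, then a choice of the number of factors) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the simulated data, the logistic link, the function names `pca_factors`, `logistic_loglik` and `select_num_factors`, and above all the use of the plain Akaike information criterion. The criterion proposed by the authors adds a bias correction for the estimated regressors, which this sketch does not implement.

```python
import numpy as np

def pca_factors(X, r):
    """Estimate r common factors from a T x N panel by principal components."""
    T = X.shape[0]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each predictor
    U, _, _ = np.linalg.svd(Xs, full_matrices=False)
    return np.sqrt(T) * U[:, :r]                 # normalized so that F'F / T = I

def logistic_loglik(F, y, n_iter=50):
    """Maximized log-likelihood of a logistic regression of y on [1, F] (Newton/IRLS)."""
    Z = np.column_stack([np.ones(len(y)), F])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = np.clip(Z @ beta, -30.0, 30.0)     # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = Z.T @ (y - p)
        hess = Z.T @ (Z * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess + 1e-8 * np.eye(Z.shape[1]), grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    eta = np.clip(Z @ beta, -30.0, 30.0)
    p = 1.0 / (1.0 + np.exp(-eta))
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def select_num_factors(X, y, r_max):
    """Pick the number of factors minimizing plain AIC (NOT the authors' criterion)."""
    F_all = pca_factors(X, r_max)
    aic = {}
    for r in range(1, r_max + 1):
        loglik = logistic_loglik(F_all[:, :r], y)
        aic[r] = -2.0 * loglik + 2.0 * (r + 1)   # r slopes plus one intercept
    return min(aic, key=aic.get), aic

# Simulated example (hypothetical data): T = 200 observations, N = 150 predictors,
# two latent factors driving both the predictors and a binary response.
rng = np.random.default_rng(0)
T, N = 200, 150
F_true = rng.standard_normal((T, 2))
X = F_true @ rng.standard_normal((2, N)) + rng.standard_normal((T, N))
prob = 1.0 / (1.0 + np.exp(-(F_true[:, 0] - F_true[:, 1])))
y = (rng.uniform(size=T) < prob).astype(float)

best_r, aic = select_num_factors(X, y, r_max=6)
print("selected number of factors:", best_r)
```

Because the factors are estimated rather than observed, the penalty `2 * (r + 1)` used above ignores the extra variability that the review says the authors account for; the sketch only shows where such a correction would enter.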

Reviewer: A. D. Borisenko (Kyïv)

##### MSC:

| Code | Classification |
|---|---|
| 62J12 | Generalized linear models (logistic models) |
| 62H25 | Factor analysis and principal components; correspondence analysis |
| 62B10 | Statistical aspects of information-theoretic topics |
| 65C60 | Computational problems in statistics (MSC2010) |


\textit{T. Ando} and \textit{R. S. Tsay}, Appl. Stoch. Models Bus. Ind. 25, No. 3, 207--235 (2009; Zbl 1223.62129)

