Variable selection and model building via likelihood basis pursuit.

*(English)* Zbl 1117.62459

Summary: This article presents a nonparametric penalized likelihood approach for variable selection and model building, called likelihood basis pursuit (LBP). In the setting of a tensor product reproducing kernel Hilbert space, we decompose the log-likelihood into the sum of different functional components such as main effects and interactions, with each component represented by appropriate basis functions. Basis functions are chosen to be compatible with variable selection and model building in the context of a smoothing spline ANOVA model. Basis pursuit is applied to obtain the decomposition whose coefficients have the smallest \(l_{1}\) norm. We use the functional \(L_{1}\) norm to measure the importance of each component and determine the "threshold" value by a sequential Monte Carlo bootstrap test algorithm. As a generalized LASSO-type method, LBP produces shrinkage estimates for the coefficients, which greatly facilitates the variable selection process and at the same time provides highly interpretable multivariate functional estimates. To choose the regularization parameters appearing in the LBP models, generalized approximate cross-validation (GACV) is derived as a tuning criterion. To make GACV applicable to large datasets, a randomized version is also proposed. A technique called "slice modeling" is used to solve the optimization problem more efficiently. LBP has great potential for a wide range of research and application areas such as medical studies, and in this article we apply it to two large ongoing epidemiologic studies, the Wisconsin Epidemiologic Study of Diabetic Retinopathy (WESDR) and the Beaver Dam Eye Study (BDES).
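The LASSO-type idea in the summary (an \(l_{1}\) penalty on basis coefficients that shrinks unimportant components to exactly zero, followed by a functional \(L_{1}\) norm to rank components) can be illustrated with a minimal sketch. It is restricted to main effects, each represented by a single linear basis function k1(t) = t - 1/2 on [0, 1] (the scaled Bernoulli polynomial commonly used in smoothing spline ANOVA; treating it as the paper's full basis is an assumption). The penalized Bernoulli likelihood is minimized by proximal gradient descent (ISTA), a swapped-in solver rather than the article's slice modeling. The penalty λ is fixed by hand instead of being tuned by GACV, and no bootstrap threshold is computed. All names (`k1`, `soft_threshold`, `l1_logistic`) and the simulated data are hypothetical.

```python
import numpy as np


def k1(t):
    """Centered linear basis on [0, 1] (scaled Bernoulli polynomial): k1(t) = t - 1/2."""
    return t - 0.5


def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1: shrinks each entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def l1_logistic(B, y, lam, n_iter=5000):
    """Minimise (1/n) * negative Bernoulli log-likelihood + lam * ||c||_1
    by proximal gradient descent (ISTA). The intercept b0 is not penalised."""
    n, p = B.shape
    b0, c = 0.0, np.zeros(p)
    # Step 1/L, where L = sigma_max([1, B])^2 / (4n) bounds the Hessian of the loss.
    step = 4.0 * n / np.linalg.norm(np.column_stack([np.ones(n), B]), 2) ** 2
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + B @ c)))
        resid = prob - y  # derivative of the per-observation negative log-likelihood w.r.t. eta
        b0 -= step * resid.mean()
        c = soft_threshold(c - step * (B.T @ resid) / n, step * lam)
    return b0, c


# Hypothetical simulated data: 5 covariates on [0, 1]; only x0 and x1 affect the outcome.
rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.uniform(size=(n, p))
eta_true = 6.0 * k1(X[:, 0]) - 6.0 * k1(X[:, 1])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

B = k1(X)  # one main-effect basis column per covariate
b0, c = l1_logistic(B, y, lam=0.02)

# Empirical functional L1 norm of each main-effect component f_j(x) = c_j * k1(x_j).
norms = np.mean(np.abs(B * c), axis=0)
print(np.round(c, 3))
print(np.round(norms, 4))
```

Components whose coefficients are shrunk to zero have zero functional \(L_{1}\) norm and drop out of the model; in LBP proper, the cut-off for the remaining small norms would come from the bootstrap test rather than from inspection.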

##### MSC:

62-XX Statistics