Sparse regression with multi-type regularized feature modeling. (English) Zbl 1460.91218

Summary: In the statistical and machine learning literature, regularization techniques are often used to construct sparse (predictive) models. Most regularization strategies only work for data where all predictors are treated identically, such as Lasso regression for (continuous) predictors treated as linear effects. However, many predictive problems involve different types of predictors and require a tailored regularization term. We propose a multi-type Lasso penalty that acts on the objective function as a sum of subpenalties, one for each type of predictor. This allows predictor selection and level fusion within a predictor in a data-driven way, simultaneously with parameter estimation. We develop a new estimation strategy for convex predictive models with this multi-type penalty. Using the theory of proximal operators, our estimation procedure is computationally efficient: it partitions the overall optimization problem into easier-to-solve subproblems, each specific to a predictor type and its associated penalty. Earlier research applies approximations to non-differentiable penalties to solve the optimization problem. The proposed SMuRF algorithm removes the need for approximations and achieves higher accuracy and computational efficiency. This is demonstrated with an extensive simulation study and the analysis of a case study on insurance pricing analytics.
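
To illustrate how a sum of subpenalties splits into per-predictor subproblems, the following is a minimal Python sketch of a generic proximal gradient iteration. It is not the SMuRF algorithm or the interface of the smurf package. It assumes a squared-error loss, a fixed step size, and only two penalty types with closed-form proximal operators (Lasso and group Lasso). The function names `prox_lasso`, `prox_group_lasso` and `proximal_gradient` are hypothetical. Fusion-type penalties generally have no closed-form proximal operator and would need an inner solver, which is omitted here.

```python
import numpy as np


def prox_lasso(v, t):
    """Soft-thresholding: proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def prox_group_lasso(v, t):
    """Block soft-thresholding: proximal operator of t * ||v||_2."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def proximal_gradient(X, y, blocks, lam, n_iter=500):
    """Minimize 0.5/n * ||y - X b||^2 + lam * sum_j penalty_j(b_j).

    blocks: list of (indices, prox_function) pairs, one per predictor.
    The blocks are assumed to be disjoint, so the proximal operator of
    the summed penalty is applied block by block.
    """
    n, p = X.shape
    # Step size 1/L, where L = ||X||_2^2 / n bounds the gradient's Lipschitz constant.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n      # gradient of the smooth loss
        z = beta - step * grad               # plain gradient step
        for idx, prox in blocks:             # one easy subproblem per predictor
            beta[idx] = prox(z[idx], step * lam)
    return beta


# Toy design with X'X / n = I, so the minimizer is the prox applied to X'y / n.
X = 2.0 * np.eye(4)
y = np.array([4.0, 0.2, 6.0, 8.0])
blocks = [([0, 1], prox_lasso), ([2, 3], prox_group_lasso)]
beta_hat = proximal_gradient(X, y, blocks, lam=0.5)
print(beta_hat)
```

In this toy example the first two coefficients are treated as ordinary Lasso-penalized effects and the last two as one group, so each block is shrunk by its own rule within the same iteration.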

MSC:

91G05 Actuarial mathematics

Software:

gamair; smurf; CasANOVA

References:

[1] Akaike, H., A new look at the statistical model identification, IEEE Trans. Automat. Control, 19, 6, 716-723 (1974) · Zbl 0314.62039
[2] Beck, A.; Teboulle, M., A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2, 1, 183-202 (2009) · Zbl 1175.94009
[3] Bondell, H.; Reich, B., Simultaneous factor selection and collapsing levels in ANOVA, Biometrics, 65, 1, 169-177 (2009) · Zbl 1159.62048
[4] Boyd, S.; Vandenberghe, L., Convex Optimization (2004), Cambridge University Press · Zbl 1058.90049
[5] Dawid, A.; Sebastiani, P., Coherent dispersion criteria for optimal experimental design, Ann. Statist., 27, 65-81 (1999) · Zbl 0948.62057
[6] Denuit, M.; Lang, S., Non-life rate-making with Bayesian GAMs, Insurance Math. Econom., 35, 3, 627-647 (2004) · Zbl 1070.62095
[7] Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R., Least angle regression, Ann. Statist., 32, 2, 407-499 (2004) · Zbl 1091.62054
[8] Eilers, P.; Marx, B., Flexible smoothing with \(B\)-splines and penalties, Statist. Sci., 11, 2, 89-121 (1996) · Zbl 0955.62562
[9] Frees, E.; Meyers, G.; Cummings, A., Insurance ratemaking and a Gini index, J. Risk Insurance, 81, 2, 335-366 (2014)
[10] Gabay, D.; Mercier, B., A dual algorithm for the solution of nonlinear variational problems via finite element approximations, Comput. Math. Appl., 2, 1, 17-40 (1976) · Zbl 0352.65034
[11] Gertheiss, J.; Tutz, G., Sparse modeling of categorial explanatory variables, Ann. Appl. Stat., 4, 4, 2150-2180 (2010) · Zbl 1220.62092
[12] Glowinski, R.; Marroco, A., Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité, d’une classe de problèmes de Dirichlet non linéaires, Rev. Française Autom. Inf. Recherche Opér. Anal. Numér., 9, R2, 41-76 (1975) · Zbl 0368.65053
[13] Gouvêa, M., Gonçalves, E., 2007. Credit risk analysis applying logistic regression, neural networks and genetic algorithms models. In: POMS 18th Annual Conference. Dallas, Texas.
[14] Hanley, J.; McNeil, B., The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology, 143, 1, 29-36 (1982)
[15] Hastie, T.; Tibshirani, R., Generalized additive models, Statist. Sci., 1, 3, 297-310 (1986) · Zbl 0645.62068
[16] Hastie, T.; Tibshirani, R.; Wainwright, M., Statistical Learning with Sparsity: The Lasso and Generalizations (2015), CRC Press · Zbl 1319.68003
[17] Henckaerts, R.; Antonio, K.; Clijsters, M.; Verbelen, R., A data driven binning strategy for the construction of insurance tariff classes, Scand. Actuar. J., 8, 681-705 (2018) · Zbl 1418.91241
[18] Höfling, H.; Binder, H.; Schumacher, M., A coordinate-wise optimization algorithm for the Fused Lasso (2010), arXiv preprint
[19] Klein, N.; Denuit, M.; Lang, S.; Kneib, T., Nonlife ratemaking and risk management with Bayesian generalized additive models for location, scale, and shape, Insurance Math. Econom., 55, 225-249 (2014) · Zbl 1296.62089
[20] Kohavi, R., 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Vol. 2. pp. 1137-1143.
[21] Liu, J., Yuan, L., Ye, J., 2010. An efficient algorithm for a class of fused lasso problems. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 323-332.
[22] Meinshausen, N., Relaxed lasso, Comput. Statist. Data Anal., 52, 1, 374-393 (2007) · Zbl 1452.62522
[23] Nesterov, Y., A method of solving a convex programming problem with convergence rate \(O(1/ k^2)\), Sov. Math. Dokl., 27, 2, 372-376 (1983) · Zbl 0535.90071
[24] Nyquist, H., Restricted estimation of generalized linear models, J. R. Stat. Soc. Ser. C. Appl. Stat., 40, 1, 133-141 (1991) · Zbl 0825.62612
[25] Oelker, M.-R.; Tutz, G., A uniform framework for the combination of penalties in generalized structured models, Adv. Data Anal. Classif., 11, 1, 97-120 (2017) · Zbl 1414.62321
[26] Parikh, N.; Boyd, S., Proximal algorithms, Found. Trends Optim., 1, 3, 123-231 (2013)
[27] Ramdas, A.; Tibshirani, R., Fast and flexible ADMM algorithms for trend filtering, J. Comput. Graph. Statist., 25, 3, 839-858 (2016)
[28] Rinaldo, A., Properties and refinements of the fused lasso, Ann. Statist., 37, 5B, 2922-2952 (2009) · Zbl 1173.62027
[29] Schwarz, G., Estimating the dimension of a model, Ann. Statist., 6, 2, 461-464 (1978) · Zbl 0379.62005
[30] Tibshirani, R., Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B Stat. Methodol., 58, 1, 267-288 (1996) · Zbl 0850.62538
[31] Tibshirani, R., The Lasso method for variable selection in the Cox model, Stat. Med., 16, 4, 385-395 (1997)
[32] Tibshirani, R.; Saunders, M.; Rosset, S.; Zhu, J.; Knight, K., Sparsity and smoothness via the fused Lasso, J. R. Stat. Soc. Ser. B Stat. Methodol., 67, 1, 91-108 (2005) · Zbl 1060.62049
[33] Tibshirani, R.; Taylor, J., The solution path of the generalized Lasso, Ann. Statist., 39, 3, 1335-1371 (2011) · Zbl 1234.62107
[34] Viallon, V.; Lambert-Lacroix, S.; Höfling, H.; Picard, F., On the robustness of the Generalized Fused Lasso to prior specifications, Stat. Comput., 26, 1, 285-301 (2016) · Zbl 1342.62123
[35] Wahlberg, B., Boyd, S., Annergren, M., Wang, Y., 2012. An ADMM algorithm for a class of total variation regularized estimation problems. In: Proceedings of the 16th IFAC Symposium on System Identification, Vol. 16. pp. 83-88.
[36] Wang, H.; Leng, C., A note on adaptive group lasso, Comput. Statist. Data Anal., 52, 12, 5277-5286 (2008) · Zbl 1452.62524
[37] Witten, I.; Frank, E., Data Mining (1999), Morgan Kaufmann Publishers
[38] Wood, S., Generalized Additive Models: An Introduction with R (2017), Chapman and Hall/CRC · Zbl 1368.62004
[39] Xin, B., Kawahara, Y., Wang, Y., Gao, W., 2014. Efficient generalized fused lasso and its application to the diagnosis of Alzheimer’s disease. In: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence. pp. 2163-2169.
[40] Yuan, M.; Lin, Y., Model selection and estimation in regression with grouped variables, J. R. Stat. Soc. Ser. B Stat. Methodol., 68, 1, 49-67 (2006) · Zbl 1141.62030
[41] Zhu, Y., An augmented ADMM algorithm with application to the generalized lasso problem, J. Comput. Graph. Statist., 26, 1, 195-204 (2017)
[42] Zou, H., The adaptive lasso and its oracle properties, J. Amer. Statist. Assoc., 101, 476, 1418-1429 (2006) · Zbl 1171.62326
[43] Zou, H.; Hastie, T., Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B Stat. Methodol., 67, 2, 301-320 (2005) · Zbl 1069.62054
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.