Fat-tailed regression modeling with spliced distributions. (English) Zbl 1417.62299

Summary: Insurance claims data usually contain a large number of zeros and exhibit fat-tailed behavior. Misestimation of one end of the tail impacts the other end of the tail of the claims distribution and can affect both the adequacy of premiums and the reserves that need to be held. In addition, insured policyholders in a portfolio are naturally non-homogeneous. It is an ongoing challenge for actuaries to build a predictive model that simultaneously captures these peculiar characteristics of claims data and policyholder heterogeneity. Such models can help make improved predictions and thereby ease the decision-making process. This article proposes the use of spliced regression models for fitting insurance loss data. A primary advantage of spliced distributions is their flexibility to accommodate modeling different segments of the claims distribution with different parametric models. The threshold that separates the segments is assumed to be a parameter, and this presents an additional challenge in the estimation. Our simulation study demonstrates the effectiveness of using multistage optimization for likelihood inference and, at the same time, the repercussions of model misspecification. For purposes of illustration, we consider three-component spliced regression models: the first component contains zeros, the second component models the middle segment of the loss data, and the third component models the tail segment of the loss data. We calibrate these proposed models and evaluate their performance using a Singapore auto insurance claims dataset. The estimation results show that the spliced regression model performs better than the Tweedie regression model in terms of tail fitting and prediction accuracy.
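
The three-component structure described in the summary can be illustrated with a minimal sketch. The summary does not state which parametric families the authors use, so everything specific here is an assumption made for illustration only: a point mass `p0` at zero, a lognormal body truncated above at the threshold `u` with weight `p1`, and a Pareto tail with scale `u` and shape `alpha` carrying the remaining weight. The function names are hypothetical. In a regression setting, covariates could enter through, for example, `mu = x @ beta`; that step is omitted.

```python
import math


def lognormal_pdf(y, mu, sigma):
    """Density of a lognormal(mu, sigma) distribution at y > 0."""
    z = (math.log(y) - mu) / sigma
    return math.exp(-0.5 * z * z) / (y * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cdf(y, mu, sigma):
    """Distribution function of a lognormal(mu, sigma) distribution at y > 0."""
    return 0.5 * (1.0 + math.erf((math.log(y) - mu) / (sigma * math.sqrt(2.0))))


def spliced_cdf(y, p0, p1, mu, sigma, alpha, u):
    """CDF of the assumed three-component spliced model.

    p0: probability of a zero claim; p1: weight of the middle segment (0, u];
    the tail segment (u, inf) receives the remaining weight 1 - p0 - p1.
    """
    p2 = 1.0 - p0 - p1
    if y < 0.0:
        return 0.0
    if y == 0.0:
        return p0
    if y <= u:
        # Lognormal truncated above at the threshold u (assumed body).
        return p0 + p1 * lognormal_cdf(y, mu, sigma) / lognormal_cdf(u, mu, sigma)
    # Pareto tail with scale u and shape alpha (assumed tail).
    return p0 + p1 + p2 * (1.0 - (u / y) ** alpha)


def spliced_loglik_term(y, p0, p1, mu, sigma, alpha, u):
    """Log-likelihood contribution of a single observation y >= 0."""
    p2 = 1.0 - p0 - p1
    if y == 0.0:
        return math.log(p0)
    if y <= u:
        return (math.log(p1)
                + math.log(lognormal_pdf(y, mu, sigma))
                - math.log(lognormal_cdf(u, mu, sigma)))
    return (math.log(p2) + math.log(alpha)
            + alpha * math.log(u) - (alpha + 1.0) * math.log(y))


# Illustrative parameter values (not taken from the paper).
params = dict(p0=0.6, p1=0.3, mu=6.0, sigma=1.0, alpha=2.0, u=1000.0)
claims = [0.0, 0.0, 250.0, 800.0, 5000.0]
total_loglik = sum(spliced_loglik_term(y, **params) for y in claims)
print(total_loglik)
```

Because the threshold `u` appears both as the truncation point of the body and as the scale of the tail, the likelihood is not smooth in `u`. This is one way to see why treating the threshold as a parameter complicates estimation, as the summary notes.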


62P05 Applications of statistics to actuarial sciences and financial mathematics
91B30 Risk theory, insurance (MSC2010)
62G32 Statistics of extreme values; tail inference


Software: ismev; CompLognormal