Valid model-free prediction of future insurance claims. (English) Zbl 07469929

The authors consider conformal prediction as a model-free method to provide provably valid predictions for the size of future insurance claims uniformly over all sample sizes and all distributions.
Let \(X_{n}\) \((n \in \mathbb{N})\) be real-valued random variables such that the joint distribution of any finite subsequence is permutation invariant (i.e., the sequence is exchangeable). For a function \(M\colon [\mathbb{R}]^{<\infty} \times \mathbb{R} \rightarrow \mathbb{R}\), called a nonconformity measure, and a set of observations \(\mathcal{X}_{n} = \{x_{0},x_{1},\dots, x_{n-1}\}\), the plausibility function \(\textrm{pl}_{\mathcal{X}_{n}}\colon \mathbb{R} \rightarrow [0,1]\) is computed as follows:
Let \(x \in \mathbb{R}\) and set \(x_{n} = x\), \(\mathcal{X}_{n+1} = \mathcal{X}_{n}\cup \{x\}\).
For every \(i\leq n\), set \(m_{i} = M(\mathcal{X}_{n+1}\setminus \{x_{i}\},x_{i})\).
Set \(\textrm{pl}_{\mathcal{X}_{n}}(x) = \frac{1}{n+1}\sum_{i\leq n} \mathbb{I}_{[m_{i}\geq m_{n}]}\).
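The three steps above translate directly into code. The following is a minimal Python sketch under stated assumptions: the finite-set argument in \([\mathbb{R}]^{<\infty}\) is represented as a plain list (so repeated values are allowed), and `abs_dev_from_mean` is a hypothetical nonconformity measure chosen only for illustration, not one taken from the paper.

```python
from typing import Callable, Sequence

# A nonconformity measure M: (finite sample of reals, real) -> real.
Measure = Callable[[Sequence[float], float], float]


def plausibility(x: float, data: Sequence[float], M: Measure) -> float:
    """pl_{X_n}(x): data holds x_0, ..., x_{n-1}; the candidate x becomes x_n."""
    aug = list(data) + [x]  # X_{n+1} = X_n together with x, where x_n = x
    n = len(data)
    # m_i = M(X_{n+1} without x_i, x_i); drop by position so duplicates are handled.
    m = [M(aug[:i] + aug[i + 1:], aug[i]) for i in range(n + 1)]
    # Fraction of the n + 1 scores that are at least the candidate's score m_n.
    return sum(1 for mi in m if mi >= m[n]) / (n + 1)


def abs_dev_from_mean(sample: Sequence[float], x: float) -> float:
    """Hypothetical example measure: distance of x from the mean of the others."""
    return abs(x - sum(sample) / len(sample))
```

For `data = [0.0, 1.0, 2.0, 3.0, 4.0]`, the central candidate `x = 2.0` has the largest possible plausibility 1, while the outlying candidate `x = 100.0` has the smallest possible value 1/6, since only its own score reaches \(m_{n}\).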
The authors note that, in non-degenerate cases (no ties among the \(m_{i}\)), \(\textrm{pl}_{\mathcal{X}_{n}}(X_{n+1})\) is uniformly distributed on \(\{1/(n+1),2/(n+1),\dots,1\}\). Hence, for every \(\alpha \in (0,1)\), the set \[C_{\alpha}(\mathcal{X}_{n}) = \{x\colon \textrm{pl}_{\mathcal{X}_{n}}(x) > \lfloor(n+1)\alpha\rfloor/(n+1) \}\] satisfies \(\mathsf{P}\{X_{n+1}\notin C_{\alpha}(\mathcal{X}_{n})\} = \lfloor(n+1)\alpha\rfloor/(n+1) \leq \alpha\), which makes \(C_{\alpha}(\mathcal{X}_{n})\) the \(100(1-\alpha)\%\) conformal prediction region.
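The uniformity and validity statements can be checked by simulation. The sketch below rests on illustrative assumptions: i.i.d. Exponential(1) claims serve as one example of an exchangeable sequence, `excess_over_mean` is a hypothetical nonconformity measure (large claims relative to the others are nonconforming), and \(n = 9\), \(\alpha = 0.25\). It records how often each level \(j/(n+1)\) of \(\textrm{pl}_{\mathcal{X}_{n}}(X_{n+1})\) occurs and how often the next claim falls outside \(C_{\alpha}(\mathcal{X}_{n})\).

```python
import math
import random
from collections import Counter


def plausibility_count(x, data, M):
    """Return j = #{i <= n : m_i >= m_n} with x_n = x, so that pl(x) = j / (n + 1)."""
    aug = list(data) + [x]
    n = len(data)
    m = [M(aug[:i] + aug[i + 1:], aug[i]) for i in range(n + 1)]
    return sum(1 for mi in m if mi >= m[n])


def excess_over_mean(sample, x):
    """Hypothetical measure: how far x exceeds the mean of the other values."""
    return x - sum(sample) / len(sample)


def in_region(x, data, alpha, M):
    """x is in C_alpha(X_n) iff pl(x) > floor((n+1) alpha) / (n+1), compared as integers."""
    return plausibility_count(x, data, M) > math.floor((len(data) + 1) * alpha)


def simulate(n=9, alpha=0.25, reps=4000, seed=1):
    """Assumed model: n + 1 i.i.d. Exponential(1) claims per replication."""
    rng = random.Random(seed)
    levels = Counter()
    misses = 0
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n + 1)]
        past, future = xs[:n], xs[n]
        levels[plausibility_count(future, past, excess_over_mean)] += 1
        if not in_region(future, past, alpha, excess_over_mean):
            misses += 1
    return levels, misses / reps
```

With these settings each level should occur with relative frequency close to \(1/10\), and the miss rate should be close to \(\lfloor 2.5\rfloor/10 = 0.2\), which does not exceed \(\alpha = 0.25\).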
The authors calculate \(m_{i}\) explicitly when \(M\) is a kernel-smoothed empirical distribution function, \(M(\mathcal{X}_{n},x) = \frac{1}{n}\sum_{i<n}K(x,x_{i},h)\), where \(K(\cdot,\theta,h)\) is a kernel distribution function with parameter \(\theta \in \mathbb{R}\) and bandwidth \(h>0\). They observe that if \(x \mapsto K(x,x,h)\) is constant, then \(\textrm{pl}_{\mathcal{X}_{n}}(x) = \frac{1}{n+1}\sum_{i\leq n}\mathbb{I}_{[x_{i}\geq x]}\) (recall that \(x_{n} = x\)) and \(C_{\alpha}(\mathcal{X}_{n}) = [0,x_{(k)})\), where \(x_{(k)}\) is the \(k^{\textrm{th}}\) smallest element of \(\mathcal{X}_{n}\) and \(k = \min\{n,\lfloor (n+1)(1-\alpha)\rfloor+1\}\).
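The reduction to an order statistic can be checked numerically. Writing \(S(y)=\sum_{j\le n}K(y,x_{j},h)\) for the augmented sample, one has \(m_{i}=(S(x_{i})-K(x_{i},x_{i},h))/n\), so when \(K(x,x,h)\) is constant the \(m_{i}\) are ordered like the \(x_{i}\) (assuming \(S\) is strictly increasing). The sketch below assumes a Gaussian kernel \(K(x,\theta,h)=\Phi((x-\theta)/h)\), for which \(K(x,x,h)=1/2\), and bandwidth \(h=1\); these are illustrative choices, not necessarily the kernel used by the authors. It compares the generic plausibility with the count \(\#\{i\le n\colon x_{i}\ge x\}/(n+1)\) and computes the endpoint \(x_{(k)}\).

```python
import math


def kernel_cdf(x, theta, h):
    """Assumed Gaussian kernel K(x, theta, h) = Phi((x - theta) / h); K(x, x, h) = 0.5."""
    return 0.5 * (1.0 + math.erf((x - theta) / (h * math.sqrt(2.0))))


def smoothed_cdf(sample, x, h):
    """M(X, x) = (1/|X|) * sum over x_j in X of K(x, x_j, h)."""
    return sum(kernel_cdf(x, xj, h) for xj in sample) / len(sample)


def plausibility_count(x, data, h=1.0):
    """Number of i <= n with m_i >= m_n, where x_n = x; pl(x) = count / (n + 1)."""
    aug = list(data) + [x]
    n = len(data)
    m = [smoothed_cdf(aug[:i] + aug[i + 1:], aug[i], h) for i in range(n + 1)]
    return sum(1 for mi in m if mi >= m[n])


def plausibility(x, data, h=1.0):
    """Generic conformal plausibility with the kernel-smoothed measure."""
    return plausibility_count(x, data, h) / (len(data) + 1)


def rank_plausibility(x, data):
    """Closed form: (1 + #{i < n : x_i >= x}) / (n + 1)."""
    return (1 + sum(1 for xi in data if xi >= x)) / (len(data) + 1)


def upper_endpoint(data, alpha):
    """x_(k), the k-th smallest observation, with k = min{n, floor((n+1)(1-alpha)) + 1}."""
    n = len(data)
    k = min(n, math.floor((n + 1) * (1 - alpha)) + 1)
    return sorted(data)[k - 1]


def in_region(x, data, alpha, h=1.0):
    """x is in C_alpha iff pl(x) > floor((n+1) alpha) / (n+1), compared as integers."""
    return plausibility_count(x, data, h) > math.floor((len(data) + 1) * alpha)
```

For instance, with the nine illustrative values `[3.1, 0.4, 2.2, 5.9, 1.3, 4.7, 0.9, 6.5, 2.8]` and \(\alpha = 0.25\), one gets \(k = 8\), so the endpoint is the second largest value 5.9; a candidate of 5.0 lies in the region and a candidate of 6.0 does not.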
The authors use fire claims data and car injury claims data to calculate conformal prediction intervals and compare them to prediction intervals based on other non-parametric methods. Possible extensions of the method to conditional predictions are also considered.


91G05 Actuarial mathematics
62G30 Order statistics; empirical distribution functions
60G25 Prediction theory (aspects of stochastic processes)
62M20 Inference from stochastic processes and prediction
62G05 Nonparametric estimation
62G07 Density estimation

