## Valid model-free prediction of future insurance claims. (English) Zbl 07469929

The authors consider conformal prediction as a model-free method to provide provably valid predictions for the size of future insurance claims uniformly over all sample sizes and all distributions.
Let $$X_{n}$$ $$(n \in \mathbb{N})$$ be real-valued random variables such that the joint distribution of any finite subsequence is permutation invariant. Given a function $$M\colon [\mathbb{R}]^{<\infty} \times \mathbb{R} \rightarrow \mathbb{R}$$, called a nonconformity measure, and a set of observations $$\mathcal{X}_{n} = \{x_{0},x_{1},\dots, x_{n-1}\}$$, the plausibility function $$\textrm{pl}_{\mathcal{X}_{n}}\colon \mathbb{R} \rightarrow \mathbb{R}$$ is calculated as follows:
1. Let $$x \in \mathbb{R}$$ and set $$x_{n} = x$$, $$\mathcal{X}_{n+1} = \mathcal{X}_{n}\cup \{x\}$$.
2. For every $$i\leq n$$, set $$m_{i} = M(\mathcal{X}_{n+1}\setminus \{x_{i}\},x_{i})$$.
3. Set $$\textrm{pl}_{\mathcal{X}_{n}}(x) = \frac{1}{n+1}\sum_{i\leq n} \mathbb{I}_{[m_{i}\geq m_{n}]}$$.
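
The three steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `plausibility` and the nonconformity measure `abs_dev_from_mean` are assumptions introduced here, and the sample is held in a list rather than a set so that repeated values are kept.

```python
from statistics import mean
from typing import Callable, Sequence


def plausibility(obs: Sequence[float], x: float,
                 M: Callable[[Sequence[float], float], float]) -> float:
    """Plausibility of a candidate value x given observations x_0, ..., x_{n-1}."""
    n = len(obs)
    augmented = list(obs) + [x]  # step 1: append the candidate as x_n
    # Step 2: score each point against the augmented sample with that point removed.
    scores = [M(augmented[:i] + augmented[i + 1:], augmented[i])
              for i in range(n + 1)]
    # Step 3: fraction of scores at least as large as the candidate's score m_n.
    return sum(m >= scores[n] for m in scores) / (n + 1)


def abs_dev_from_mean(others: Sequence[float], z: float) -> float:
    """Hypothetical nonconformity measure, chosen only for illustration."""
    return abs(z - mean(others))


print(plausibility([1.0, 2.0, 3.0], 10.0, abs_dev_from_mean))  # 0.25
print(plausibility([1.0, 2.0, 3.0], 2.0, abs_dev_from_mean))   # 1.0
```

A candidate far from the data (10.0) receives the smallest possible plausibility, 1/(n+1), while a central candidate (2.0) receives plausibility 1.
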
The authors note that, in non-degenerate cases, $$\textrm{pl}_{\mathcal{X}_{n}}(X_{n+1})$$ is uniformly distributed on $$\{1/(n+1),2/(n+1),\dots,1\}$$. Hence, for every $$\alpha \in (0,1)$$, the set $$C_{\alpha}(\mathcal{X}_{n}) = \{x\colon \textrm{pl}_{\mathcal{X}_{n}}(x) > \lfloor(n+1)\alpha\rfloor/(n+1) \}$$ has the property that the probability of $$X_{n+1}\notin C_{\alpha}(\mathcal{X}_{n})$$ does not exceed $$\alpha$$, which makes $$C_{\alpha}(\mathcal{X}_{n})$$ the $$100(1-\alpha)\%$$ conformal prediction region.
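
The thresholding rule that defines the prediction region can be sketched as below. This is a rough illustration under stated assumptions: the region is only approximated on a finite grid of candidate values, and the names `conformal_region_on_grid` and `rank_pl` (a stand-in plausibility function) are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def conformal_region_on_grid(pl: Callable[[float], float], n: int,
                             alpha: float, grid: Sequence[float]) -> List[float]:
    """Grid points x with pl(x) > floor((n + 1) * alpha) / (n + 1)."""
    threshold = math.floor((n + 1) * alpha) / (n + 1)
    return [x for x in grid if pl(x) > threshold]


obs = [1.0, 2.0, 3.0, 4.0]


def rank_pl(x: float) -> float:
    # Stand-in plausibility used only to exercise the function:
    # counts the candidate itself plus the observations at or above it.
    return (1 + sum(xi >= x for xi in obs)) / (len(obs) + 1)


region = conformal_region_on_grid(rank_pl, len(obs), 0.4,
                                  [0.5, 1.5, 2.5, 3.5, 4.5])
print(region)  # [0.5, 1.5, 2.5]
```

Here the threshold is 2/5, so the grid point 3.5, whose plausibility equals 2/5 exactly, is excluded by the strict inequality.
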
The authors calculate $$m_{i}$$ explicitly in the case where $$M$$ is the sample cumulative distribution function of the form $$M(\mathcal{X}_{n},x) = \frac{1}{n}\sum_{i<n}K(x,x_{i},h)$$, where $$K(\cdot,\theta,h)$$ is a kernel distribution function with parameter $$\theta \in \mathbb{R}$$ and bandwidth $$h$$. They observe that if $$x \mapsto K(x,x,h)$$ is constant, then $$\textrm{pl}_{\mathcal{X}_{n}}(x) = \frac{1}{n}\sum_{i<n}\mathbb{I}_{[x_{i}\geq x]}$$ and $$C_{\alpha}(\mathcal{X}_{n}) = [0,x_{(k)})$$, where $$x_{(k)}$$ is the $$k^{\textrm{th}}$$ order statistic (the $$k^{\textrm{th}}$$ smallest element) of $$\mathcal{X}_{n}$$ with $$k = \min\{n,\lfloor (n+1)(1-\alpha)\rfloor+1\}$$.
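
In this special case the upper endpoint of the region needs only a sort. The sketch below assumes that $$x_{(k)}$$ is indexed in ascending order, which is what an upper prediction bound at level $$1-\alpha$$ requires; the function name `conformal_upper_bound` is hypothetical.

```python
import math
from typing import Sequence


def conformal_upper_bound(obs: Sequence[float], alpha: float) -> float:
    """Endpoint x_(k) of [0, x_(k)), with k = min(n, floor((n + 1)(1 - alpha)) + 1)."""
    n = len(obs)
    k = min(n, math.floor((n + 1) * (1 - alpha)) + 1)
    return sorted(obs)[k - 1]  # k-th smallest observation (1-based k)


claims = [float(v) for v in range(19, 0, -1)]  # toy data: 19.0, 18.0, ..., 1.0
print(conformal_upper_bound(claims, 0.5))   # 11.0
print(conformal_upper_bound(claims, 0.01))  # 19.0
```

For very small $$\alpha$$ the index is capped at $$n$$, so the bound is the sample maximum.
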
The authors use fire claims data and car injury claims data to calculate conformal prediction intervals and compare them to prediction intervals based on other non-parametric methods. Possible extensions of the method to conditional predictions are also considered.

### MSC:

- 91G05 Actuarial mathematics
- 62G30 Order statistics; empirical distribution functions
- 60G25 Prediction theory (aspects of stochastic processes)
- 62M20 Inference from stochastic processes and prediction
- 62G05 Nonparametric estimation
- 62G07 Density estimation
