Mixture modeling of data with multiple partial right-censoring levels. (English) Zbl 1459.62183

Summary: In this paper, a new flexible approach to modeling data with multiple partial right-censoring points is proposed. The method is based on finite mixture models, a flexible tool for modeling heterogeneity in data. A general framework to accommodate partial censoring is considered. In this setting, it is assumed that a certain portion of the data points are censored and the rest are not, a situation that occurs in many insurance loss data sets. A novel probability function is proposed for use as a mixture component, and the expectation-maximization algorithm is employed to estimate the model parameters. The Bayesian information criterion is used for model selection. Additionally, an approach for assessing the variability of the parameter estimates, as well as for computing quantiles, commonly known as risk measures, is considered. The proposed model is evaluated in a simulation study based on four common probability distributions used to model right-skewed loss data, and it is applied to a real data set with good results.
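
To make the ingredients named in the summary concrete, the following is a minimal sketch of a textbook right-censored mixture log-likelihood together with the Bayesian information criterion. It is not the novel probability function proposed in the paper, which the summary does not specify. The lognormal components, the function names, and the convention BIC = -2 log L + k log n (smaller is better) are all assumptions made for illustration only.

```python
import math


def lognormal_pdf(x, mu, sigma):
    """Density of a lognormal distribution at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def lognormal_sf(x, mu, sigma):
    """Survival function P(X > x), via the complementary error function."""
    z = (math.log(x) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def censored_mixture_loglik(values, censored, weights, mus, sigmas):
    """Log-likelihood of a lognormal mixture under right censoring.

    An exactly observed value contributes the mixture density; a value
    flagged as censored contributes the mixture survival function, since
    only "the loss exceeds this level" is known. Because censoring is
    flagged per observation, some points at a given level may be censored
    while others at the same level are exact (the partial setting).
    """
    total = 0.0
    for x, is_censored in zip(values, censored):
        component = lognormal_sf if is_censored else lognormal_pdf
        mix = sum(w * component(x, m, s)
                  for w, m, s in zip(weights, mus, sigmas))
        total += math.log(mix)
    return total


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion (assumed form: smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)
```

Under these assumptions, one would maximize `censored_mixture_loglik` (for example with an EM algorithm) for several numbers of components and keep the fit with the smallest `bic` value.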


62N01 Censored data models
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P05 Applications of statistics to actuarial sciences and financial mathematics
91G05 Actuarial mathematics


R; ltmix


[1] Abu Bakar, S. A.; Hamzah, N. A.; Maghsoudi, M.; Nadarajah, S., Modeling loss data using composite models, Insur Math Econ, 61, 146-154 (2015) · Zbl 1314.91130
[2] Balakrishnan, N.; Mitra, D., Likelihood inference for lognormal data with left truncation and right censoring with an illustration, J Stat Plan Inference, 141, 3536-3553 (2011) · Zbl 1221.62038
[3] Balakrishnan, N.; Mitra, D., Left truncated and right censored Weibull data and likelihood inference with an illustration, Comput Stat Data Anal, 56, 4011-4025 (2012) · Zbl 1255.62309
[4] Balakrishnan, N.; Mitra, D., Likelihood inference based on left truncated and right censored data from a gamma distribution, IEEE Trans Reliab, 62, 679-688 (2013)
[5] Bang, S.; Cho, H.; Jhun, M., Simultaneous estimation for non-crossing multiple quantile regression with right censored data, Stat Comput, 26, 131-147 (2016) · Zbl 1342.62114
[6] Beirlant, J.; Goegebeur, Y.; Teugels, J.; Segers, J., Statistics of Extremes (2004), Hoboken, NJ: Wiley
[7] Biernacki, C.; Celeux, G.; Govaert, G., Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models, Comput Stat Data Anal, 41, 561-575 (2003) · Zbl 1429.62235
[8] Blostein M, Miljkovic T (2019a) ltmix: Left-Truncated Mixtures of Gamma, Weibull, and Lognormal Distributions, R package version (2) · Zbl 1415.62076
[9] Blostein, M.; Miljkovic, T., On modeling left-truncated loss data using mixtures of distributions, Insur Math Econ, 85, 35-46 (2019) · Zbl 1415.62076
[10] Bordes, L.; Chauveau, D., Stochastic EM algorithms for parametric and semiparametric mixture models for right-censored lifetime data, Comput Stat, 31, 1513-1538 (2016) · Zbl 1348.65016
[11] Calderín-Ojeda, E.; Kwok, CF, Modeling claims data with composite Stoppa models, Scand Actuar J, 9, 817-836 (2016) · Zbl 1401.62205
[12] Chauveau, D., A stochastic EM algorithm for mixtures with censored data, J Stat Plan Inference, 46, 1-25 (1995) · Zbl 0821.62013
[13] Cooray, K.; Ananda, MM, Modeling actuarial data with a composite Lognormal-Pareto model, Scand Actuar J, 5, 321-334 (2005) · Zbl 1143.91027
[14] Frees, E.; Valdez, E., Understanding relationships using copulas, N Am Actuar J, 2, 1-15 (1998) · Zbl 1081.62564
[15] Gruen B, Leisch F, Sarkar D, Mortier F (2019) flexmix: Flexible Mixture Modeling, R package version 2.3-15
[16] Gui, W.; Huang, R.; Lin, XS, Fitting the Erlang mixture model to data via a GEM-CMM algorithm, J Comput Appl Math, 343, 189-205 (2018) · Zbl 06892263
[17] Hoeting, JA; Madigan, D.; Raftery, AE; Volinsky, CT, Bayesian model averaging: a tutorial, Stat Sci, 14, 382-401 (1999) · Zbl 1059.62525
[18] Hubert, L.; Arabie, P., Comparing partitions, J Classif, 2, 193-218 (1985)
[19] Klugman, S. A.; Panjer, H. H.; Willmot, G. E., Loss Models: From Data to Decisions (2012), Hoboken, NJ: Wiley · Zbl 1272.62002
[20] Klugman, S. A.; Parsa, R., Fitting bivariate loss distribution with copulas, Insur Math Econ, 24, 139-148 (1999) · Zbl 0931.62044
[21] Lee, G.; Scott, C., EM algorithms for multivariate Gaussian mixture models with truncated and censored data, Comput Stat Data Anal, 56, 2816-2829 (2012) · Zbl 1255.62308
[22] Lee, SCK; Lin, XS, Modeling and evaluating insurance losses via mixtures of Erlang distributions, N Am Actuar J, 14, 107-130 (2010)
[23] McLachlan, G.; Jones, PN, Fitting mixture models to grouped and truncated data via the EM algorithm, Biometrics, 44, 571-578 (1988) · Zbl 0707.62214
[24] McLachlan, G.; Peel, D., Finite mixture models (2000), Hoboken, NJ: Wiley · Zbl 0963.62061
[25] McNeil, A., Estimating the tails of loss severity distributions using extreme value theory, ASTIN Bull, 27, 117-137 (1997)
[26] Melnykov, V.; Michael, S.; Melnykov, I., Recent developments in model-based clustering with applications, in: Celebi, ME (ed.), Partitional clustering algorithms, 1-39 (2015), Berlin: Springer
[27] Michael, S.; Melnykov, V., An effective strategy for initializing the EM algorithm in finite mixture models, Adv Data Anal Classif, 10, 563-583 (2016) · Zbl 1414.62256
[28] Miljkovic, T.; Grün, B., Modeling loss data using mixtures of distributions, Insur Math Econ, 70, 387-396 (2016) · Zbl 1373.62527
[29] Pigeon, M.; Denuit, M., Composite Lognormal-Pareto model with random threshold, Scand Actuar J, 3, 177-192 (2011) · Zbl 1277.62258
[30] R Core Team (2016) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria
[31] Resnick, SI, Discussion of the Danish data on large fire insurance losses, ASTIN Bull, 27, 139-151 (1997)
[32] Ross, S. M., Introduction to probability models (2014), New York: Academic Press · Zbl 1284.60002
[33] Schwarz, G., Estimating the dimension of a model, Ann Stat, 6, 461-464 (1978) · Zbl 0379.62005
[34] Scollnik, DP, On composite Lognormal-Pareto models, Scand Actuar J, 1, 20-33 (2007) · Zbl 1146.91028
[35] Sun, Z.; Ye, X.; Sun, L., Consistent test for parametric models with right-censored data using projections, Comput Stat Data Anal, 118, 112-125 (2018) · Zbl 1469.62150
[36] Verbelen, R.; Gong, L.; Antonio, K.; Badescu, A.; Lin, S., Fitting mixtures of Erlangs to censored and truncated data using the EM algorithm, ASTIN Bull, 45, 729-758 (2015) · Zbl 1390.62227
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.