Cost-sensitive multi-class AdaBoost for understanding driving behavior based on telematics. (English) Zbl 1480.91243

Summary: Using telematics technology, insurers can capture a wide range of data to better decode driver behavior, such as distance traveled and how drivers brake, accelerate, or make turns. Such additional information also helps insurers improve risk assessments for usage-based insurance, a recent industry innovation. In this article, we explore the integration of telematics information into a classification model to determine driver heterogeneity. For motor insurance during a policy year, we typically observe a large proportion of drivers with zero accidents, a lower proportion with exactly one accident, and a far smaller proportion with two or more accidents. Here we introduce a cost-sensitive multi-class adaptive boosting (AdaBoost) algorithm, which we call SAMME.C2, to handle such class imbalances. We calibrate the algorithm using empirical data collected from a telematics program in Canada and demonstrate an improved assessment of driving behavior using telematics compared with traditional risk variables. Using suitable performance metrics, we show that our algorithm outperforms other learning models designed to handle class imbalances.


91G05 Actuarial mathematics
Full Text: DOI


[1] Ayuso, M., Guillen, M. and Nielsen, J.P. (2019) Improving automobile insurance ratemaking using telematics: incorporating mileage and driver behaviour data. Transportation 46, 735-752.
[2] Ayuso, M., Guillen, M. and Pérez-Marín, A.M. (2016) Telematics and gender discrimination: some usage-based evidence on whether men’s risk of accidents differs from women’s. Risks 4, 1-10.
[3] Bhowan, U., Zhang, M. and Johnston, M. (2010) Genetic programming for classification with unbalanced data. Proceedings 13th European Conference on Genetic Programming, EuroGP 2010, pp. 1-13. Berlin: Springer-Verlag.
[4] Boucher, J.-P., Côté, S. and Guillen, M. (2017) Exposure as duration and distance in telematics motor insurance using generalized additive models. Risks 5, 1-23.
[5] Chawla, N.V., Bowyer, K.W., Hall, L.O. and Kegelmeyer, W.P. (2002) SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16, 321-357. · Zbl 0994.68128
[6] Chawla, N.V., Lazarevic, A., Hall, L.O. and Bowyer, K.W. (2003) SMOTEBoost: Improving prediction of the minority class in boosting. PKDD 2003: Proceedings of the Seventh European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 107-119. Berlin-Heidelberg: Springer-Verlag.
[7] Constantinescu, C.C., Stancu, I. and Panait, I. (2018) Impact study of telematics auto insurance. Review of Financial Studies 3(4), 17-35.
[8] Douzas, G., Bacao, F. and Last, F. (2018) Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE. Information Sciences 465, 1-20.
[9] Fernández, A., García, S., Galar, M., Prati, R.C., Krawczyk, B. and Herrera, F. (2018). Learning from Imbalanced Data Sets. Switzerland: Springer.
[10] Ferrario, A. and Hämmerli, R. (2019) On Boosting: Theory and Applications. Available at SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3402687
[11] Ferreira, A.J. and Figueiredo, M.A. (2012) Boosting algorithms: A review of methods, theory, and applications. In Ensemble Machine Learning: Methods and Applications (eds. Zhang, C. and Ma, Y.), chap. 2, pp. 35-85. Springer Science.
[12] Fowlkes, E.B. and Mallows, C. (1983) A method for comparing two hierarchical clusterings. Journal of the American Statistical Association 78(383), 553-569. · Zbl 0545.62042
[13] Freund, Y. and Schapire, R.E. (1997) A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55(1), 119-139. · Zbl 0880.68103
[14] Friedman, J., Hastie, T. and Tibshirani, R. (2000) Additive logistic regression: A statistical view of boosting. The Annals of Statistics 28(2), 337-407. · Zbl 1106.62323
[15] Galar, M., Fernández, A., Barrenechea, E., Bustince, H. and Herrera, F. (2012) A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews 42(4), 463-484.
[16] Gao, G., Meng, S. and Wüthrich, M.V. (2019) Claims frequency modeling using telematics car driving data. Scandinavian Actuarial Journal 2019(2), 143-162. · Zbl 1411.91280
[17] Gao, G., Wang, H. and Wüthrich, M.V. (2021) Boosting Poisson regression models with telematics car driving data. Machine Learning.
[18] Guillen, M., Nielsen, J.P., Pérez-Marín, A.M. and Elpidorou, V. (2020) Can automobile insurance telematics predict the risk of near-miss events? North American Actuarial Journal 24(1), 141-152. · Zbl 1437.91392
[19] Hand, D.J. and Till, R.J. (2001) A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45(2), 171-186. · Zbl 1007.68180
[20] Hastie, T., Tibshirani, R. and Friedman, J. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer. · Zbl 1273.62005
[21] Holland, J.H. (1975) Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press.
[22] Mühlenbein, H. (1997) Genetic algorithms. In Local Search in Combinatorial Optimization (eds. Aarts, E.H. and Lenstra, J.K.), pp. 137-172. Princeton University Press. · Zbl 0911.68156
[23] Orphanoudakis, S.C., Chronaki, C.E., Tsiknakis, M. and Kostomanolakis, S.G. (1998) Telematics in healthcare. In Medical Image Databases (ed. Wong, S.T.), chap. 10, pp. 251-281. New York: Springer.
[24] Pazzani, M., Merz, C., Murphy, P., Ali, K., Hume, T. and Brunk, C. (1994) Reducing misclassification costs. ICML 1994: Proceedings of the Eleventh International Conference on Machine Learning, pp. 217-225. San Francisco, CA: Morgan Kaufmann Publishers Inc.
[25] Pednault, E.P., Rosen, B.K. and Apte, C. (2000) Handling imbalanced data sets in insurance risk modeling. Technical report, Association for the Advancement of Artificial Intelligence (AAAI).
[26] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M. and Duchesnay, E. (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research12, 2825-2830. · Zbl 1280.68189
[27] Pérez-Marín, A.M., Guillen, M., Alcañiz, M. and Bermúdez, L. (2019) Quantile regression with telematics information to assess the risk of driving above the posted speed limit. Risks 7, 1-11.
[28] Pesantez-Narvaez, J., Guillen, M. and Alcañiz, M. (2019) Predicting motor insurance claims using telematics data - XGBoost versus logistic regression. Risks 7, 1-16.
[29] Schapire, R.E. and Singer, Y. (1999) Improved boosting algorithms using confidence-rated predictions. Machine Learning 37, 297-336. · Zbl 0945.68194
[30] Seiffert, C., Khoshgoftaar, T.M., Van Hulse, J. and Napolitano, A. (2010) RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans 40(1), 185-197.
[31] Shon, H.S., Batbaatar, E., Kim, K.O., Cha, E.J. and Kim, K.-A. (2020) Classification of kidney cancer data using cost-sensitive hybrid deep learning approach. Symmetry 12, 154.
[32] Sun, Y., Kamel, M.S., Wong, A.K. and Wang, Y. (2007) Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition 40(12), 3358-3378. · Zbl 1122.68505
[33] Tang, Y., Zhang, Y.-Q., Chawla, N.V. and Krasser, S. (2009) SVMs modeling for highly imbalanced classification. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans 39(1), 281-288.
[34] Verbelen, R., Antonio, K. and Claeskens, G. (2018) Unravelling the predictive power of telematics data in car insurance pricing. Journal of the Royal Statistical Society: Series C (Applied Statistics) 67(5), 1275-1304.
[35] Wüthrich, M.V. and Buser, C. (2020) Data analytics for non-life insurance pricing. Available at SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2870308
[36] Yang, Q. and Wu, X. (2006) 10 challenging problems in data mining research. International Journal of Information Technology & Decision Making 5(4), 597-604.
[37] Zhang, S. (2020) Cost-sensitive KNN classification. Neurocomputing 391, 234-242.
[38] Zhu, J., Zou, H., Rosset, S. and Hastie, T. (2009) Multi-class AdaBoost. Statistics and Its Interface 2, 349-360. · Zbl 1245.62080
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.