
A dynamic over-sampling procedure based on sensitivity for multi-class problems. (English) Zbl 1218.68121

Summary: Classification with imbalanced datasets poses a new challenge for researchers in the framework of machine learning. The problem arises when the number of patterns representing one of the classes of the dataset (usually the concept of interest) is much lower than in the remaining classes, so the learning model must be adapted to this situation, which is very common in real applications. In this paper, a dynamic over-sampling procedure is proposed for improving the classification of imbalanced datasets with more than two classes. The procedure is incorporated into a memetic algorithm (MA) that optimizes radial basis function neural networks (RBFNNs). To handle class imbalance, the training data are resampled in two stages. In the first stage, an over-sampling procedure is applied to the minority class to partially balance the class sizes. Then the MA is run and the data are over-sampled in different generations of the evolution, generating new patterns for the minimum-sensitivity class (the class with the worst accuracy for the best RBFNN of the population). The proposed methodology is tested on 13 imbalanced benchmark classification datasets from well-known machine learning problems and on one complex problem of microbial growth. It is compared to other neural network methods specifically designed for handling imbalanced data, including different over-sampling procedures in the preprocessing stage, a threshold-moving method in which the output threshold is moved toward inexpensive classes, and ensemble approaches combining the models obtained with these techniques. The results show that the proposal improves sensitivity on the generalization set and obtains both a high accuracy level and a good classification level for each class.
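The core of the dynamic step described above is: evaluate the best network of the current population, find the class with the worst per-class accuracy (the minimum-sensitivity class), and add new patterns of that class to the training data. The following is a minimal illustrative sketch of that loop step, not the authors' implementation: random duplication stands in for the paper's synthetic-pattern generation, and all function names are ours.

```python
import random
from collections import Counter

def per_class_sensitivity(y_true, y_pred):
    """Sensitivity (recall) of each class: correctly classified
    patterns of the class divided by the class size."""
    correct, total = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}

def oversample_min_sensitivity_class(X, y, y_pred, n_new):
    """Find the class with the lowest sensitivity for the current
    best model's predictions y_pred, and append n_new patterns of
    that class (here: sampled with replacement, as a stand-in for
    synthetic-pattern generation)."""
    sens = per_class_sensitivity(y, y_pred)
    worst = min(sens, key=sens.get)
    pool = [x for x, label in zip(X, y) if label == worst]
    extra = [random.choice(pool) for _ in range(n_new)]
    return X + extra, y + [worst] * n_new, worst
```

For example, if class 1's patterns are mostly misclassified by the best RBFNN, its sensitivity is the minimum, so the next generation's training set is enlarged with additional class-1 patterns.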

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68T10 Pattern recognition, speech recognition
