Feature selection for better identification of subtypes of Guillain-Barré syndrome. (English) Zbl 1423.92087
Summary: Guillain-Barré syndrome (GBS) is a neurological disorder that has not been explored using clustering algorithms. Clustering algorithms perform more efficiently when they work only with relevant features. In this work, we applied five filter methods, namely correlation-based feature selection (CFS), chi-squared, information gain, symmetrical uncertainty, and consistency, to select the most relevant features from a real dataset of 156 features. This dataset contains clinical, serological, and nerve conduction test data obtained from GBS patients. The most relevant feature subsets, as determined by each filter method, were used to identify the four subtypes of GBS present in the dataset. We used the partitioning around medoids (PAM) clustering algorithm to form four clusters, corresponding to the GBS subtypes, and used the purity of each cluster as the evaluation measure. In our experiments, symmetrical uncertainty and information gain yielded a feature subset of seven variables. A dataset restricted to these seven variables, used as input to PAM, reached a purity of 0.7984. This result provides a first characterization of the syndrome using computational techniques.
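As an illustration of two quantities named in the summary, the following minimal Python sketch computes symmetrical uncertainty for a discrete feature and the purity of a clustering. It is not the authors' code: the function names, the toy data, and the exact purity definition (overall fraction of items that carry the majority label of their cluster) are assumptions made for illustration, and the paper may define or aggregate purity differently.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(feature, labels):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(feature), entropy(labels)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, nothing to measure
    h_xy = entropy(list(zip(feature, labels)))  # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy                # information gain (mutual information)
    return 2.0 * info_gain / (h_x + h_y)


def purity(cluster_ids, labels):
    """Fraction of items that carry the majority label of their cluster."""
    by_cluster = {}
    for cid, lab in zip(cluster_ids, labels):
        by_cluster.setdefault(cid, []).append(lab)
    majority_total = sum(
        Counter(labs).most_common(1)[0][1] for labs in by_cluster.values()
    )
    return majority_total / len(labels)


# Hypothetical toy data: four items, two known classes.
toy_labels = ["a", "a", "b", "b"]
informative = [0, 0, 1, 1]   # tracks the class exactly
uninformative = [0, 1, 0, 1]  # independent of the class
toy_clusters = [0, 0, 0, 1]   # one "b" item lands in the "a" cluster

su_informative = symmetrical_uncertainty(informative, toy_labels)
su_uninformative = symmetrical_uncertainty(uninformative, toy_labels)
toy_purity = purity(toy_clusters, toy_labels)
```

A filter method of this kind ranks each feature by its score against the known subtype labels and keeps the top-ranked ones; the clustering produced from the retained features is then scored by purity against the same labels. Continuous features would need to be discretized before this entropy-based score applies.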
92C50 Medical applications (general)
92C20 Neural biology
92-08 Computational methods for problems pertaining to biology
62P10 Applications of statistics to biology and medical sciences; meta analysis
[1] Kuwabara, S., Guillain-Barré syndrome, Drugs, 64, 6, 597-610, (2004)
[2] Pascual Pascual, S. I., Protocolos Diagnóstico Terapéuticos de la AEP: Neurología Pediátrica. Síndrome de Guillain-Barré, (2008), Madrid, Spain: Asociación Española de Pediatría
[3] Uncini, A.; Kuwabara, S., Electrodiagnostic criteria for Guillain-Barrè syndrome: a critical revision and the need for an update, Clinical Neurophysiology, 123, 8, 1487-1495, (2012)
[4] van Koningsveld, R.; Steyerberg, E. W.; Hughes, R. A.; Swan, A. V.; van Doorn, P. A.; Jacobs, B. C., A clinical prognostic scoring system for Guillain-Barré syndrome, The Lancet Neurology, 6, 7, 589-594, (2007)
[5] Walgaard, C.; Lingsma, H. F.; Ruts, L.; van Doorn, P. A.; Steyerberg, E. W.; Jacobs, B. C., Early recognition of poor prognosis in Guillain-Barré syndrome, Neurology, 76, 11, 968-975, (2011)
[6] Durand, M.-C.; Porcher, R.; Orlikowski, D.; Aboab, J.; Devaux, C.; Clair, B.; Annane, D.; Gaillard, J.-L.; Lofaso, F.; Raphael, J.-C.; Sharshar, T., Clinical and electrophysiological predictors of respiratory failure in Guillain-Barré syndrome: a prospective study, The Lancet Neurology, 5, 12, 1021-1028, (2006)
[7] Paul, B. S.; Bhatia, R.; Prasad, K.; Padma, M. V.; Tripathi, M.; Singh, M. B., Clinical predictors of mechanical ventilation in Guillain-Barré syndrome, Neurology India, 60, 2, 150-153, (2012)
[8] Walgaard, C.; Lingsma, H. F.; Ruts, L.; Drenthen, J.; Van Koningsveld, R.; Garssen, M. J. P.; Van Doorn, P. A.; Steyerberg, E. W.; Jacobs, B. C., Prediction of respiratory insufficiency in Guillain-Barré syndrome, Annals of Neurology, 67, 6, 781-787, (2010)
[9] Babkin, A. V.; Kudryavtseva, T. J.; Utkina, S. A., Identification and analysis of industrial cluster structure, World Applied Sciences Journal, 28, 10, 1408-1413, (2013)
[10] Bravo Cabrera, J. L.; Azpra Romero, E.; Zarraluqui Such, V.; Gay García, C.; Estrada Porrúa, F., Cluster analysis for validated climatology stations using precipitation in Mexico, Atmosfera, 25, 4, 339-354, (2012)
[11] Burgel, P. R.; Paillasseur, J. L.; Caillaud, D.; Tillie-Leblond, I.; Chanez, P.; Escamilla, R.; Court-Fortune, I.; Perez, T.; Carré, P.; Roche, N., Clinical COPD phenotypes: a novel approach using principal component and cluster analyses, European Respiratory Journal, 36, 3, 531-539, (2010)
[12] Angus Webb, J.; Bond, N. R.; Wealands, S. R.; Nally, R. M.; Quinn, G. P.; Vesk, P. A., Bayesian clustering with autoclass explicitly recognizes uncertainties in landscape classification, Ecography, 30, 4, 526-536, (2007)
[13] Dash, M.; Liu, H.; Terano, T.; Liu, H.; Chen, A. L. P., Feature selection for clustering, Knowledge Discovery and Data Mining. Current Issues and New Applications, Lecture Notes in Computer Science, 1805, 110-121, (2000), Berlin, Germany: Springer
[14] Hall, M. A., Correlation-based feature selection for machine learning [Ph.D. thesis], (1999), Hamilton, New Zealand: University of Waikato
[15] Zheng, Z.; Wu, X.; Srihari, R., Feature selection for text categorization on imbalanced data, ACM SIGKDD Explorations Newsletter, 6, 1, 80-89, (2004)
[16] Sebastiani, F., Machine learning in automated text categorization, ACM Computing Surveys, 34, 1, 1-47, (2002)
[17] Liu, Y.; Schumann, M., Data mining feature selection for credit scoring models, Journal of the Operational Research Society, 56, 9, 1099-1108, (2005) · Zbl 1097.91533
[18] Dash, M.; Liu, H.; Motoda, H.; Terano, T.; Liu, H.; Chen, A. L. P., Consistency based feature selection, Knowledge Discovery and Data Mining. Current Issues and New Applications, 1805, 98-109, (2000), Berlin, Germany: Springer
[19] Sarhrouni, E.; Hammouch, A.; Aboutajdine, D., Application of symmetric uncertainty and mutual information to dimensionality reduction and classification of hyperspectral images, International Journal of Engineering and Technology, 4, 5, 268-276, (2012)
[20] Kohavi, R.; John, G. H., Wrappers for feature subset selection, Artificial Intelligence, 97, 1-2, 273-324, (1997) · Zbl 0904.68143
[21] Stracuzzi, D. J.; Utgoff, P. E., Randomized variable elimination, Journal of Machine Learning Research, 5, 1331-1364, (2004) · Zbl 1222.68310
[22] Inza, I.; Larrañaga, P.; Etxeberria, R.; Sierra, B., Feature Subset Selection by Bayesian network-based optimization, Artificial Intelligence, 123, 1-2, 157-184, (2000) · Zbl 0952.68118
[23] Breiman, L.; Friedman, J.; Olshen, R.; Stone, C., Classification and Regression Trees, (1984), Wadsworth Inc.
[24] Fu, H.; Xiao, Z.; Dellandrea, E.; Dou, W.; Chen, L.; Blanc-Talon, J.; Philips, W.; Popescu, D.; Scheunders, P., Image categorization using ESFS: a new embedded feature selection method based on SFS, Advanced Concepts for Intelligent Vision Systems, Lecture Notes in Computer Science, 5807, 288-299, (2009)
[25] Breiman, L., Random forests, Machine Learning, 45, 1, 5-32, (2001) · Zbl 1007.68152
[26] Das, S., Filters, wrappers and a boosting-based hybrid for feature selection, Proceedings of the 8th International Conference on Machine Learning
[27] Couce, Y.; Franco, L.; Urda, D.; Subirats, J. L.; Jerez, J. M.; Cabestany, I.; Rojas, I.; Joya, G., Hybrid (Generalization-Correlation) method for feature selection in high dimensional DNA microarray prediction problems, Advances in Computational Intelligence, Lecture Notes in Computer Science, 6692, 202-209, (2011)
[28] Chebrolu, S.; Abraham, A.; Thomas, J. P.; Pal, N.; Kasabov, N.; Mudi, R.; Pal, S.; Parui, S., Hybrid feature selection for modeling intrusion detection systems, Neural Information Processing, Lecture Notes in Computer Science, 3316, 1020-1025, (2004)
[29] Shen, Y.; Qiu, X.; Zhang, C., Quad-PRE: a hybrid method to predict protein quaternary structure attributes, Computational and Mathematical Methods in Medicine, 2014, (2014) · Zbl 1307.92310
[30] Halkidi, M.; Batistakis, Y.; Vazirgiannis, M., On clustering validation techniques, Journal of Intelligent Information Systems, 17, 2-3, 107-145, (2001) · Zbl 0998.68154
[31] Wagstaff, K.; Cardie, C.; Rogers, S.; Schroedl, S., Constrained K-means clustering with background knowledge, Proceedings of the 18th International Conference on Machine Learning
[32] Forestier, G.; Gançarski, P.; Wemmert, C., Collaborative clustering with background knowledge, Data and Knowledge Engineering, 69, 2, 211-228, (2010)
[33] Leng, M.; Cheng, G.; Wang, J., Active semisupervised clustering algorithm with label propagation for imbalanced and multidensity datasets, Mathematical Problems in Engineering, 2013, (2013) · Zbl 1299.62051
[34] Zhu, M.; Meng, F.; Zhou, Y., Semisupervised clustering for networks based on fast affinity propagation, Mathematical Problems in Engineering, 2013, (2013) · Zbl 1296.68155
[35] Pitt, E.; Nayal, R., The use of various data mining and feature selection methods in the analysis of a population survey dataset, Proceedings of the 2nd International Workshop on Integrating Artificial Intelligence and Data Mining
[36] Liu, H.; Li, J.; Wong, L., A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns, Genome Informatics, 13, 51-60, (2002)
[37] Kaufman, L.; Rousseeuw, P.; Dodge, Y., Clustering by means of medoids, Statistical Data Analysis Based on the \(L_1\)-Norm and Related Methods, 405-416, (1987), North-Holland
[38] Han, J.; Kamber, M.; Pei, J., Data Mining: Concepts and Techniques, (2012), San Francisco, Calif, USA: Morgan Kaufmann · Zbl 1230.68018
[39] Boriah, S.; Chandola, V.; Kumar, V., Similarity measures for categorical data: a comparative evaluation, Proceedings of the 8th SIAM International Conference on Data Mining
[40] Gower, J., A general coefficient of similarity and some of its properties, Biometrics, 27, 4, 857-871, (1971)
[41] Chávez Esponda, D.; Miranda Cabrera, I.; Varela Nualles, M.; Fernández, L., Utilización del análisis de clusters con variables mixtas en la selección de genotipos de maíz, Revista Investigación Operacional, 30, 3, 209-216, (2010)
[42] Brandes, U.; Delling, D.; Gaertler, M.; Görke, R.; Hoefer, M.; Nikoloski, Z.; Wagner, D., On modularity clustering, IEEE Transactions on Knowledge and Data Engineering, 20, 2, 172-188, (2008)
[43] Arbelaitz, O.; Gurrutxaga, I.; Muguerza, J.; Pérez, J. M.; Perona, I., An extensive comparative study of cluster validity indices, Pattern Recognition, 46, 1, 243-256, (2013)
[44] Forestier, G.; Wemmert, C.; Gançarski, P.; Bi, Y.; Williams, M. A., Background knowledge integration in clustering using purity indexes, Knowledge Science, Engineering and Management, Lecture Notes in Computer Science, 6291, 28-38, (2010)
[45] Manning, C. D.; Raghavan, P.; Schütze, H., An Introduction to Information Retrieval, (2009), Cambridge, UK: Cambridge University Press
[46] Yu, L.; Liu, H., Feature selection for high-dimensional data: a fast correlation-based filter solution, Proceedings of the 20th International Conference on Machine Learning
[47] Zhao, Z.; Liu, H., Searching for interacting features, Proceeding of the 20th International Joint Conference on Artificial Intelligence (IJCAI ’07)
[48] Taha, A. M.; Mustapha, A.; Chen, S. D., Naive bayes-guided bat algorithm for feature selection, The Scientific World Journal, 2013, (2013)
[49] Uzer, M. S.; Yilmaz, N.; Inan, O., Feature selection method based on artificial bee colony algorithm and support vector machines for medical datasets classification, The Scientific World Journal, 2013, (2013)
[50] Dai, K.; Yu, H.; Li, Q., A semisupervised feature selection with support vector machine, Journal of Applied Mathematics, 2013, (2013) · Zbl 1397.68152
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.