Benchmark for filter methods for feature selection in high-dimensional classification data. (English) Zbl 1510.62019

Summary: Feature selection is one of the most fundamental problems in machine learning and has drawn increasing attention due to high-dimensional data sets emerging from different fields like bioinformatics. For feature selection, filter methods play an important role, since they can be combined with any machine learning model and can substantially reduce the run time of machine learning algorithms. The aim of the analyses is to review how different filter methods work, to compare their performance with respect to both run time and predictive accuracy, and to provide guidance for applications. Based on 16 high-dimensional classification data sets, 22 filter methods are analyzed with respect to run time and accuracy when combined with a classification method. It is concluded that there is no group of filter methods that always outperforms all other methods, but recommendations on filter methods that perform well on many of the data sets are made. Also, groups of filters that are similar with respect to the order in which they rank the features are found. For the analyses, the R machine learning package mlr is used. It provides a uniform programming API and therefore is a convenient tool to conduct feature selection using filter methods.
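
The idea that a filter scores and ranks features independently of the downstream model can be sketched as follows. This is a minimal illustration in Python, not the R/mlr code used in the paper. The scoring function `mean_difference_score`, the helpers `rank_features` and `filter_features`, and the toy data are all hypothetical names and values chosen for illustration, and the score is not claimed to be one of the 22 filters benchmarked.

```python
from statistics import mean, pvariance


def mean_difference_score(column, labels):
    """Hypothetical univariate score (an assumption for this sketch):
    absolute gap between the two class means, scaled by the feature's spread."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles binary labels only")
    group_a = [v for v, y in zip(column, labels) if y == classes[0]]
    group_b = [v for v, y in zip(column, labels) if y == classes[1]]
    spread = pvariance(column) ** 0.5
    if spread == 0.0:
        return 0.0  # a constant feature cannot separate the classes
    return abs(mean(group_a) - mean(group_b)) / spread


def rank_features(rows, labels, score_fn=mean_difference_score):
    """Return feature indices ordered from highest to lowest score."""
    n_features = len(rows[0])
    scores = [score_fn([row[j] for row in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def filter_features(rows, labels, k, score_fn=mean_difference_score):
    """Keep only the k top-ranked features; no classifier is involved here,
    so the reduced data can be passed to any model afterwards."""
    keep = sorted(rank_features(rows, labels, score_fn)[:k])
    return keep, [[row[j] for j in keep] for row in rows]


# Toy data: feature 0 separates the classes, feature 1 does not, feature 2 is constant.
rows = [
    [0.1, 5.0, 1.0],
    [0.2, 3.0, 1.0],
    [0.9, 4.0, 1.0],
    [1.0, 4.0, 1.0],
]
labels = ["a", "a", "b", "b"]
kept, reduced = filter_features(rows, labels, k=1)
print(kept)
```

Because the ranking step never calls a learner, its cost does not depend on the classifier, and the classifier is then trained on only `k` columns, which is the kind of run-time saving the summary refers to.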

MSC:

62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence
