A survey on semi-supervised feature selection methods. (English) Zbl 1429.68239

Summary: Feature selection is a significant task in data mining and machine learning applications: it eliminates irrelevant and redundant features and improves learning performance. In many real-world applications, collecting labeled data is difficult, while abundant unlabeled data are easily accessible. This motivates researchers to develop semi-supervised feature selection methods, which use both labeled and unlabeled data to evaluate feature relevance. However, to date there is no comprehensive survey covering semi-supervised feature selection methods. In this paper, semi-supervised feature selection methods are investigated in full, and two taxonomies of these methods are presented from two different perspectives, each representing the hierarchical structure of semi-supervised feature selection methods. The first perspective is based on the basic taxonomy of feature selection methods, and the second is based on the taxonomy of semi-supervised learning methods. This survey can help a researcher obtain a deep background in semi-supervised feature selection methods and choose a suitable method based on their hierarchical structure.

MSC:

68T05 Learning and adaptive systems in artificial intelligence

References:

[1] Kalakech, M.; Biela, P.; Macaire, L.; Hamad, D., Constraint scores for semi-supervised feature selection: a comparative study, Pattern Recognit. Lett., 32, 656-665 (2011)
[2] Zhao, M.; Zhang, Z.; Chow, T. W.S., Trace ratio criterion based generalized discriminative learning for semi-supervised dimensionality reduction, Pattern Recognit., 45, 1482-1499 (2012) · Zbl 1231.68226
[4] Shen, K.-Q.; Ong, C.-J.; Li, X.-P.; Wilder-Smith, E. P.V., Feature selection via sensitivity analysis of SVM probabilistic outputs, Mach. Learn., 70, 1-20 (2008)
[5] Benabdeslem, K.; Hindawi, M., Efficient semi-supervised feature selection: constraint, relevance, and redundancy, IEEE Trans. Knowl. Data Eng., 26, 1131-1143 (2014)
[6] Zhang, D.; Chen, S.; Zhou, Z.-H., Constraint score: a new filter method for feature selection with pairwise constraints, Pattern Recognit., 41, 1440-1451 (2008) · Zbl 1140.68490
[7] Reif, M.; Shafait, F., Efficient feature size reduction via predictive forward selection, Pattern Recognit., 47, 1664-1673 (2014)
[8] Xue, B.; Zhang, M.; Browne, W. N., Particle swarm optimization for feature selection in classification: a multi-objective approach, IEEE Trans. Cybern., 43, 1656-1671 (2013)
[9] Zhang, X.; Wu, G.; Dong, Z.; Crawford, C., Embedded feature-selection support vector machine for driving pattern recognition, J. Frankl. Inst., 352, 669-685 (2015) · Zbl 1307.93389
[10] Yang, J. D.; Xu, H.; Jia, P. F., Effective search for genetic-based machine learning systems via estimation of distribution algorithms and embedded feature reduction techniques, Neurocomputing, 113, 105-121 (2013)
[12] Chen, X.; Fang, T.; Huo, H.; Li, D., Semisupervised feature selection for unbalanced sample sets of VHR images, IEEE Geosci. Remote Sens. Lett., 7, 781-785 (2010)
[13] Sun, Y.; Wen, G., Emotion recognition using semi-supervised feature selection with speaker normalization, Int. J. Speech Technol., 1-15 (2015)
[14] Chen, C.-H., A semi-supervised feature selection method using a non-parametric technique with pairwise instance constraints, J. Inf. Sci., 39, 359-371 (2013)
[16] Mitra, P.; Murthy, C. A.; Pal, S. K., Unsupervised feature selection using feature similarity, IEEE Trans. Pattern Anal. Mach. Intell., 24, 301-312 (2002)
[17] Maldonado, S.; Weber, R.; Basak, J., Simultaneous feature selection and classification using kernel-penalized support vector machines, Inf. Sci., 181, 115-128 (2011)
[18] Uğuz, H., A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm, Knowl. Based Syst., 24, 1024-1032 (2011)
[19] Hall, M.; Holmes, G., Benchmarking attribute selection techniques for discrete class data mining, IEEE Trans. Knowl. Data Eng., 15, 1437-1447 (2003)
[20] Unler, A.; Murat, A.; Chinnam, R. B., Mr2PSO: a maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification, Inf. Sci., 181, 4625-4641 (2011)
[21] Chen, C.-H., A hybrid intelligent model of analyzing clinical breast cancer data using clustering techniques with feature selection, Appl. Soft Comput., 20, 4-14 (2014)
[22] Pohjalainen, J.; Räsänen, O.; Kadioglu, S., Feature selection methods and their combinations in high-dimensional classification of speaker likability, intelligibility and personality traits, Comput. Speech Lang., 29, 145-171 (2015)
[23] Zhao, Z.; Fu, G.; Liu, S.; Elokely, K. M.; Doerksen, R. J.; Chen, Y., Drug activity prediction using multiple-instance learning via joint instance and feature selection, BMC Bioinform., 14, Suppl 1, S16 (2013)
[24] Xue, B.; Zhang, M.; Browne, W. N., Particle swarm optimisation for feature selection in classification: novel initialisation and updating mechanisms, Appl. Soft Comput., 18, 261-276 (2014)
[25] Peng, Y.; Wu, Z.; Jiang, J., A novel feature selection approach for biomedical data classification, J. Biomed. Inform., 43, 15-23 (2010)
[26] Nowotny, T.; Berna, A. Z.; Binions, R.; Trowell, S., Optimal feature selection for classifying a large set of chemicals using metal oxide sensors, Sens. Actuators B Chem., 187, 471-480 (2013)
[27] Unler, A.; Murat, A., A discrete particle swarm optimization method for feature selection in binary classification problems, Eur. J. Oper. Res., 206, 528-539 (2010) · Zbl 1188.90280
[28] Rashedi, E.; Nezamabadi-Pour, H.; Saryazdi, S., A simultaneous feature adaptation and feature selection method for content-based image retrieval systems, Knowl. Based Syst., 39, 85-94 (2013)
[29] Chen, H.-L.; Yang, B.; Liu, J.; Liu, D.-Y., A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis, Expert Syst. Appl., 38, 9014-9022 (2011)
[30] Kersten, J., Simultaneous feature selection and Gaussian mixture model estimation for supervised classification problems, Pattern Recognit., 47, 2582-2595 (2014) · Zbl 1339.68223
[31] Peralta, B.; Soto, A., Embedded local feature selection within mixture of experts, Inf. Sci., 269, 176-187 (2014)
[32] Wang, S.; Li, D.; Song, X.; Wei, Y.; Li, H., A feature selection method based on improved Fisher’s discriminant ratio for text sentiment classification, Expert Syst. Appl., 38, 8696-8702 (2011)
[33] Akay, M. F., Support vector machines combined with feature selection for breast cancer diagnosis, Expert Syst. Appl., 36, 3240-3247 (2009)
[34] Bamakan, S. M.H.; Gholami, P., A novel feature selection method based on an integrated data envelopment analysis and entropy model, Procedia Comput. Sci., 31, 632-638 (2014)
[35] Nakariyakul, S., Suboptimal branch and bound algorithms for feature subset selection: a comparative study, Pattern Recognit. Lett., 45, 62-70 (2014)
[36] Yang, J.; Liu, Y.; Liu, Z.; Zhu, X.; Zhang, X., A new feature selection algorithm based on binomial hypothesis testing for spam filtering, Knowl. Based Syst., 24, 904-914 (2011)
[37] Li, G.-Z.; Meng, H.-H.; Lu, W.-C.; Yang, J. Y.; Yang, M., Asymmetric bagging and feature selection for activities prediction of drug molecules, BMC Bioinform., 9 (2008)
[38] Shi, P.; Ray, S.; Zhu, Q.; Kon, M. A., Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction, BMC Bioinform., 12, 375 (2011)
[39] Zhou, W.; Dickerson, J. A., A novel class dependent feature selection method for cancer biomarker discovery, Comput. Biol. Med., 47, 66-75 (2014)
[41] Sheikhpour, R.; Sarram, M. A.; Sheikhpour, R., Particle swarm optimization for bandwidth determination and feature selection of kernel density estimation based classifiers in diagnosis of breast cancer, Appl. Soft Comput., 40, 113-131 (2016)
[42] Chin, A.; Mirzal, A.; Haron, H.; Hamed, H., Supervised, unsupervised and semi-supervised feature selection: a review on gene selection, IEEE/ACM Trans. Comput. Biol. Bioinform. (2015)
[43] Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A.; Benítez, J. M.; Herrera, F., A review of microarray datasets and applied feature selection methods, Inf. Sci., 282, 111-135 (2014)
[44] Chandrashekar, G.; Sahin, F., A survey on feature selection methods, Comput. Electr. Eng., 40, 16-28 (2014)
[45] Saeys, Y.; Inza, I.; Larrañaga, P., A review of feature selection techniques in bioinformatics, Bioinformatics, 23, 2507-2517 (2007)
[46] Guyon, I.; Elisseeff, A., An introduction to variable and feature selection, J. Mach. Learn. Res., 3, 1157-1182 (2003) · Zbl 1102.68556
[48] Song, X.; Zhang, J.; Han, Y.; Jiang, J., Semi-supervised feature selection via hierarchical regression for web image classification, Multimed. Syst. (2014)
[49] Han, Y.; Yang, Y.; Yan, Y.; Ma, Z.; Sebe, N., Semisupervised feature selection via spline regression for video semantic recognition, IEEE Trans. Neural Netw. Learn. Syst., 26, 252-264 (2015)
[52] Bellal, F.; Elghazel, H.; Aussem, A., A semi-supervised feature ranking method with ensemble learning, Pattern Recognit. Lett., 33, 1426-1433 (2012)
[56] Zuo, L.; Li, L.; Chen, C., The graph based semi-supervised algorithm with ℓ1-regularizer, Neurocomputing, 149, 966-974 (2015)
[57] Zhang, K.; Lan, L.; Kwok, J. T.; Vucetic, S.; Parvin, B., Scaling up graph-based semisupervised learning via prototype vector machines, IEEE Trans. Neural Netw. Learn. Syst., 26, 444-457 (2015)
[59] Chapelle, O.; Schölkopf, B.; Zien, A., Semi-Supervised Learning (2006), MIT Press: MIT Press Cambridge
[60] Chahooki, M. A.Z.; Charkari, N. M., Unsupervised manifold learning based on multiple feature spaces, Mach. Vis. Appl., 25, 1053-1065 (2014)
[62] Halder, A.; Ghosh, S.; Ghosh, A., Aggregation pheromone metaphor for semi-supervised classification, Pattern Recognit., 46, 2239-2248 (2013)
[65] Prakash, V. J.; Nithya, L. M., A survey on semi-supervised learning techniques, Int. J. Comput. Trends Technol., 8, 25-29 (2014)
[66] Zhao, J.; Lu, K.; He, X., Locality sensitive semi-supervised feature selection, Neurocomputing, 71, 1842-1849 (2008)
[68] Doquire, G.; Verleysen, M., A graph Laplacian based approach to semi-supervised feature selection for regression problems, Neurocomputing, 121, 5-13 (2013)
[73] Liu, Y.; Nie, F.; Wu, J.; Chen, L., Efficient semi-supervised feature selection with noise insensitive trace ratio criterion, Neurocomputing, 105, 12-18 (2013)
[76] Ma, Z.; Nie, F.; Yang, Y.; Uijlings, J. R.R.; Sebe, N., Discriminating joint feature analysis for multimedia data understanding, IEEE Trans. Multimed., 14, 1662-1672 (2012)
[77] Shi, C.; Ruan, Q.; An, G., Sparse feature selection based on graph Laplacian for web image annotation, Image Vis. Comput., 32, 189-201 (2014)
[79] Xu, Z.; King, I.; Lyu, M. R.T.; Jin, R., Discriminative semi-supervised feature selection via manifold regularization, IEEE Trans. Neural Netw., 21, 1033-1047 (2010)
[80] Ang, J. C.; B, H. H.; Nuzly, H.; Hamed, A.; Haron, H.; Hamed, H. N.A., Semi-supervised SVM-based feature selection for cancer classification using microarray gene expression data, Curr. Approaches Appl. Artif. Intell., 468-477 (2015)
[81] Dai, K.; Yu, H.-Y.; Li, Q., A semisupervised feature selection with support vector machine, J. Appl. Math., 2013 (2013) · Zbl 1397.68152
[82] Bishop, C. M., Neural Networks for Pattern Recognition (1995), Clarendon Press: Clarendon Press Oxford
[85] Zeng, Z.; Wang, X.; Zhang, J.; Wu, Q., Semi-supervised feature selection based on local discriminative information, Neurocomputing (2015)
[87] Foucart, S.; Lai, M.-J., Sparsest solutions of underdetermined linear systems via ℓq-minimization for 0<q<1, Appl. Comput. Harmon. Anal., 26, 395-407 (2009) · Zbl 1171.90014
[89] Chartrand, R., Exact reconstruction of sparse signals via nonconvex minimization, IEEE Signal Process. Lett., 14, 707-710 (2007)
[91] Xu, Z.; Chang, X.; Xu, F.; Zhang, H., L1/2 regularization: a thresholding representation theory and a fast solver, IEEE Trans. Neural Netw. Learn. Syst., 23, 1013-1027 (2012)
[95] Nie, F.; Xu, D.; Tsang, I. W.-H.; Zhang, C., Flexible manifold embedding: a framework for semi-supervised and unsupervised dimension reduction, IEEE Trans. Image Process., 19, 1921-1932 (2010) · Zbl 1371.94276
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.