
Joint hypergraph learning and sparse regression for feature selection. (English) Zbl 1429.68247

Summary: In this paper, we propose a unified framework for improved structure estimation and feature selection. Most existing graph-based feature selection methods rely on a static representation of the structure of the available data, given by the Laplacian matrix of a simple graph. Here, by contrast, we perform data structure learning and feature selection simultaneously. To improve the estimation of the manifold representing the structure of the selected features, we describe the higher-order neighborhood structures present in the data via hypergraph learning. This allows the features which participate in the most significant higher-order relations to be selected, and the remainder discarded, through a sparsification process. We formulate a single objective function that captures and regularizes both the hypergraph weight estimation and the feature selection process. Finally, we present an optimization algorithm that recovers the hypergraph weights together with a sparse set of feature selection indicators. This approach offers several advantages. First, by adjusting the hypergraph weights, we preserve high-order neighborhood relations reflected in the original data, which cannot be modeled by a simple graph. Moreover, our objective function captures the global discriminative structure of the features in the data. Comprehensive experiments on 9 benchmark datasets show that our method achieves statistically significant improvements over state-of-the-art feature selection methods, supporting its effectiveness.
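As a concrete illustration of the hypergraph machinery the summary refers to, the sketch below builds the normalized hypergraph Laplacian of Zhou et al. [21] from a k-nearest-neighbour hypergraph (one hyperedge per sample, containing the sample and its k nearest neighbours). The kNN construction and the unit initial weights are assumptions made here for illustration; in the reviewed paper the hyperedge weights are estimated jointly with the sparse feature-selection indicators, which this sketch does not attempt.

```python
import numpy as np

def hypergraph_laplacian(X, k=3):
    """Normalized hypergraph Laplacian L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    for a kNN hypergraph on the rows of X (illustrative construction only)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Incidence matrix H: H[v, e] = 1 iff vertex v belongs to hyperedge e.
    # Hyperedge e groups sample e with its k nearest neighbours.
    H = np.zeros((n, n))
    for e in range(n):
        members = np.argsort(d2[e])[: k + 1]  # the sample itself + k neighbours
        H[members, e] = 1.0
    w = np.ones(n)            # hyperedge weights (learned jointly in the paper)
    Dv = H @ w                # weighted vertex degrees
    De = H.sum(axis=0)        # hyperedge degrees (k + 1 each)
    Dv_isqrt = np.diag(1.0 / np.sqrt(Dv))
    Theta = Dv_isqrt @ H @ np.diag(w / De) @ H.T @ Dv_isqrt
    return np.eye(n) - Theta
```

The resulting L is symmetric positive semidefinite with the vector Dv^{1/2}·1 in its null space, so it plays the same regularizing role as the simple-graph Laplacian in [7], while each hyperedge encodes a relation among k + 1 samples at once rather than a pairwise one.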

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68T10 Pattern recognition, speech recognition

Software:

LIBSVM

References:

[1] Duda, R. O.; Hart, P. E.; Stork, D. G., Pattern Classification (2012), John Wiley & Sons
[2] Zhao, Z.; Wang, L.; Liu, H.; Ye, J., On similarity preserving feature selection, IEEE Trans. Knowl. Data Eng., 25, 3, 619-632 (2013)
[3] Peng, H.; Long, F.; Ding, C., Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. Pattern Anal. Mach. Intell., 27, 8, 1226-1238 (2005)
[4] Sun, Y.; Todorovic, S.; Goodison, S., Local-learning-based feature selection for high-dimensional data analysis, IEEE Trans. Pattern Anal. Mach. Intell., 32, 9, 1610-1626 (2010)
[5] F. Nie, H. Huang, X. Cai, C.H. Ding, Efficient and robust feature selection via joint \(\ell_{2 , 1}\)-norms minimization, in: Advances in Neural Information Processing Systems, 2010, pp. 1813-1821.
[6] F. Nie, S. Xiang, Y. Jia, C. Zhang, S. Yan, Trace ratio criterion for feature selection, in: AAAI, vol. 2, 2008, pp. 671-676.
[7] X. He, D. Cai, P. Niyogi, Laplacian score for feature selection, in: Advances in Neural Information Processing Systems, 2005, pp. 507-514.
[8] Z. Zhao, H. Liu, Spectral feature selection for supervised and unsupervised learning, in: Proceedings of the 24th International Conference on Machine Learning, ACM, 2007, pp. 1151-1157.
[9] D. Cai, C. Zhang, X. He, Unsupervised feature selection for multi-cluster data, in: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2010, pp. 333-342.
[10] Hou, C.; Nie, F.; Li, X.; Yi, D.; Wu, Y., Joint embedding learning and sparse regression: a framework for unsupervised feature selection, IEEE Trans. Cybern., 44, 6, 793-804 (2014)
[11] Xu, Z.; King, I.; Lyu, M.-T.; Jin, R., Discriminative semi-supervised feature selection via manifold regularization, IEEE Trans. Neural Netw., 21, 7, 1033-1047 (2010)
[12] Zhao, J.; Lu, K.; He, X., Locality sensitive semi-supervised feature selection, Neurocomputing, 71, 10, 1842-1849 (2008)
[13] Z. Zhao, H. Liu, Semi-supervised feature selection via spectral analysis, in: SDM, SIAM, 2007, pp. 641-646.
[14] Liu, Y.; Nie, F.; Wu, J.; Chen, L., Efficient semi-supervised feature selection with noise insensitive trace ratio criterion, Neurocomputing, 105, 12-18 (2013)
[15] X. He, P. Niyogi, Locality preserving projections, in: Advances in Neural Information Processing Systems, vol. 16, MIT Press, 2004, p. 153.
[16] Adini, Y.; Moses, Y.; Ullman, S., Face recognition: the problem of compensating for changes in illumination direction, IEEE Trans. Pattern Anal. Mach. Intell., 19, 7, 721-732 (1997)
[17] D.W. Jacobs, P.N. Belhumeur, R. Basri, Comparing images under variable illumination, in: Proceedings IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, 1998, pp. 610-617.
[18] Belhumeur, P. N.; Kriegman, D. J., What is the set of images of an object under all possible illumination conditions?, Int. J. Comput. Vis., 28, 3, 245-260 (1998)
[19] Hagen, L.; Kahng, A. B., New spectral methods for ratio cut partitioning and clustering, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., 11, 9, 1074-1085 (1992)
[20] S. Agarwal, K. Branson, S. Belongie, Higher order learning with graphs, in: Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006, pp. 17-24.
[21] D. Zhou, J. Huang, B. Schölkopf, Learning with hypergraphs: clustering, classification, and embedding, in: Advances in Neural Information Processing Systems, 2006, pp. 1601-1608.
[22] Gibson, D.; Kleinberg, J.; Raghavan, P., Clustering categorical data: an approach based on dynamical systems, VLDB J., 8, 3-4, 222-236 (2000)
[23] S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, S. Belongie, Beyond pairwise clustering, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, IEEE, 2005, pp. 838-845.
[24] L. Sun, S. Ji, J. Ye, Hypergraph spectral learning for multi-label classification, in: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2008, pp. 668-676.
[25] Chung, F. R., Spectral Graph Theory (1997), American Mathematical Society · Zbl 0867.05046
[26] Huang, Y.; Liu, Q.; Lv, F.; Gong, Y.; Metaxas, D. N., Unsupervised image categorization by hypergraph partition, IEEE Trans. Pattern Anal. Mach. Intell., 33, 6, 1266-1273 (2011)
[27] Belkin, M.; Niyogi, P., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput., 15, 6, 1373-1396 (2003) · Zbl 1085.68119
[28] C. Hou, F. Nie, D. Yi, Y. Wu, Feature selection via joint embedding learning and sparse regression, in: Proceedings of the International Joint Conference on Artificial Intelligence, vol. 22, 2011, pp. 1324-1330.
[29] Zien, J. Y.; Schlag, M. D.; Chan, P. K., Multilevel spectral hypergraph partitioning with arbitrary vertex sizes, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., 18, 9, 1389-1399 (1999)
[30] Wright, J.; Yang, A. Y.; Ganesh, A.; Sastry, S. S.; Ma, Y., Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell., 31, 2, 210-227 (2009)
[31] Q. Gu, Z. Li, J. Han, Joint feature selection and subspace learning, in: Proceedings-International Joint Conference on Artificial Intelligence, vol. 22, 2011, pp. 1294-1299.
[32] Roweis, S. T.; Saul, L. K., Nonlinear dimensionality reduction by locally linear embedding, Science, 290, 5500, 2323-2326 (2000)
[33] Belkin, M.; Niyogi, P.; Sindhwani, V., Manifold regularization: a geometric framework for learning from labeled and unlabeled examples, J. Mach. Learn. Res., 7, 2399-2434 (2006) · Zbl 1222.68144
[34] Y. Huang, Q. Liu, D. Metaxas, Video object segmentation by hypergraph cut, in: IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, pp. 1738-1745.
[35] Nutt, C. L.; Mani, D.; Betensky, R. A.; Tamayo, P.; Cairncross, J. G.; Ladd, C.; Pohl, U.; Hartmann, C.; McLaughlin, M. E.; Batchelor, T. T., Gene expression-based classification of malignant gliomas correlates better with survival than histological classification, Cancer Res., 63, 7, 1602-1607 (2003)
[36] Spira, A.; Beane, J. E.; Shah, V.; Steiling, K.; Liu, G.; Schembri, F.; Gilman, S.; Dumas, Y.-M.; Calner, P.; Sebastiani, P., Airway epithelial gene expression in the diagnostic evaluation of smokers with suspect lung cancer, Nat. Med., 13, 3, 361-366 (2007)
[37] S.A. Nene, S.K. Nayar, H. Murase, et al., Columbia Object Image Library (COIL-20), Technical Report CUCS-005-96, 1996.
[38] LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P., Gradient-based learning applied to document recognition, Proc. IEEE, 86, 11, 2278-2324 (1998)
[39] G. Griffin, A. Holub, P. Perona, Caltech-256 Object Category Dataset.
[40] S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, IEEE, 2006, pp. 2169-2178.
[41] Chang, C.-C.; Lin, C.-J., LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol., 2, 3 (2011)
[42] X. Zhu, Z. Ghahramani, J. Lafferty, et al., Semi-supervised learning using Gaussian fields and harmonic functions, in: ICML, 2003, pp. 912-919.
[43] Zhu, X.; Huang, Z.; Cui, J.; Shen, H. T., Video-to-shot tag propagation by graph sparse group lasso, IEEE Trans. Multimed., 15, 3, 633-646 (2013)
[44] Zhu, X.; Huang, Z.; Yang, Y.; Shen, H. T.; Xu, C.; Luo, J., Self-taught dimensionality reduction on the high-dimensional small-sized data, Pattern Recognit., 46, 1, 215-229 (2013) · Zbl 1248.68429
[45] X. Zhu, X. Wu, W. Ding, S. Zhang, Feature selection by joint graph sparse coding, in: SDM, SIAM, 2013, pp. 803-811.