A joint loss function for deep face recognition. (English) Zbl 1415.68182

Summary: Convolutional neural networks (CNNs) have been widely used in the computer vision community and have significantly improved the state of the art. How to train a feature that is intra-class invariant and inter-class discriminative is a central topic in face recognition. This paper proposes to learn an effective feature from face images with a joint loss function that combines the hard sample triplet (HST) loss and the absolute constraint triplet (ACT) loss, under the criterion that the maximum intra-class distance should be smaller than any inter-class distance. With the joint supervision of the HST and ACT losses, CNNs are able to learn discriminative features that improve face recognition performance. Experiments on the Labeled Faces in the Wild (LFW), IARPA Janus Benchmark A (IJB-A), and YouTube Faces datasets achieve performance comparable or superior to the state of the art.
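
The summary does not give the exact formulas for the HST and ACT losses, so the following is only an illustrative sketch of the stated criterion, not the authors' formulation. It assumes Euclidean distances within a mini-batch, a hinge on the hardest positive and hardest negative of each anchor for the hard-sample term, and a batch-wide hinge on (maximum intra-class distance minus minimum inter-class distance) for the absolute-constraint term. The function name `joint_loss` and the parameters `margin` and `weight` are hypothetical.

```python
import math


def joint_loss(features, labels, margin=0.2, weight=1.0):
    """Illustrative joint loss over one mini-batch (assumed form, see lead-in).

    features: list of equal-length coordinate sequences (one embedding per sample)
    labels:   list of identity labels, same length as features
    """
    n = len(features)
    # Pairwise Euclidean distances between all embeddings in the batch.
    dist = [[math.dist(features[i], features[j]) for j in range(n)]
            for i in range(n)]

    hard_terms = []          # one hinge value per usable anchor
    intra, inter = [], []    # all same-identity / different-identity distances
    for i in range(n):
        pos = [dist[i][j] for j in range(n)
               if j != i and labels[j] == labels[i]]
        neg = [dist[i][j] for j in range(n) if labels[j] != labels[i]]
        intra.extend(pos)
        inter.extend(neg)
        if pos and neg:
            # Hard-sample term: farthest positive vs. closest negative.
            hard_terms.append(max(max(pos) - min(neg) + margin, 0.0))

    if not hard_terms:
        # No anchor has both a positive and a negative in this batch.
        return 0.0

    hard = sum(hard_terms) / len(hard_terms)
    # Absolute term: the largest intra-class distance in the batch should
    # stay below the smallest inter-class distance by at least the margin.
    absolute = max(max(intra) - min(inter) + margin, 0.0)
    return hard + weight * absolute


if __name__ == "__main__":
    separated = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
    overlapping = [(0.0, 0.0), (3.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
    ids = [0, 0, 1, 1]
    print(joint_loss(separated, ids))
    print(joint_loss(overlapping, ids))
```

In this sketch the loss vanishes only when every intra-class distance in the batch is smaller than every inter-class distance by at least the margin, which mirrors the criterion stated in the summary. A real training setup would compute the same quantities on differentiable tensors inside a deep learning framework.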

MSC:

68T10 Pattern recognition, speech recognition
68T05 Learning and adaptive systems in artificial intelligence
