A hierarchical inferential method for indoor scene classification. (English) Zbl 1396.68123

Summary: Indoor scene classification provides a basis for scene interaction by service robots. The task is challenging because the layout and decoration of scenes vary considerably. Previous knowledge-based methods commonly ignore visual attributes when constructing the knowledge base, which limits classification performance. We propose a semantic hierarchy that describes the similarities between different parts of scenes in a fine-grained way. In addition to the commonly used semantic features, visual attributes are introduced to construct the knowledge base. Inspired by the processes of human cognition and the characteristics of indoor scenes, we propose an inferential framework based on a Markov logic network. The framework is evaluated on a popular indoor scene dataset, and the experimental results demonstrate its effectiveness.
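The summary does not state the framework's formulas. As a rough illustration of how Markov logic network inference [53] can score scene labels, the following is a minimal sketch, not the authors' method. It assumes a hypothetical two-level hierarchy, hypothetical evidence atoms (detected objects and visual attributes) and hand-set rule weights. In an MLN, the probability of a world is proportional to the exponential of the summed weights of its satisfied ground formulas. Here the scene label is the only unknown and exactly one label is true, so each candidate label is one world. The greedy coarse-to-fine step is a simplification of joint inference over both levels.

```python
import math
from dataclasses import dataclass

# Hypothetical two-level semantic hierarchy: coarse group -> fine scene classes.
HIERARCHY = {
    "home": ["bedroom", "kitchen"],
    "store": ["bookstore", "bakery"],
}


@dataclass(frozen=True)
class Rule:
    """Weighted implication Evidence(e) => Label(label), in MLN style (illustrative)."""
    evidence: str  # hypothetical atom, e.g. "object:bed" or "attribute:wooden"
    label: str     # a coarse group or a fine scene class
    weight: float  # hand-set here; an MLN would learn it from data


def is_satisfied(rule, evidence, label):
    # An implication is false only when its premise holds and its conclusion fails.
    return (rule.evidence not in evidence) or (rule.label == label)


def posterior(candidates, rules, evidence):
    """P(label | evidence), proportional to exp(sum of weights of satisfied groundings)."""
    scores = {
        c: sum(r.weight for r in rules if is_satisfied(r, evidence, c))
        for c in candidates
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())  # partition function over the candidate worlds
    return {c: v / z for c, v in unnorm.items()}


def classify(rules, evidence):
    """Coarse-to-fine: choose the most probable group, then the best class inside it."""
    group_post = posterior(list(HIERARCHY), rules, evidence)
    group = max(group_post, key=group_post.get)
    fine_post = posterior(HIERARCHY[group], rules, evidence)
    label = max(fine_post, key=fine_post.get)
    return group, label, fine_post


# Hypothetical knowledge base mixing object (semantic) and attribute (visual) rules.
RULES = [
    Rule("object:bed", "home", 2.0),
    Rule("object:bookshelf", "store", 1.0),
    Rule("object:bookshelf", "bookstore", 1.5),
    Rule("object:bed", "bedroom", 2.5),
    Rule("attribute:wooden", "bedroom", 0.5),
    Rule("object:oven", "kitchen", 2.0),
]

if __name__ == "__main__":
    group, label, _ = classify(RULES, {"object:bed", "attribute:wooden"})
    print(group, label)  # → home bedroom
```

Rules whose premise is absent are satisfied by every candidate, so their weights cancel in the normalization; only rules triggered by the observed evidence shift the posterior.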

MSC:

68T45 Machine vision and scene understanding
68T35 Theory of languages and software systems (knowledge-based systems, expert systems, etc.) for artificial intelligence

Software:

WEKA; HOGgles; iHOG

References:

[1] Alleysson, D., Susstrunk, S. and Herault, J. (2005). Linear demosaicing inspired by the human visual system, IEEE Transactions on Image Processing 14(4): 439-449.
[2] Banerji, S., Sinha, A. and Liu, C. (2013). New image descriptors based on color, texture, shape, and wavelets for object and scene image classification, Neurocomputing 117: 173-185.
[3] Bannour, H. and Hudelot, C. (2012a). Building Semantic Hierarchies Faithful to Image Semantics, Lecture Notes in Computer Science, Vol. 7131, Springer, Berlin/Heidelberg, pp. 4-15.
[4] Bannour, H. and Hudelot, C. (2012b). Hierarchical image annotation using semantic hierarchies, Proceedings of the 21st ACM International Conference on Information and Knowledge Management, Maui, HI, USA, pp. 2431-2434.
[5] Bell, S., Lawrence Zitnick, C., Bala, K. and Girshick, R. (2016). Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 2874-2883.
[6] Bottou, L. (2013). From machine learning to machine reasoning, Machine Learning 94(2): 133-149.
[7] Carneiro, G., Chan, A.B., Moreno, P.J. and Vasconcelos, N. (2007). Supervised learning of semantic classes for image annotation and retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence 29(3): 394-410.
[8] Chaojie, W., Jun, Y. and Dapeng, T. (2013). High-level attributes modeling for indoor scenes classification, Neurocomputing 121: 337-343.
[9] Chaves, R., Ramírez, J., Górriz, J. and Illán, I. (2012). Functional brain image classification using association rules defined over discriminant regions, Pattern Recognition Letters 33(12): 1666-1672.
[10] Csurka, G., Dance, C., Fan, L., Willamowski, J. and Bray, C. (2004). Visual categorization with bags of keypoints, Workshop on Statistical Learning in Computer Vision, Prague, Czech Republic, Vol. 1, pp. 1-2.
[11] Delaigle, J., Devleeschouwer, C., Macq, B. and Langendijk, L. (2002). Human visual system features enabling watermarking, 2002 IEEE International Conference on Multimedia and Expo. ICME ’02, Los Angeles, CA, USA, Vol. 2, pp. 489-492.
[12] Deng, J., Berg, A.C. and Fei-Fei, L. (2011). Hierarchical semantic indexing for large scale image retrieval, 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Denver, CO, USA, pp. 785-792.
[13] Dixit, M., Chen, S., Gao, D., Rasiwasia, N. and Vasconcelos, N. (2015). Scene classification with semantic fisher vectors, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, pp. 2974-2983.
[14] Escobar, M.-J. and Kornprobst, P. (2012). Action recognition via bio-inspired features: The richness of center-surround interaction, Computer Vision and Image Understanding 116(5): 593-605.
[15] Farhadi, A., Endres, I., Hoiem, D. and Forsyth, D. (2009). Describing objects by their attributes, IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, Miami, FL, USA, pp. 1778-1785.
[16] Faria, D.R., Trindade, P., Lobo, J. and Dias, J. (2014). Knowledge-based reasoning from human grasp demonstrations for robot grasp synthesis, Robotics and Autonomous Systems 62(6): 794-817.
[17] Fei-Fei, L. and Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, San Diego, CA, USA, Vol. 2, pp. 524-531.
[18] Felzenszwalb, P.F. and McAllester, D. (2007). The generalized A* architecture, Journal of Artificial Intelligence Research pp. 153-190. · Zbl 1183.68228
[19] Felzenszwalb, P., Girshick, R. and McAllester, D. (2010a). Cascade object detection with deformable part models, 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, pp. 2241-2248.
[20] Felzenszwalb, P., Girshick, R., McAllester, D. and Ramanan, D. (2010b). Object detection with discriminatively trained part-based models, IEEE Transactions on Pattern Analysis and Machine Intelligence 32(9): 1627-1645.
[21] Felzenszwalb, P., McAllester, D. and Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model, IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, Anchorage, AK, USA, pp. 1-8.
[22] Feng, Q., Yuan, C., Pan, J.S., Yang, J.F., Chou, Y.T., Zhou, Y. and Li, W. (2017). Superimposed sparse parameter classifiers for face recognition, IEEE Transactions on Cybernetics 47(2): 378-390.
[23] Feng, Q. and Zhou, Y. (2016). Kernel regularized data uncertainty for action recognition, IEEE Transactions on Circuits and Systems for Video Technology PP(99): 1-1.
[24] Feng, Q., Zhou, Y. and Lan, R. (2016). Pairwise linear regression classification for image set retrieval, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 4865-4872.
[25] Girshick, R.B., Felzenszwalb, P.F. and McAllester, D.A. (2011). Object detection with grammar models, in J. Shawe-Taylor et al. (Eds.), Advances in Neural Information Processing Systems 24, Curran Associates, Inc., Granada, pp. 442-450.
[26] Gupta, P., Arrabolu, S.S., Brown, M. and Savarese, S. (2009). Video scene categorization by 3D hierarchical histogram matching, IEEE 12th International Conference on Computer Vision, Kyoto, Japan, pp. 1655-1662.
[27] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P. and Witten, I.H. (2009). The Weka data mining software: An update, ACM SIGKDD Explorations Newsletter 11(1): 10-18.
[28] He, K., Zhang, X., Ren, S. and Sun, J. (2016). Deep residual learning for image recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 770-778.
[29] Hoiem, D., Efros, A.A. and Hebert, M. (2005). Automatic photo pop-up, ACM SIGGRAPH 2005, Los Angeles, CA, USA, pp. 577-584.
[30] Hosang, J., Benenson, R., Dollár, P. and Schiele, B. (2016). What makes for effective detection proposals?, IEEE Transactions on Pattern Analysis and Machine Intelligence 38(4): 814-830.
[31] Huang, K., Tao, D., Yuan, Y., Li, X. and Tan, T. (2011). Biologically inspired features for scene classification in video surveillance, IEEE Transactions on Systems, Man, and Cybernetics B: Cybernetics 41(1): 307-313.
[32] Li, L.-J., Su, H., Fei-Fei, L. and Xing, E.P. (2010). Object bank: A high-level image representation for scene classification and semantic feature sparsification, in J. Lafferty et al. (Eds.), Advances in Neural Information Processing Systems 23, Curran Associates, Inc., Cambridge, pp. 1378-1386.
[33] Kembhavi, A., Yeh, T. and Davis, L.S. (2010). Why did the person cross the road (there)? Scene understanding using probabilistic logic models and common sense reasoning, in K. Daniilidis et al. (Eds.), Computer Vision-ECCV 2010: 11th European Conference on Computer Vision, Part II, Springer, Berlin/Heidelberg, pp. 693-706.
[34] Khan, S., Bennamoun, M., Sohel, F. and Togneri, R. (2014). Geometry Driven Semantic Labeling of Indoor Scenes, Lecture Notes in Computer Science, Vol. 8689, Springer International Publishing, Berlin, pp. 679-694.
[35] Kong, T., Yao, A., Chen, Y. and Sun, F. (2016). Hypernet: Towards accurate region proposal generation and joint object detection, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 845-853.
[36] Lazebnik, S., Schmid, C. and Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, NY, USA, Vol. 2, pp. 2169-2178.
[37] Li, L.-J., Wang, C., Lim, Y., Blei, D.M. and Fei-Fei, L. (2010). Building and using a semantivisual image hierarchy, 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, pp. 3336-3343.
[38] Li, L.-J., Su, H., Lim, Y. and Fei-Fei, L. (2014). Object bank: An object-level image representation for high-level visual recognition, International Journal of Computer Vision 107(1): 20-39.
[39] Lin, D., Lu, C., Liao, R. and Jia, J. (2014). Learning important spatial pooling regions for scene classification, 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, pp. 3726-3733.
[40] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y. and Berg, A.C. (2016). SSD: Single Shot Multi-Box Detector, Springer International Publishing, Cham, pp. 21-37.
[41] Liu, Z. and von Wichert, G. (2013). Applying rule-based context knowledge to build abstract semantic maps of indoor environments, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Tokyo, Japan, pp. 5141-5147.
[42] Saitta, L. and Zucker, J.-D. (2013). Abstraction in Artificial Intelligence and Complex Systems, Springer, New York, NY.
[43] Marszalek, M. and Schmid, C. (2007). Semantic hierarchies for visual object recognition, IEEE Conference on Computer Vision and Pattern Recognition, CVPR’07, Minneapolis, MN, USA, pp. 1-7.
[44] MIT (n.d.) Indoor scene recognition. Dataset, http://web.mit.edu/torralba/www/indoor.html. · Zbl 1408.94301
[45] Mottaghi, R., Fidler, S., Yao, J., Urtasun, R. and Parikh, D. (2013). Analyzing semantic segmentation using hybrid human-machine CRFS, 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, pp. 3143-3150.
[46] Neville, J. and Jensen, D. (2007). Relational dependency networks, Journal of Machine Learning Research 8: 653-692. · Zbl 1222.68274
[47] Nguyen, D.T., Ogunbona, P.O. and Li, W. (2013). A novel shape-based non-redundant local binary pattern descriptor for object detection, Pattern Recognition 46(5): 1485-1500.
[48] Penatti, O.A., Silva, F.B., Valle, E., Gouet-Brunet, V. and Torres, R.d.S. (2014). Visual word spatial arrangement for image retrieval and classification, Pattern Recognition 47(2): 705-720.
[49] Porway, J., Wang, Q. and Zhu, S.C. (2010). A hierarchical and contextual model for aerial image parsing, International Journal of Computer Vision 88(2): 254-283.
[50] Quattoni, A. and Torralba, A. (2009). Recognizing indoor scenes, IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, Miami, FL, USA, pp. 413-420.
[51] Ren, X. and Ramanan, D. (2013). Histograms of sparse codes for object detection, 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, pp. 3246-3253.
[52] Ribeiro, M.X., Bugatti, P.H., Traina Jr, C., Marques, P.M.A., Rosa, N.A. and Traina, A.J.M. (2009). Supporting content-based image retrieval and computer-aided diagnosis systems with association rule-based techniques, Data and Knowledge Engineering 68(12): 1370-1382.
[53] Richardson, M. and Domingos, P. (2006). Markov logic networks, Machine Learning 62(1): 107-136. · Zbl 1470.68221
[54] Rigamonti, R., Brown, M.A. and Lepetit, V. (2011). Are sparse representations really relevant for image classification?, 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, pp. 1545-1552.
[55] Rigamonti, R., Sironi, A., Lepetit, V. and Fua, P. (2013). Learning separable filters, 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, pp. 2754-2761.
[56] Sadovnik, A. and Chen, T. (2011). Pictorial structures for object recognition and part labeling in drawings, 18th IEEE International Conference on Image Processing (ICIP), Brussels, Belgium, pp. 3613-3616.
[57] Sharif Razavian, A., Azizpour, H., Sullivan, J. and Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, pp. 806-813.
[58] Shotton, J., Blake, A. and Cipolla, R. (2005). Contour-based learning for object detection, 10th IEEE International Conference on Computer Vision, ICCV 2005, Beijing, China, Vol. 1, pp. 503-510.
[59] Siagian, C. and Itti, L. (2007). Rapid biologically-inspired scene classification using features shared with visual attention, IEEE Transactions on Pattern Analysis and Machine Intelligence 29(2): 300-312.
[60] Singla, P. and Domingos, P. (2006). Entity resolution with Markov logic, 6th International Conference on Data Mining, ICDM’06, Hong Kong, China, pp. 572-582.
[61] Tang, J., Zha, Z.-J., Tao, D. and Chua, T.-S. (2012). Semantic-gap-oriented active learning for multilabel image annotation, IEEE Transactions on Image Processing 21(4): 2354-2360. · Zbl 1373.68419
[62] Tang, T. and Qiao, H. (2014). Improving invariance in visual classification with biologically inspired mechanism, Neurocomputing 133(8): 328-341.
[63] Teo, C.L., Fermüller, C. and Aloimonos, Y. (2015). A Gestaltist approach to contour-based object recognition: Combining bottom-up and top-down cues, International Journal of Robotics Research 34(4-5): 627-652.
[64] Vondrick, C., Khosla, A., Malisiewicz, T. and Torralba, A. (2013). HOGgles: Visualizing object detection features, IEEE International Conference on Computer Vision, Sydney, Australia, pp. 1-8.
[65] Welter, P., Riesmeier, J., Fischer, B., Grouls, C., Kuhl, C. and Deserno (né Lehmann), T.M. (2011). Bridging the integration gap between imaging and information systems: A uniform data concept for content-based image retrieval in computer-aided diagnosis, Journal of the American Medical Informatics Association 18(4): 506-510.
[66] Xie, L., Tian, Q., Wang, M. and Zhang, B. (2014a). Spatial pooling of heterogeneous features for image classification, IEEE Transactions on Image Processing 23(5): 1994-2008. · Zbl 1374.94412
[67] Xie, L., Wang, J., Guo, B., Zhang, B. and Tian, Q. (2014b). Orientational pyramid matching for recognizing indoor scenes, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, pp. 3734-3741.
[68] Xu, M. and Petrou, M. (2010). Learning logic rules for scene interpretation based on Markov logic networks, ACCV 9th Asian Conference on Computer Vision, Xi’an, China, pp. 341-350.
[69] Xu, M., Petrou, M. and Lu, J. (2011). Learning logic rules for the tower of knowledge using Markov logic networks, International Journal of Pattern Recognition and Artificial Intelligence 25(06): 889-907.
[70] Ye, Z., Liu, P., Zhao, W. and Tang, X. (2015). Cognition inspired framework for indoor scene annotation, Journal of Electronic Imaging 24(5): 053013.
[71] Yu, J., Rui, Y., Tang, Y.Y. and Tao, D. (2014). High-order distance-based multiview stochastic learning in image classification, IEEE Transactions on Cybernetics 44(12): 2431-2442.
[72] Yu, J., Tao, D., Rui, Y. and Cheng, J. (2013). Pairwise constraints based multiview features fusion for scene classification, Pattern Recognition 46(2): 483-496. · Zbl 1251.68194
[73] Yu, J., Tao, D. and Wang, M. (2012a). Adaptive hypergraph learning and its application in image classification, IEEE Transactions on Image Processing 21(7): 3262-3272. · Zbl 1381.62216
[74] Yu, J., Wang, M. and Tao, D. (2012b). Semisupervised multiview distance metric learning for cartoon synthesis, IEEE Transactions on Image Processing 21(11): 4636-4648. · Zbl 1373.94472
[75] Zhang, C., Liu, J., Tian, Q., Liang, C. and Huang, Q. (2013). Beyond visual features: A weak semantic image representation using exemplar classifiers for classification, Neurocomputing 120: 318-324.
[76] Zhou, L., Zhou, Z. and Hu, D. (2013). Scene classification using a multi-resolution bag-of-features model, Pattern Recognition 46(1): 424-433.
[77] Zhu, Y., Fathi, A. and Fei-Fei, L. (2014). Reasoning about object affordances in a knowledge base representation, in D. Fleet et al. (Eds.), Computer Vision ECCV 2014, Lecture Notes in Computer Science, Vol. 8690, Springer International Publishing, Zurich, pp. 408-424.