
The flowing nature matters: feature learning from the control flow graph of source code for bug localization. (English) Zbl 1491.68052

Summary: Bug localization plays an important role in software maintenance. Traditional approaches treat source code from a lexical perspective, while recent research indicates that exploiting the program structure is beneficial for improving bug localization. The control flow graph (CFG) is a widely used graph representation that essentially captures the program structure. Although applying a graph neural network for feature learning is a straightforward approach that has proven effective in various software mining problems, it is ill-suited here, since adjacent nodes in the CFG can be semantically unrelated. On the other hand, earlier statements may affect the semantics of subsequent statements along the execution path, which we call the flowing nature of the control flow graph. In this paper, we claim that this flowing nature should be explicitly considered and propose a novel model named cFlow for bug localization, which employs a specially designed flow-based GRU for feature learning from the CFG. The flow-based GRU exploits the program structure represented by the CFG to transmit the semantics of statements along the execution path, thereby reflecting the flowing nature. Experimental results on widely used real-world software projects show that cFlow significantly outperforms state-of-the-art bug localization methods, indicating that exploiting the program structure of the CFG with respect to its flowing nature is beneficial for improving bug localization.
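To make the idea of the flow-based GRU concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' cFlow implementation: it assumes each statement already has a lexical embedding, that the CFG has been made acyclic so a topological order exists, and it aggregates predecessor states with a simple mean; the names (FlowGRUSketch, preds, order) are illustrative only.

import torch
import torch.nn as nn


class FlowGRUSketch(nn.Module):
    """Toy flow-based GRU: each statement's state is updated from its own
    embedding and the aggregated states of its control-flow predecessors,
    so semantics "flow" along execution paths."""

    def __init__(self, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.cell = nn.GRUCell(embed_dim, hidden_dim)

    def forward(self, node_feats, preds, order):
        # node_feats: (num_nodes, embed_dim) lexical embeddings of statements
        # preds[i]:   control-flow predecessors of node i
        # order:      node indices in topological order of the (acyclic) CFG
        num_nodes = node_feats.size(0)
        states = [None] * num_nodes
        for i in order:
            incoming = [states[j] for j in preds[i] if states[j] is not None]
            if incoming:
                # mean-aggregate the states flowing in from predecessors
                h_prev = torch.stack(incoming).mean(dim=0)
            else:
                # entry node: no upstream semantics to flow in
                h_prev = node_feats.new_zeros(self.hidden_dim)
            # GRU update: combine the statement's own embedding with the
            # context transmitted along the execution path
            states[i] = self.cell(node_feats[i].unsqueeze(0),
                                  h_prev.unsqueeze(0)).squeeze(0)
        return torch.stack(states)  # (num_nodes, hidden_dim)


if __name__ == "__main__":
    # Tiny CFG: 0 -> 1 -> 3 and 0 -> 2 -> 3 (an if/else joining at node 3)
    feats = torch.randn(4, 16)
    preds = [[], [0], [0], [1, 2]]
    order = [0, 1, 2, 3]
    model = FlowGRUSketch(embed_dim=16, hidden_dim=32)
    print(model(feats, preds, order).shape)  # torch.Size([4, 32])

The full cFlow model would then use such statement-level features for bug localization (e.g., scoring source files against a bug report), which this sketch deliberately omits.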

MSC:

68N99 Theory of software
68R10 Graph theory (including graph drawing) in computer science
68T05 Learning and adaptive systems in artificial intelligence
