zbMATH — the first resource for mathematics

A survey on tree edit distance and related problems. (English) Zbl 1078.68152
The paper deals with matching problems based on simple primitive operations applied to labeled trees. Particularly, tree edit distance problem, tree alignment distance problem and tree inclusion problem are surveyed. In order to do that, problems and algorithms are presented in a common framework.
For each problem and its variations both ordered and unordered versions are reviewed. Also central algorithms for each of the poblems are presented in more detail, including proof of correctness and time complexity analysis. The paper concludes with a summary of results in table form and a short list of possible topics for further research.

68W05 Nonnumerical algorithms
68W40 Analysis of algorithms
92D20 Protein sequences, DNA sequences
Full Text: DOI
[1] Aho, A.V.; Ullman, J.D.; Aho, D.S.; Hirschberg, J.D., Bounds on the complexity of the longest common subsequence problem, J. ACM, 1, 23, 1-12, (1976) · Zbl 0316.68027
[2] T. Akutsu, M.M. Halldórsson. On the approximation of largest common point sets and largest common subtrees, in: Proc. 5th Ann. Internat. Symp. on Algorithms and Computation, Lecture Notes in Computer Science (LNCS), Vol. 834. Springer, Berlin, 1994, pp. 405-413. · Zbl 0953.68555
[3] Alonso, L.; Schott, R.; On the tree inclusion problem, 1993., (), 211-221
[4] Arora, A.; Lund, C.; Motwani, R.; Sudan, M.; Szegedy, M., Proof verification and hardness of approximation problems, (), 14-23 · Zbl 0977.68539
[5] Chawathe, S.; Garcia-Molina, H., Meaningful change detection in structured data, (), 26-37
[6] S.S. Chawathe, A. Rajaraman, H. Garcia-Molina, J. Widom, Change detection in hierarchically structured information, in: Proc. ACM SIGMOD Internat. Conf. on Management of Data, Montréal, Québec, June 1996, pp. 493-504.
[7] Chen, W., More efficient algorithm for ordered tree inclusion, J. algorithms, 26, 370-385, (1998) · Zbl 0894.68109
[8] Chen, W., New algorithm for ordered tree-to-tree correction problem, J. algorithms, 40, 135-158, (2001) · Zbl 0980.68142
[9] Cormen, T.H.; Leiserson, C.E.; Rivest, R.L.; Stein, C., Introduction to algorithms, (2001), MIT Press Cambridge, MA · Zbl 1047.68161
[10] Dubiner, M.; Galil, Z.; Magen, E., Faster tree pattern matching, (), 145-150
[11] Farach, M.; Thorup, M., Fast comparison of evolutionary trees, (), 481-488 · Zbl 0869.92013
[12] Garey, M.J.; Johnson, D.S., Computers and intractabilitya guide to the theory of NP-completeness, (1979), Freeman New York
[13] Gupta, A.; Nishimura, N., Finding largest subtrees and smallest supertrees, Algorithmica, 21, 183-210, (1998) · Zbl 0896.68103
[14] Gusfield, D., Algorithms on strings, trees, and sequences, (1997), Cambridge University Press Cambridge · Zbl 0934.68103
[15] Harel, D.; Tarjan, R.E., Fast algorithms for finding nearest common ancestors, SIAM J. comput., 13, 2, 338-355, (1984) · Zbl 0535.68022
[16] Hoffmann, C.M.; O’Donnell, M.J., Pattern matching in trees, J. ACM, 29, 1, 68-95, (1982) · Zbl 0477.68067
[17] J. Jansson, A. Lingas, A fast algorithm for optimal alignment between similar ordered trees, in: Proc. 12th Ann. Symp. Combinatorial Pattern Matching (CPM), Lecture Notes of Computer Science (LNCS). Vol. 2089, Springer, Berlin, 2001. · Zbl 0990.68638
[18] Jiang, T.; Wang, L.; Zhang, K., Alignment of trees—an alternative to tree edit, Theoret. comput. sci., 143, (1995) · Zbl 0873.68150
[19] Keselman, D.; Amir, A., Maximum agreement subtree in a set of evolutionary trees—metrics and efficient algorithms, (), 758-769
[20] S. Khanna, R. Motwani, F.F. Yao, Approximation algorithms for the largest common subtree problem, Technical Report, Stanford University, 1995.
[21] P. Kilpeläinen, Tree matching problems with applications to structured text databases, Ph.D. Thesis, Department of Computer Science, University of Helsinki, November 1992.
[22] Kilpeläinen, P.; Mannila, H., Ordered and unordered tree inclusion, SIAM J. comput., 24, 340-356, (1995) · Zbl 0827.68050
[23] P. Klein, 2002. Personal communication.
[24] Klein, P.; Tirthapura, S.; Sharvit, D.; Kimia, B., A tree-edit-distance algorithm for comparing simple, closed shapes, (), 696-704 · Zbl 0954.68146
[25] Klein, P.N., Computing the edit-distance between unrooted ordered trees, (), 91-102 · Zbl 0932.68066
[26] D.E. Knuth, The Art of Computer Programming, Vol. 1, Addison-Wesley, Reading, MA, 1969. · Zbl 0191.18001
[27] Kosaraju, S.R., Efficient tree pattern matching, (), 178-183
[28] Landau, G.M.; Vishkin, U., Fast parallel and serial approximate string matching, J. algorithms, 10, 157-169, (1989) · Zbl 0685.68033
[29] Lu, S.Y., A tree-to-tree distance and its application to cluster analysis, IEEE trans. pattern anal. Mach. intell., 1, 219-224, (1979) · Zbl 0418.68077
[30] Lu, S.Y., A tree-matching algorithm based on node splitting and merging, IEEE trans. pattern anal. Mach. intell., 6, 2, 249-256, (1984) · Zbl 0528.68069
[31] C.L. Lu, Z.-Y. Su, C.Y., Tang, A new measure of edit distance between labeled trees, in: Proc. 7th Ann. Internat. Conf. on Computing and Combinatorics (COCOON), Lecture Notes in Computer Science (LNCS). Vol. 2108, Springer, Berlin, 2001. · Zbl 0993.92014
[32] Matoušek, J.; Thomas, R., On the complexity of finding iso- and other morphisms for partial \(k\)-trees, Discrete math., 108, 343-364, (1992) · Zbl 0764.68128
[33] R. Motwani, Lecture Notes on Approximation Algorithms, Vol. 1. Technical Report STAN-CS-92-1435, Department of Computer Science, Stanford University, 1992.
[34] Nishimura, N.; Ragde, P.; Thilikos, D.M., Finding smallest supertrees under minor containment, Internat. J. found. comput. sci., 11, 3, 445-465, (2000) · Zbl 1320.05127
[35] Ramesh, R.; Ramakrishnan, I.V., Nonlinear pattern matching in trees, J. ACM, 39, 2, 295-316, (1992) · Zbl 0799.68103
[36] T. Richter, A new algorithm for the ordered tree inclusion problem, in: Proc. 8th Ann. Symp. on Combinatorial Pattern Matching (CPM), Lecture Notes of Computer Science (LNCS), Vol. 1264, Springer, Berlin, 1997, pp. 150-166.
[37] T. Richter, A new measure of the distance between ordered trees and its applications, Technical Report 85166-cs. Department of Computer Science, University of Bonn, 1997.
[38] Selkow, S.M., The tree-to-tree editing problem, Inform. process. lett., 6, 6, 184-186, (1977) · Zbl 0391.68022
[39] Setubal, J.; Meidanis, J., Introduction to computational biology, (1997), PWS Publishing Company MA
[40] Shasha, D.; Wang, J.T.-L.; Shan, H.; Zhang, K., Atreegrepapproximate searching in unordered trees, (), 89-98
[41] Shasha, D.; Zhang, K., Fast algorithms for the unit cost editing distance between trees, J. algorithms, 11, 581-621, (1990) · Zbl 0718.68060
[42] Shasha, D.; Zhang, K., Approximate tree pattern matching, (), 341-371
[43] Tai, K.-C., The tree-to-tree correction problem, J. ACM, 26, 422-433, (1979) · Zbl 0409.68040
[44] Tanaka, E., A note on a tree-to-tree editing problem, Internat. J. pattern recogn. artif. intell., 9, 1, 167-172, (1995)
[45] Tanaka, E.; Tanaka, K., The tree-to-tree editing problem, Internat. J. pattern recogn. artif. intell., 2, 2, 221-240, (1988)
[46] Tirthapura, S.; Sharvit, D.; Klein, P.; Kimia, B.B., Indexing based on edit-distance matching of shape graphs, (), 91-102
[47] Ukkonen, E., Finding approximate patterns in strings, J. algorithms, 6, 132-137, (1985) · Zbl 0566.68072
[48] Wagner, R.A.; Fischer, M.J., The string-to-string correction problem, J. ACM, 21, 168-173, (1974) · Zbl 0278.68032
[49] Wang, J.T.-L.; Zhang, K.; Jeong, K.; Shasha, D., A system for approximate tree matching, IEEE trans. knowl. data eng., 6, 4, 559-571, (1994)
[50] K. Zhang, The Editing Distance Between Trees: Algorithms and Applications, Ph.D. Thesis, Department of Computer Science, Courant Institute, 1989.
[51] Zhang, K., Algorithms for the constrained editing problem between ordered labeled trees and related problems, Pattern recognition, 28, 463-474, (1995)
[52] Zhang, K., A constrained edit distance between unordered labeled trees, Algorithmica, 15, 3, 205-222, (1996) · Zbl 0839.68035
[53] K. Zhang, Efficient parallel algorithms for tree editing problems, in: Proc. 7th Ann. Symp. Combinatorial Pattern Matching (CPM), Lecture Notes in Computer Science, Vol. 1075, 1996, pp. 361-372.
[54] Zhang, K.; Jiang, T., Some MAX SNP-hard results concerning unordered labeled trees, Inform. process. lett., 49, 249-254, (1994) · Zbl 0795.68073
[55] Zhang, K.; Shasha, D., Simple fast algorithms for the editing distance between trees and related problems, SIAM J. comput., 18, 1245-1262, (1989) · Zbl 0692.68047
[56] Zhang, K.; Shasha, D.; Wang, J.T.L., Approximate tree matching in the presence of variable length don’t cares, J. algorithms, 16, 1, 33-66, (1994) · Zbl 0803.68038
[57] K. Zhang, R. Statman, D. Shasha, On the editing distance between unordered labeled trees, Technical Report 289, Department of Computer Science, The University of Western Ontario, 1991. · Zbl 0780.68070
[58] Zhang, K.; Statman, R.; Shasha, D., On the editing distance between unordered labeled trees, Inform. process. lett., 42, 133-139, (1992) · Zbl 0780.68070
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.