Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement.

*(English)*Zbl 0831.92014Summary: Motivated by the problem in computational biology of reconstructing the series of chromosome inversions by which one organism evolved from another, we consider the problem of computing the shortest series of reversals that transform one permutation to another. The permutations describe the order of genes on corresponding chromosomes, and a reversal takes an arbitrary substring of elements, and reverses their order.

For this problem, we develop two algorithms: a greedy approximation algorithm, that finds a solution provably close to optimal in \(O(n^2)\) time and \(O(n)\) space for \(n\)-element permutations, and a branch-and- bound exact algorithm, that finds an optimal solution on \(O(mL(n,n))\) time and \(O(n^2)\) space, where \(m\) is the size of the branch-and-bound search tree, and \(L(n,n)\) is the time to solve a linear program of \(n\) variables and \(n\) constraints. The greedy algorithm is the first to come within a constant factor of the optimum; it guarantees a solution that uses no more than twice the minimum number of reversals. The lower and upper bounds of the branch-and-bound algorithm are a novel application of maximum-weight matchings, shortest paths, and linear programming.

In a series of experiments, we study the performance of an implementation on random permutations, and permutations generated by random reversals. For permutations differing by \(k\) random reversals, we find that the average upper bound on reversal distance estimates \(k\) to within one reversal for \(k < 2^{-1} n\) and \(n \leq 100\). For the difficult case of random permutations, we find that the average difference between the upper and lower bounds is less than three reversals for \(n \leq 50\). Due to the tightness of these bounds, we can solve, to optimality, problems on 30 elements in a few minutes of computer time. This approaches the scale of mitochondrial genomes.

For this problem, we develop two algorithms: a greedy approximation algorithm, that finds a solution provably close to optimal in \(O(n^2)\) time and \(O(n)\) space for \(n\)-element permutations, and a branch-and- bound exact algorithm, that finds an optimal solution on \(O(mL(n,n))\) time and \(O(n^2)\) space, where \(m\) is the size of the branch-and-bound search tree, and \(L(n,n)\) is the time to solve a linear program of \(n\) variables and \(n\) constraints. The greedy algorithm is the first to come within a constant factor of the optimum; it guarantees a solution that uses no more than twice the minimum number of reversals. The lower and upper bounds of the branch-and-bound algorithm are a novel application of maximum-weight matchings, shortest paths, and linear programming.

In a series of experiments, we study the performance of an implementation on random permutations, and permutations generated by random reversals. For permutations differing by \(k\) random reversals, we find that the average upper bound on reversal distance estimates \(k\) to within one reversal for \(k < 2^{-1} n\) and \(n \leq 100\). For the difficult case of random permutations, we find that the average difference between the upper and lower bounds is less than three reversals for \(n \leq 50\). Due to the tightness of these bounds, we can solve, to optimality, problems on 30 elements in a few minutes of computer time. This approaches the scale of mitochondrial genomes.

##### MSC:

92C40 | Biochemistry, molecular biology |

92-08 | Computational methods for problems pertaining to biology |

92D20 | Protein sequences, DNA sequences |

90C90 | Applications of mathematical programming |

68R10 | Graph theory (including graph drawing) in computer science |

68R05 | Combinatorics in computer science |

##### Keywords:

experimental analysis of algorithms; edit distance; sorting by reversals; genome rearrangements; chromosome inversions; shortest series of reversals; greedy approximation algorithm; branch-and-bound exact algorithm; maximum-weight matchings; shortest paths; linear programming; random permutations; random reversals; mitochondrial genomes
PDF
BibTeX
Cite

\textit{J. Kececioglu} and \textit{D. Sankoff}, Algorithmica 13, No. 1--2, 180--210 (1995; Zbl 0831.92014)

Full Text:
DOI

##### References:

[1] | Aigner, M., and D. B. West. Sorting by insertion of leading elements.Journal of Combinatorial Theory, Series A,45, 306–309, 1987. · Zbl 0633.68060 |

[2] | Amato, N., M. Blum, S. Irani, and R. Rubinfeld. Reversing trains: a turn of the century sorting problem.Journal of Algorithms,10, 413–428, 1989. · Zbl 0679.68120 |

[3] | Bafna, V., and P. A. Pevzner. Genome rearrangements and sorting by reversals.Proceedings of the 34th Symposium on Foundations of Computer Science, November 1993, pp. 148–157. · Zbl 0844.68123 |

[4] | Bibb, M. J., R. A. van Etten, C. T. Wright, M. W. Walberg, and D. A. Clayton. Sequence and gene organization of mouse mitochondrial DNA.Cell,26, 167–180, 1981. |

[5] | Dobzhansky, T.Genetics of the Evolutionary Process. Columbia University Press, New York, 1970. |

[6] | Driscoll, J. R., and M. L. Furst. Computing short generator sequences.Information and Computation,72, 117–132, 1987. · Zbl 0621.20016 |

[7] | Even, S., and O. Goldreich. The minimum-length generator sequence problem is NP-hard.Journal of Algorithms,2, 311–313, 1981. · Zbl 0467.68046 |

[8] | Furst, M., J. Hopcroft, and E. Luks. Polynomial-time algorithms for permutation groups.Proceedings of the 21st Symposium on Foundations of Computer Science, 1980, pp. 36–41. · Zbl 0462.05059 |

[9] | Garey, M. R., and D. S. Johnson.Computers and Intractability: A Guide to The Theory of NP-Completeness. Freeman, New York, 1979. · Zbl 0411.68039 |

[10] | Gates, W. H., and C. H. Papadimitriou. Bounds for sorting by prefix reversal.Discrete Mathematics,27, 47–57, 1979. · Zbl 0397.68062 |

[11] | Golan, H. Personal communication, 1991. |

[12] | Jerrum, M. R. The complexity of finding minimum-length generator sequences.Theoretical Computer Science,36, 265–289, 1985. · Zbl 0564.68031 |

[13] | Johnson, D. B. Finding all the elementary circuits of a directed graph.SIAM Journal on Computing,4(1), 77–84, 1975. · Zbl 0293.05126 |

[14] | Kececioglu, J., and D. Sankoff. Exact and approximation algorithms for the inversion distance between two chromosomes.Proceedings of the 4th Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, vol. 684, Springer-Verlag, Berlin, June 1993, pp. 87–105. (An earlier version appeared as ”Exact and approximation algorithms for sorting by reversals,” Technical Report 1824, Centre de recherches mathématiques, Université de Montréal, July 1992). |

[15] | Kececioglu, J., and D. Sankoff. Efficient bounds for oriented chromosome-inversion distance.Proceedings of the 5th Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, vol. 807, Springer-Verlag, Berlin, June 1994, pp. 307–325. |

[16] | Knuth, D. E.The Art of Computer Programming, Vol. 3. Addison-Wesley, Reading, MA, 1973. · Zbl 0191.17903 |

[17] | Mannila, H. Measures of presortedness and optimal sorting algorithms,IEEE Transactions on Computers, 34, 318–325, 1985. · Zbl 0556.68031 |

[18] | Micali, S. and V. Vazirani. Ano(V|{\(\cdot\)}|E|) algorithm for finding maximum matchings in general graphs.Proceedings of the 21st Symposium on Foundations of Computer Science, 1980, pp. 17–27. |

[19] | Nadeau, J. H., and B. A. Taylor. Lengths of chromosomal segments conserved since divergence of man and mouse.Proceedings of the National Academy of Sciences of the USA,81, 814, 1984. |

[20] | O’Brien, S. J., ed.Genetic Maps: Locus Maps of Complex Genomes. 6th edition. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1993. |

[21] | Palmer, J. D., B. Osorio, and W. F. Thompson. Evolutionary significance of inversions in legume chloroplast DNAs.Current Genetics,14, 65–74, 1988. |

[22] | Sankoff, D., G. Leduc, N. Antoine, B. Paquin, B. F. Lang, and R. Cedergren. Gene order comparisons for phylogenetic inference: evolution of the mitochondrial genome.Proceedings of the National Academy of Sciences of the USA,89, 6575–6579, 1992. |

[23] | Schöniger, M., and M. S. Waterman. A local algorithm for DNA sequence alignment with inversions.Bulletin of Mathematical Biology,54, 521–536, 1992. · Zbl 0745.92020 |

[24] | Sessions, S. K. Chromosomes: molecular cytogenetics. InMolecular Systematics, D. M. Hillis and C. Moritz, eds., Sinauer, Sunderland, MA, 1990, pp. 156–204. |

[25] | Tichy, W. F. The string-to-string correction problem with block moves.ACM Transactions on Computer Systems,2(4), 309–321, 1984. |

[26] | Wagner, R. A. On the complexity of the extended string-to-string correction problem. InTime Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, D. Sankoft and J. B. Kruskal, eds., Addison-Wesley, Reading, MA, 1983, pp. 215–235. |

[27] | Watterson, G. A., W. J. Ewens, T. E. Hall, and A. Morgan. The chromosome inversion problem.Journal of Theoretical Biology,99, 1–7, 1982. |

[28] | Wolstenholme, D. R., J. L. MacFarlane, R. Okimoto, D. O. Clary, and J. A. Wahieithner. Bizarre tRNAs inferred from DNA sequences of mitochondrial genomes of nematode worms.Proceedings of the National Academy of Sciences of the USA,84, 1324–1328, 1987. |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.