A note on convex characters, Fibonacci numbers and exponential-time algorithms.

*(English)* Zbl 1431.05084

Summary: Phylogenetic trees are used to model evolution: leaves are labelled to represent contemporary species (“taxa”) and interior vertices represent extinct ancestors. Informally, convex characters are measurements on the contemporary species in which the subset of species (both contemporary and extinct) that share a given state forms a connected subtree. Given an unrooted, binary phylogenetic tree \(\mathcal{T}\) on a set of \(n\geq 2\) taxa, a closed (but fairly opaque) expression for the number of convex characters on \(\mathcal{T}\) has been known since 1992 [M. Steel, J. Classif. 9, No. 1, 91–116 (1992; Zbl 0766.92002)], and this number is independent of the exact topology of \(\mathcal{T}\). In this note we prove that it is equal to the \((2n-1)\)th Fibonacci number.
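The Fibonacci identity can be checked numerically on small trees. The Python sketch below is a brute-force check, not the method of the note: it enumerates every partition of the leaf set (a character up to relabelling of its states) and uses the standard characterization that a character is convex exactly when the minimal subtrees spanning its state classes are pairwise vertex-disjoint. All names (`set_partitions`, `spanning_subtree`, `count_convex`, `caterpillar`, `random_binary_tree`, `fib`) are hypothetical helpers introduced for illustration, and Fibonacci numbers are indexed with F(1) = F(2) = 1.

```python
import random

def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):       # put `first` into an existing block
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition           # or into a new block of its own

def add_edge(adj, u, v):
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def spanning_subtree(adj, block):
    """Vertex set of the minimal subtree of the tree `adj` connecting the leaves in `block`."""
    root = block[0]
    parent = {root: None}
    stack = [root]
    while stack:                              # DFS from `root` records parent pointers
        x = stack.pop()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                stack.append(y)
    verts = set()
    for leaf in block:                        # union of the leaf-to-root paths
        x = leaf
        while x is not None and x not in verts:
            verts.add(x)
            x = parent[x]
    return verts

def count_convex(adj, leaves):
    """Count leaf partitions whose blocks span pairwise vertex-disjoint subtrees."""
    count = 0
    for partition in set_partitions(list(leaves)):
        used = set()
        for block in partition:
            sub = spanning_subtree(adj, block)
            if used & sub:                    # two state classes share a vertex
                break
            used |= sub
        else:                                 # no overlap found: the character is convex
            count += 1
    return count

def fib(m):
    """Fibonacci number F(m) with F(1) = F(2) = 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a

def caterpillar(n):
    """Unrooted binary caterpillar on leaves 0..n-1; spine vertices are ('s', j)."""
    adj = {}
    if n == 2:
        add_edge(adj, 0, 1)
        return adj
    spine = [("s", j) for j in range(n - 2)]
    for a, b in zip(spine, spine[1:]):
        add_edge(adj, a, b)
    add_edge(adj, 0, spine[0])
    for j in range(n - 2):                    # leaves 1..n-2 hang off the spine in order
        add_edge(adj, j + 1, spine[j])
    add_edge(adj, n - 1, spine[-1])
    return adj

def random_binary_tree(n, seed=0):
    """Unrooted binary tree on leaves 0..n-1 built by inserting leaves on random edges."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    for leaf in range(2, n):
        u, v = edges.pop(rng.randrange(len(edges)))
        w = ("x", leaf)                       # new interior vertex subdividing edge uv
        edges += [(u, w), (w, v), (w, leaf)]
    adj = {}
    for u, v in edges:
        add_edge(adj, u, v)
    return adj

# Two different topologies per n give the same count, F(2n - 1).
for n in range(2, 8):
    for tree in (caterpillar(n), random_binary_tree(n, seed=n)):
        assert count_convex(tree, range(n)) == fib(2 * n - 1)
```

For example, on four taxa with topology 01|23, 13 of the 15 leaf partitions are convex; only {0,2}{1,3} and {0,3}{1,2} fail, and 13 = F(7). The enumeration is exponential and only meant for sanity checks on small n.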

Next, we define \(g_k(\mathcal{T})\) to be the number of convex characters on \(\mathcal{T}\) in which each state appears on at least \(k\) taxa. We show that, somewhat curiously, \(g_2(\mathcal{T})\) is also independent of the topology of \(\mathcal{T}\) and is equal to the \((n-1)\)th Fibonacci number. As we demonstrate, this topological neutrality breaks down for \(k \geq 3\). However, we show that for each fixed \(k \geq 1\), \(g_k(\mathcal{T})\) can be computed in \(O(n)\) time, and the set of characters thus counted can be efficiently listed and sampled. We use these insights to give a simple but effective exact algorithm for the NP-hard maximum parsimony distance problem that runs in time \(\Theta(\phi^n \cdot n^2)\), where \(\phi \approx 1.618\ldots\) is the golden ratio, and an exact algorithm that computes the tree bisection and reconnection distance (equivalently, a maximum agreement forest) in time \(\Theta(\phi^{2n} \cdot \operatorname{poly}(n))\), where \(\phi^2 = \phi + 1 \approx 2.618\).
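The claims about \(g_k\) can likewise be illustrated by brute force on small trees. This Python sketch is hypothetical and exponential, not the \(O(n)\) method of the note. It tests convexity with Fitch's algorithm [7], using the standard fact that a character with \(r\) states has parsimony score at least \(r-1\) on any tree, with equality exactly when the character is convex. It checks \(g_2 = F(n-1)\) on random binary trees, and for \(n = 6\) compares a caterpillar with a "snowflake" (three cherries attached to one central vertex). The helper names and the two example trees are assumptions made for illustration.

```python
import random

def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition

def fitch_score(adj, chi):
    """Fitch parsimony score of the character `chi` (leaf -> state) on the binary tree `adj`."""
    u = next(iter(chi))                  # any leaf
    v = next(iter(adj[u]))               # its neighbour; root the tree on the edge (u, v)

    def down(x, parent):
        if x in chi:                     # leaf: singleton state set, no changes yet
            return {chi[x]}, 0
        a, b = (y for y in adj[x] if y != parent)
        sa, ca = down(a, x)
        sb, cb = down(b, x)
        if sa & sb:
            return sa & sb, ca + cb
        return sa | sb, ca + cb + 1      # empty intersection costs one state change

    su, cu = down(u, v)
    sv, cv = down(v, u)
    return cu + cv + (0 if su & sv else 1)

def g(adj, leaves, k):
    """Number of convex characters in which every state occurs on at least k leaves."""
    total = 0
    for blocks in set_partitions(list(leaves)):
        if min(len(b) for b in blocks) < k:
            continue
        chi = {leaf: i for i, block in enumerate(blocks) for leaf in block}
        if fitch_score(adj, chi) == len(blocks) - 1:   # convex iff score = r - 1
            total += 1
    return total

def tree_from_edges(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def random_binary_tree(n, seed=0):
    """Unrooted binary tree on leaves 0..n-1 built by inserting leaves on random edges."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    for leaf in range(2, n):
        u, v = edges.pop(rng.randrange(len(edges)))
        w = ("x", leaf)
        edges += [(u, w), (w, v), (w, leaf)]
    return tree_from_edges(edges)

def fib(m):
    """Fibonacci number F(m) with F(1) = F(2) = 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a

# Two topologies on leaves 0..5; interior vertices are strings.
caterpillar6 = tree_from_edges([(0, "s0"), (1, "s0"), ("s0", "s1"), (2, "s1"), ("s1", "s2"),
                                (3, "s2"), ("s2", "s3"), (4, "s3"), (5, "s3")])
snowflake6 = tree_from_edges([("c", "a"), ("c", "b"), ("c", "d"), (0, "a"), (1, "a"),
                              (2, "b"), (3, "b"), (4, "d"), (5, "d")])

for n in range(2, 8):
    assert g(random_binary_tree(n, seed=n), range(n), 2) == fib(n - 1)
print(g(caterpillar6, range(6), 3), g(snowflake6, range(6), 3))   # prints: 2 1
```

Both six-taxon trees have \(g_2 = 5 = F(5)\), but the caterpillar displays the split {0,1,2}|{3,4,5} while the snowflake has no split with three taxa on each side, so their values of \(g_3\) differ.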

##### MSC:

| MSC | Description |
|---|---|
| 05C30 | Enumeration in graph theory |
| 05A15 | Exact enumeration problems, generating functions |
| 05C85 | Graph algorithms (graph-theoretic aspects) |
| 05C90 | Applications of graph theory |
| 11B39 | Fibonacci and Lucas numbers and polynomials and generalizations |

##### References:

[1] Allen, B.; Steel, M., Subtree transfer operations and their induced metrics on evolutionary trees, Ann. Comb., 5, 1-15, (2001) · Zbl 0978.05023

[2] Bachoore, E.; Bodlaender, H., Convex recoloring of leaf-colored trees, (2006), Utrecht University Technical Report, Utrecht

[3] Bordewich, M.; Huber, K.; Semple, C., Identifying phylogenetic trees, Discrete Math., 300, 30-43, (2005) · Zbl 1071.92024

[4] Chen, J.; Fan, J.-H.; Sze, S.-H., Parameterized and approximation algorithms for maximum agreement forest in multifurcating trees, Theoret. Comput. Sci., 562, 496-512, (2015) · Zbl 1303.68154

[5] Dress, A.; Huber, K.; Koolen, J.; Moulton, V.; Spillner, A., Basic phylogenetic combinatorics, (2012), Cambridge University Press, Cambridge · Zbl 1298.92008

[6] Fischer, M.; Kelk, S., On the maximum parsimony distance between phylogenetic trees, Ann. Comb., 20, 87-113, (2016) · Zbl 1332.05043

[7] Fitch, W., Toward defining the course of evolution: minimum change for a specific tree topology, Syst. Zool., 20, 406-416, (1971)

[8] Hartigan, J., Minimum mutation fits to a given tree, Biometrics, 29, 53-65, (1973)

[9] Kelk, S.; Fischer, M., On the complexity of computing MP distance between binary phylogenetic trees, Ann. Comb., (2016), in press, preprint

[10] Kelk, S.; Fischer, M.; Moulton, V.; Wu, T., Reduction rules for the maximum parsimony distance on phylogenetic trees, Theoret. Comput. Sci., 646, 1-15, (2016) · Zbl 1348.68068

[11] Koshy, T., Fibonacci and Lucas numbers with applications, Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts, (2011), Wiley, New York

[12] Semple, C.; Steel, M., Phylogenetics, (2003), Oxford University Press, Oxford · Zbl 1043.92026

[13] Steel, M., The complexity of reconstructing trees from qualitative characters and subtrees, J. Classification, 9, 91-116, (1992) · Zbl 0766.92002

[14] Steel, M.; Fu, Y., Classifying and counting linear phylogenetic invariants for the Jukes-Cantor model, J. Comput. Biol., 2, 39-47, (1995)
