zbMATH — the first resource for mathematics

Computational protein design as an optimization problem. (English) Zbl 1407.92099
Summary: Proteins are chains of simple molecules called amino acids. The three-dimensional shape of a protein and its amino acid composition define its biological function. Over millions of years, living organisms have evolved a large catalog of proteins. By exploring the space of possible amino acid sequences, protein engineering aims at similarly designing tailored proteins with specific desirable properties. In Computational Protein Design (CPD), the challenge of identifying a protein that performs a given task is defined as the combinatorial optimization of a complex energy function over amino acid sequences.
In this paper, we introduce the CPD problem and some of the main approaches that have been used by structural biologists to solve it, with an emphasis on the exact method embodied in the dead-end elimination/A* algorithm (DEE/A*). The CPD problem is a specific form of binary Cost Function Network (CFN, aka Weighted CSP). We show how DEE algorithms can be incorporated and suitably modified to be maintained during search, at reasonable computational cost.
We then evaluate the efficiency of CFN algorithms as implemented in our solver toulbar2, on a set of real CPD instances built in collaboration with structural biologists. The CPD problem can be easily reduced to 0/1 Linear Programming, 0/1 Quadratic Programming, 0/1 Quadratic Optimization, Weighted Partial MaxSAT and Graphical Model optimization problems. We compare toulbar2 with these different approaches using a variety of solvers. We observe tremendous differences in the difficulty that each approach has on these instances.
Overall, the CFN approach shows the best efficiency on these problems, improving by several orders of magnitude against the exact DEE/A* approach. The introduction of dead-end elimination before or during search allows to further improve these results.

92D20 Protein sequences, DNA sequences
68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)
90C10 Integer programming
90C20 Quadratic programming
Full Text: DOI
[1] Achterberg, T.; Koch, T.; Martin, A., Branching rules revisited, Oper. Res. Lett., 33, 42-54, (2005) · Zbl 1076.90037
[2] Allouche, D.; Traoré, S.; André, I.; de Givry, S.; Katsirelos, G.; Barbe, S.; Schiex, T., Computational protein design as a cost function network optimization problem, (Principles and Practice of Constraint Programming, (2012), Springer), 840-849
[3] Anfinsen, C., Principles that govern the folding of protein chains, Science, 181, 223-253, (1973)
[4] Ansótegui, C.; Bonet, M. L.; Levy, J., Solving (weighted) partial maxsat through satisfiability testing, (Theory and Applications of Satisfiability Testing-SAT 2009, (2009), Springer), 427-440 · Zbl 1247.68242
[5] Argelich, J.; Cabiscol, A.; Lynce, I.; Manyà, F., Encoding MAX-CSP into partial MAX-SAT, (38th International Symposium on Multiple Valued Logic, 2008, ISMVL 2008, (2008), IEEE), 106-111
[6] Bacchus, F., GAC via unit propagation, (Principles and Practice of Constraint Programming-CP 2007, (2007), Springer), 133-147 · Zbl 1145.68502
[7] Boas, F. E.; Harbury, P. B., Potential energy functions for protein design, Curr. Opin. Struct. Biol., 17, 199-204, (2007)
[8] Boussemart, F.; Hemery, F.; Lecoutre, C.; Sais, L., Boosting systematic search by weighting constraints, (ECAI, (2004)), 146
[9] Bowie, J. U.; Luthy, R.; Eisenberg, D., A method to identify protein sequences that fold into a known three-dimensional structure, Science, 253, 164-170, (1991)
[10] Campeotto, F.; Dal Palù, A.; Dovier, A.; Fioretto, F.; Pontelli, E., A constraint solver for flexible protein models, Science, 253, 164-170, (1991)
[11] Carothers, J. M.; Goler, J. A.; Keasling, J. D., Chemical synthesis using synthetic biology, Curr. Opin. Biotechnol., 20, 498-503, (2009)
[12] Case, D.; Darden, T.; Cheatham, T.; Simmerling, C.; Wang, J.; Duke, R.; Luo, R.; Merz, K.; Pearlman, D.; Crowley, M.; Walker, R.; Zhang, W.; Wang, B.; Hayik, S.; Roitberg, A.; Seabra, G.; Wong, K.; Paesani, F.; Wu, X.; Brozell, S.; Tsui, V.; Gohlke, H.; Yang, L.; Tan, C.; Mongan, J.; Hornak, V.; Cui, G.; Beroza, P.; Mathews, D.; Schafmeister, C.; Ross, W.; Kollman, P., Amber 9, (2006), University of California San Francisco, Technical report
[13] Champion, E.; André, I.; Moulis, C.; Boutet, J.; Descroix, K.; Morel, S.; Monsan, P.; Mulard, L. A.; Remaud-Siméon, M., Design of α-transglucosidases of controlled specificity for programmed chemoenzymatic synthesis of antigenic oligosaccharides, J. Am. Chem. Soc., 131, 7379-7389, (2009)
[14] Cooper, M.; de Givry, S.; Sanchez, M.; Schiex, T.; Zytnicki, M.; Werner, T., Soft arc consistency revisited, Artif. Intell., 174, 449-478, (2010) · Zbl 1213.68580
[15] Cooper, M. C., High-order consistency in valued constraint satisfaction, Constraints, 10, 283-305, (2005) · Zbl 1112.68118
[16] Cooper, M. C.; de Givry, S.; Sánchez, M.; Schiex, T.; Zytnicki, M., Virtual arc consistency for weighted CSP, (Proc. of AAAI’08, (2008)), 253-258
[17] Cooper, M. C.; de Givry, S.; Schiex, T., Optimal soft arc consistency, (Proc. of IJCAI’2007, Hyderabad, India, (2007)), 68-73
[18] Cooper, M. C.; Schiex, T., Arc consistency for soft constraints, Artif. Intell., 154, 199-227, (2004) · Zbl 1085.68672
[19] Dahiyat, B. I.; Mayo, S. L., Protein design automation, Protein Sci., 5, 895-903, (1996)
[20] Davies, J.; Bacchus, F., Solving MAXSAT by solving a sequence of simpler SAT instances, (Principles and Practice of Constraint Programming-CP 2011, (2011), Springer), 225-239
[21] Davies, J.; Bacchus, F., Exploiting the power of MIP solvers in maxsat, (Theory and Applications of Satisfiability Testing-SAT 2013, (2013), Springer), 166-181 · Zbl 1390.68592
[22] Dechter, R.; Mateescu, R., AND/OR search spaces for graphical models, Artif. Intell., 171, 73-106, (2007) · Zbl 1168.68549
[23] Dechter, R.; Rish, I., Mini-buckets: a general scheme for bounded inference, J. ACM, 50, 107-153, (2003) · Zbl 1326.68335
[24] Desmet, J.; De Maeyer, M.; Hazes, B.; Lasters, I., The dead-end elimination theorem and its use in protein side-chain positioning, Nature, 356, 539-542, (1992)
[25] Desmet, J.; Spriet, J.; Lasters, I., Fast and accurate side-chain topology and energy refinement (FASTER) as a new method for protein structure optimization, Proteins, 48, 31-43, (2002)
[26] Fersht, A., Structure and mechanism in protein science: A guide to enzyme catalysis and protein folding, (1999), W.H. Freeman and Co. New York
[27] Freuder, E. C., Eliminating interchangeable values in constraint satisfaction problems, (Proc. of AAAI’91, Anaheim, CA, (1991)), 227-233
[28] Fritz, B. R.; Timmerman, L. E.; Daringer, N. M.; Leonard, J. N.; Jewett, M. C., Biology by design: from top to bottom and back, BioMed Res. Int., 2010, (2010)
[29] Gainza, P.; Roberts, K. E.; Donald, B. R., Protein design using continuous rotamers, PLoS Comput. Biol., 8, e1002335, (2012)
[30] Gainza, P.; Roberts, K. E.; Georgiev, I.; Lilien, R. H.; Keedy, D. A.; Chen, C. Y.; Reza, F.; Anderson, A. C.; Richardson, D. C.; Richardson, J. S., Osprey: protein design with ensembles, flexibility, and provable algorithms, Methods Enzymol., 523, 87-107, (2013)
[31] Georgiev, I.; Keedy, D.; Richardson, J. S.; Richardson, D. C.; Donald, B. R., Algorithm for backrub motions in protein design, Bioinformatics, 24, i196-i204, (2008)
[32] Georgiev, I.; Lilien, R. H.; Donald, B. R., Improved pruning algorithms and divide-and-conquer strategies for dead-end elimination, with application to protein design, Bioinformatics (Oxford), 22, e174-e183, (2006)
[33] Georgiev, I.; Lilien, R. H.; Donald, B. R., The minimized dead-end elimination criterion and its application to protein redesign in a hybrid scoring and search algorithm for computing partition functions over molecular ensembles, J. Comput. Chem., 29, 1527-1542, (2008)
[34] de Givry, S.; Prestwich, S.; O’Sullivan, B., Dead-end elimination for weighted CSP, (Principles and Practice of Constraint Programming-CP 2013, (2013), Springer)
[35] Globerson, A.; Jaakkola, T. S., Fixing MAX-product: convergent message passing algorithms for map lp-relaxations, (Advances in Neural Information Processing Systems, (2007)), 553-560
[36] Goemans, M. X.; Williamson, D. P., Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, J. ACM, 42, 1115-1145, (1995) · Zbl 0885.68088
[37] Goldstein, R. F., Efficient rotamer elimination applied to protein side-chains and related spin glasses, Biophys. J., 66, 1335-1340, (1994)
[38] Gront, D.; Kulp, D. W.; Vernon, R. M.; Strauss, C. E.; Baker, D., Generalized fragment picking in rosetta: design, protocols and applications, PLoS ONE, 6, e23294, (2011)
[39] Grunwald, I.; Rischka, K.; Kast, S. M.; Scheibel, T.; Bargel, H., Mimicking biopolymers on a molecular scale: nano(bio)technology based on engineered proteins, Philos. Trans. R. Soc., Math. Phys. Eng. Sci., 367, 1727-1747, (2009)
[40] Hallen, M. A.; Keedy, D. A.; Donald, B. R., Dead-end elimination with perturbations (deeper): a provable protein design algorithm with continuous sidechain and backbone flexibility, Proteins, 81, 18-39, (2013)
[41] Harvey, W. D.; Ginsberg, M. L., Limited discrepancy search, (Proc. of the 14th IJCAI, Montréal, Canada, (1995))
[42] Hawkins, G.; Cramer, C.; Truhlar, D., Parametrized models of aqueous free energies of solvation based on pairwise descreening of solute atomic charges from a dielectric medium, J. Phys. Chem., 100, 19824-19839, (1996)
[43] Heras, F.; Larrosa, J.; Oliveras, A., Minimaxsat: an efficient weighted MAX-SAT solver, J. Artif. Intell. Res., 31, 1-32, (2008) · Zbl 1183.68578
[44] Janin, J.; Wodak, S.; Levitt, M.; Maigret, B., Conformation of amino acid side-chains in proteins, J. Mol. Biol., 125, 357-386, (1978)
[45] Khalil, A. S.; Collins, J. J., Synthetic biology: applications come of age, Nat. Rev. Genet., 11, 367-379, (2010)
[46] Khare, S. D.; Kipnis, Y.; Greisen, P.; Takeuchi, R.; Ashani, Y.; Goldsmith, M.; Song, Y.; Gallaher, J. L.; Silman, I.; Leader, H.; Sussman, J. L.; Stoddard, B. L.; Tawfik, D. S.; Baker, D., Computational redesign of a mononuclear zinc metalloenzyme for organophosphate hydrolysis, Nat. Chem. Biol., 8, 294-300, (2012)
[47] Khoury, G. A.; Smadbeck, J.; Kieslich, C. A.; Floudas, C. A., Protein folding and de novo protein design for biotechnological applications, Trends Biotechnol., 32, 99-109, (2014)
[48] Kingsford, C. L.; Chazelle, B.; Singh, M., Solving and analyzing side-chain positioning problems using linear and integer programming, Bioinformatics (Oxford), 21, 1028-1036, (2005)
[49] Koller, D.; Friedman, N., Probabilistic graphical models: principles and techniques, (2009), The MIT Press
[50] Korf, R. E., Depth first iterative deepening: an optimal admissible tree search, Artif. Intell., 27, 97-109, (1985) · Zbl 0573.68030
[51] Koster, A.; van Hoesel, S.; Kolen, A., Solving frequency assignment problems via tree-decomposition, (1999), Universiteit Maastricht Maastricht, The Netherlands, Technical report RM/99/011 · Zbl 1038.90086
[52] Kuegel, A., Improved exact solver for the weighted MAX-SAT problem, (Workshop Pragmatics of SAT, (2010))
[53] Kuhlman, B.; Baker, D., Native protein sequences are close to optimal for their structures, Proc. Natl. Acad. Sci. USA, 97, 10383-10388, (2000)
[54] Larrosa, J., On arc and node consistency in weighted CSP, (Proc. AAAI’02, Edmondton, CA, (2002)), 48-53
[55] Larrosa, J.; de Givry, S.; Heras, F.; Zytnicki, M., Existential arc consistency: getting closer to full arc consistency in weighted csps, (Proc. of the 19th IJCAI, Edinburgh, Scotland, (2005)), 84-89
[56] Larrosa, J.; Meseguer, P.; Schiex, T.; Verfaillie, G., Reversible DAC and other improvements for solving MAX-CSP, (Proc. of AAAI’98, Madison, WI, (1998))
[57] Larrosa, J.; Schiex, T., Solving weighted CSP by maintaining arc consistency, Artif. Intell., 159, 1-26, (2004) · Zbl 1086.68592
[58] Leach, A. R.; Lemon, A. P., Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm, Proteins, 33, 227-239, (1998)
[59] Lecoutre, C.; Roussel, O.; Dehani, D. E., WCSP integration of soft neighborhood substitutability, (Principles and Practice of Constraint Programming, (2012), Springer), 406-421
[60] Lecoutre, C.; Saïs, L.; Tabary, S.; Vidal, V., Reasoning from last conflict(s) in constraint programming, Artif. Intell., 173, 1592, (2009), 1614 · Zbl 1185.68645
[61] Lewis, J. C.; Bastian, S.; Bennett, C. S.; Fu, Y.; Mitsuda, Y.; Chen, M. M.; Greenberg, W. A.; Wong, C. H.; Arnold, F. H., Chemoenzymatic elaboration of monosaccharides using engineered cytochrome p450bm3 demethylases, Proc. Natl. Acad. Sci., 106, 16550-16555, (2009)
[62] Linderoth, J.; Savelsbergh, M., A computational study of search strategies for mixed integer programming, INFORMS J. Comput., 11, 173-187, (1999) · Zbl 1040.90535
[63] Looger, L. L.; Hellinga, H. W., Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: implications for protein design and structural genomics, J. Mol. Biol., 307, 429-445, (2001)
[64] Lovell, S. C.; Word, J. M.; Richardson, J. S.; Richardson, D. C., The penultimate rotamer library, Proteins, 40, 389-408, (2000)
[65] Marriott, K.; Nethercote, N.; Rafeh, R.; Stuckey, P.; de la Banda, M. G.; Wallace, M., The design of the zinc modelling language, Constraints, 13, 229-267, (2008) · Zbl 1146.68352
[66] Martin, V. J.; Pitera, D. J.; Withers, S. T.; Newman, J. D.; Keasling, J. D., Engineering a mevalonate pathway in Escherichia coli for production of terpenoids, Nat. Biotechnol., 21, 796-802, (2003)
[67] Nestl, B. M.; Nebel, B. A.; Hauer, B., Recent progress in industrial biocatalysis, Curr. Opin. Chem. Biol., 15, 187-193, (2011)
[68] Niedermeier, R.; Rossmanith, P., New upper bounds for maximum satisfiability, J. Algorithms, 36, 63-88, (2000) · Zbl 0959.68049
[69] Otten, L.; Ihler, A.; Kask, K.; Dechter, R., Winning the Pascal 2011 map challenge with enhanced AND/OR branch-and-bound, (DISCML’12 Workshop, at NIPS’12, Lake Tahoe, NV, USA, (2012))
[70] Pabo, C., Molecular technology. designing proteins and peptides, Nature, 301, 200, (1983)
[71] Peisajovich, S. G.; Tawfik, D. S., Protein engineers turned evolutionists, Nat. Methods, 4, 991-994, (2007)
[72] Petit, T.; Régin, J.; Bessière, C., Meta constraints on violations for over constrained problems, (Proceedings of IEEE ICTAI’2000, Vancouver, BC, Canada, (2000)), 358-365
[73] Pierce, N.; Spriet, J.; Desmet, J.; Mayo, S., Conformational splitting: a more powerful criterion for dead-end elimination, J. Comput. Chem., 21, 999-1009, (2000)
[74] Pierce, N. A.; Winfree, E., Protein design is NP-hard, Protein Eng., 15, 779-782, (2002)
[75] Pleiss, J., Protein design in metabolic engineering and synthetic biology, Curr. Opin. Biotechnol., 22, 611-617, (2011)
[76] Raghavendra, P., Optimal algorithms and inapproximability results for every CSP?, (Proceedings of the 40th Annual ACM Symposium on Theory of Computing, (2008), ACM), 245-254 · Zbl 1231.68142
[77] Raha, K.; Wollacott, A. M.; Italia, M. J.; Desjarlais, J. R., Prediction of amino acid sequence from structure, Protein Sci., 9, 1106-1119, (2000)
[78] Rendl, F.; Rinaldi, G.; Wiegele, A., Solving MAX-cut to optimality by intersecting semidefinite and polyhedral relaxations, Math. Program., 121, 307, (2010) · Zbl 1184.90118
[79] Schiex, T., Arc consistency for soft constraints, (Principles and Practice of Constraint Programming-CP 2000, Singapore, (2000)), 411-424 · Zbl 1044.68797
[80] Sontag, D.; Choe, D. K.; Li, Y., Efficiently searching for frustrated cycles in MAP inference, (Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI-12), (2012), AUAI Press Corvallis, Oregon), 795-804
[81] Sontag, D.; Meltzer, T.; Globerson, A.; Weiss, Y.; Jaakkola, T., Tightening LP relaxations for MAP using message-passing, (24th Conference in Uncertainty in Artificial Intelligence, (2008), AUAI Press), 503-510
[82] Swain, M.; Kemp, G., A CLP approach to the protein side-chain placement problem, (Principles and Practice of Constraint Programming-CP 2001, (2001), Springer), 479-493 · Zbl 1067.68676
[83] Traoré, S.; Allouche, D.; André, I.; de Givry, S.; Katsirelos, G.; Schiex, T.; Barbe, S., A new framework for computational protein design through cost function network optimization, Bioinformatics, 29, 2129-2136, (2013)
[84] Voigt, C. A.; Gordon, D. B.; Mayo, S. L., Trading accuracy for speed: a quantitative comparison of search algorithms in protein sequence design, J. Mol. Biol., 299, 789-803, (2000)
[85] Wallace, R., Directed arc consistency preprocessing, (Meyer, M., Selected Papers from the ECAI-94 Workshop on Constraint Processing, Lect. Notes Comput. Sci., vol. 923, (1995), Springer Berlin), 121-137
[86] Yanover, C.; Meltzer, T.; Weiss, Y., Linear programming relaxations and belief propagation-an empirical study, J. Mach. Learn. Res., 7, 1887-1907, (2006) · Zbl 1222.90033
[87] Zhang, P. Y.; Romero, D. A.; Beck, J. C.; Amon, C. H., Solving wind farm layout optimization with mixed integer programming and constraint programming, (Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems (CP-AI-OR), (2013), Springer), 284-299 · Zbl 1382.90050
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.