zbMATH — the first resource for mathematics

Strong mixed-integer programming formulations for trained neural networks. (English) Zbl 1450.90014
Summary: We present strong mixed-integer programming (MIP) formulations for high-dimensional piecewise linear functions that correspond to trained neural networks. These formulations can be used for a number of important tasks, such as verifying that an image classification network is robust to adversarial inputs, or solving decision problems where the objective function is a machine learning model. We present a generic framework, which may be of independent interest, that provides a way to construct sharp or ideal formulations for the maximum of \(d\) affine functions over arbitrary polyhedral input domains. We apply this result to derive MIP formulations for a number of the most popular nonlinear operations (e.g. ReLU and max pooling) that are strictly stronger than other approaches from the literature. We corroborate this computationally, showing that our formulations are able to offer substantial improvements in solve time on verification tasks for image classification networks.
90C11 Mixed integer programming
Full Text: DOI
[1] https://developers.google.com/machine-learning/glossary/#logits. Accessed 6 Feb 2020
[2] Alipanahi, B.; Delong, A.; Weirauch, MT; Frey, BJ, Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning, Nat. Biotechnol., 33, 831-838 (2015)
[3] Amos, B., Xu, L., Kolter, J.Z.: Input convex neural networks. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 146-155. PMLR, International Convention Centre, Sydney (2017)
[4] Anderson, R., Huchette, J., Tjandraatmadja, C., Vielma, J.P.: Strong mixed-integer programming formulations for trained neural networks. In: A. Lodi, V. Nagarajan (eds.) Proceedings of the 20th Conference on Integer Programming and Combinatorial Optimization, pp. 27-42. Springer International Publishing, Cham (2019). arxiv:1811.08359 · Zbl 1436.90088
[5] Arora, R., Basu, A., Mianjy, P., Mukherjee, A.: Understanding deep neural networks with rectified linear units (2016). arXiv preprint arXiv:1611.01491
[6] Arulkumaran, K.; Deisenroth, MP; Brundage, M.; Bharath, AA, Deep reinforcement learning: a brief survey, IEEE Signal Process. Mag., 34, 6, 26-38 (2017)
[7] Atamtürk, A.; Gómez, A., Strong formulations for quadratic optimization with M-matrices and indicator variables, Math. Program., 170, 141-176 (2018) · Zbl 1391.90423
[8] Balas, E., Disjunctive programming and a hierarchy of relaxations for discrete optimization problems, SIAM J. Algorithmic Discrete Methods, 6, 3, 466-486 (1985) · Zbl 0592.90070
[9] Balas, E., Disjunctive programming: properties of the convex hull of feasible points, Discrete Appl. Math., 89, 3-44 (1998) · Zbl 0921.90118
[10] Bartolini, A., Lombardi, M., Milano, M., Benini, L.: Neuron constraints to model complex real-world problems. In: International Conference on the Principles and Practice of Constraint Programming, pp. 115-129. Springer, Berlin (2011)
[11] Bartolini, A., Lombardi, M., Milano, M., Benini, L.: Optimization and controlled systems: a case study on thermal aware workload dispatching. In: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, pp. 427-433 (2012)
[12] Bastani, O., Ioannou, Y., Lampropoulos, L., Vytiniotis, D., Nori, A.V., Criminisi, A.: Measuring neural net robustness with constraints. In: Advances in Neural Information Processing Systems, pp. 2613-2621 (2016)
[13] Belotti, P.; Bonami, P.; Fischetti, M.; Lodi, A.; Monaci, M.; Nogales-Gomez, A.; Salvagnin, D., On handling indicator constraints in mixed integer programming, Comput. Optim. Appl., 65, 3, 545-566 (2016) · Zbl 1357.90094
[14] Bertsimas, D.; Tsitsiklis, J., Introduction to Linear Optimization (1997), Belmont, MA: Athena Scientific, Belmont, MA
[15] Bienstock, D., Muñoz, G., Pokutta, S.: Principled deep neural network training through linear programming (2018). arXiv preprint arXiv:1810.03218
[16] Bishop, CM, Pattern Recognition and Machine Learning (2006), Berlin: Springer, Berlin
[17] Bonami, P.; Lodi, A.; Tramontani, A.; Wiese, S., On mathematical programming with indicator constraints, Math. Program., 151, 1, 191-223 (2015) · Zbl 1328.90086
[18] Boureau, Y.L., Bach, F., LeCun, Y., Ponce, J.: Learning mid-level features for recognition. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2559-2566 (2010)
[19] Bunel, R., Turkaslan, I., Torr, P.H., Kohli, P., Kumar, M.P.: A unified view of piecewise linear neural network verification. In: Advances in Neural Information Processing Systems (2018)
[20] Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39-57 (2017)
[21] Chen, L., Ma, W., Natarajan, K., Simchi-Levi, D., Yan, Z.: Distributionally robust linear and discrete optimization with marginals. Available at SSRN 3159473 (2018)
[22] Cheng, C.H., Nührenberg, G., Ruess, N.: Maximum resilience of artifical neural networks. In: International Symposium on Automated Technology for Verification and Analysis. Springer, Cham (2017)
[23] Dulac-Arnold, G., Evans, R., van Hasselt, H., Sunehag, P., Lillicrap, T., Hunt, J., Mann, T., Weber, T., Degris, T., Coppin, B.: Deep reinforcement learning in large discrete action spaces (2015). arxiv:1512.07679
[24] Dutta, S., Jha, S., Sanakaranarayanan, S., Tiwari, A.: Output range analysis for deep feedforward neural networks. In: NASA Formal Methods Symposium (2018)
[25] Dvijotham, K., Gowal, S., Stanforth, R., Arandjelovic, R., O’Donoghue, B., Uesato, J., Kohli, P.: Training verified learners with learned verifiers (2018). arxiv:1805.10265
[26] Dvijotham, K., Stanforth, R., Gowal, S., Mann, T., Kohli, P.: A dual approach to scalable verification of deep networks. In: Thirty-Fourth Conference Annual Conference on Uncertainty in Artificial Intelligence (2018)
[27] Ehlers, R.: Formal verification of piece-wise linear feed-forward neural networks. In: International Symposium on Automated Technology for Verification and Analysis. Springer, Cham (2017)
[28] Engstrom, L., Tran, B., Tsipras, D., Schmidt, L., Madry, A.: Exploring the landscape of spatial robustness. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, pp. 1802-1811. PMLR, Long Beach, CA (2019). http://proceedings.mlr.press/v97/engstrom19a.html. Accessed 6 Feb 2020
[29] Fischetti, M.; Jo, J., Deep neural networks and mixed integer linear optimization, Constraints, 23, 296-309 (2018) · Zbl 1402.90096
[30] Gatys, L.A., Ecker, A.S., Bethge, M.: A neural algorithm of artistic style (2015). arxiv:1508.06576
[31] Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315-323 (2011)
[32] Goodfellow, I.; Bengio, Y.; Courville, A., Deep Learning (2016), Cambridge: MIT Press, Cambridge · Zbl 1373.68009
[33] Goodfellow, I.J., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. In: Proceedings of the 30th International Conference on Machine Learning, vol. 28, pp. 1319-1327 (2013)
[34] Grimstad, B.; Andersson, H., ReLU networks as surrogate models in mixed-integer linear programs, Comput. Chem. Eng., 131, 106580 (2019)
[35] Haneveld, W.K.K.: Robustness against dependence in pert: an application of duality and distributions with known marginals. In: Stochastic Programming 84 Part I, pp. 153-182. Springer (1986) · Zbl 0592.90048
[36] Hanin, B.: Universal function approximation by deep neural nets with bounded width and ReLU activations (2017). arXiv preprint arXiv:1708.02691
[37] Hijazi, H.; Bonami, P.; Cornuéjols, G.; Ouorou, A., Mixed-integer nonlinear programs featuring “on/off” constraints, Comput. Optim. Appl., 52, 2, 537-558 (2012) · Zbl 1250.90058
[38] Hijazi, H., Bonami, P., Ouorou, A.: A note on linear on/off constraints (2014). http://www.optimization-online.org/DB_FILE/2014/04/4309.pdf. Accessed 6 Feb 2020
[39] Huber, B.; Rambau, J.; Santos, F., The Cayley Trick, lifting subdivisions and the Bohne-Dress theorem of zonotopal tiltings, J. Eur. Math. Soc., 2, 2, 179-198 (2000) · Zbl 0988.52017
[40] Huchette, J.: Advanced mixed-integer programming formulations: methodology, computation, and application. Ph.D. thesis, Massachusetts Institute of Technology (2018)
[41] Jeroslow, R.; Lowe, J., Modelling with integer variables, Math. Program. Study, 22, 167-184 (1984) · Zbl 0554.90081
[42] Jeroslow, RG, Alternative formulations of mixed integer programs, Ann. Oper. Res., 12, 241-276 (1988)
[43] Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. In: International Conference on Computer Aided Verification, pp. 97-117 (2017)
[44] Khalil, E.B., Gupta, A., Dilkina, B.: Combinatorial attacks on binarized neural networks. In: International Conference on Learning Representations (2019)
[45] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization (2014). arxiv:1412.6980
[46] Korte, B.; Vygen, J., Combinatorial Optimization: Theory and Algorithms (2000), Berlin: Springer, Berlin · Zbl 0953.90052
[47] Kumar, A., Serra, T., Ramalingam, S.: Equivalent and approximate transformations of deep neural networks (2019). arxiv:1905.11428
[48] LeCun, Y.; Bengio, Y.; Hinton, G., Deep learning, Nature, 521, 7553, 436-444 (2015)
[49] LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P., Gradient-based learning applied to document recognition, Proc. IEEE, 86, 2278-2324 (1998)
[50] Liu, C., Arnon, T., Lazarus, C., Barrett, C., Kochenderfer, M.J.: Algorithms for verifying deep neural networks (2019). arxiv:1903.06758
[51] Lombardi, M.; Gualandi, S., A lagrangian propagator for artificial neural networks in constraint programming, Constraints, 21, 4, 435-462 (2016) · Zbl 1368.90148
[52] Lombardi, M., Milano, M.: Boosting combinatorial problem modeling with machine learning. In: Proceedings IJCAI, pp. 5472-5478 (2018)
[53] Lombardi, M.; Milano, M.; Bartolini, A., Empirical decision model learning, Artif. Intell., 244, 343-367 (2017) · Zbl 1404.68113
[54] Lomuscio, A., Maganti, L.: An approach to reachability analysis for feed-forward ReLU neural networks (2017). arxiv:1706.07351
[55] Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: ICML Workshop on Deep Learning for Audio, Speech and Language (2013)
[56] Mladenov, M., Boutilier, C., Schuurmans, D., Elidan, G., Meshi, O., Lu, T.: Approximate linear programming for logistic Markov decision processes. In: Proceedings of the Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17), pp. 2486-2493. Melbourne, Australia (2017)
[57] Mordvintsev, A., Olah, C., Tyka, M.: Inceptionism: Going deeper into neural networks (2015). https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html. Accessed 6 Feb 2020
[58] Natarajan, K.; Song, M.; Teo, CP, Persistency model and its applications in choice modeling, Manage. Sci., 55, 3, 453-469 (2009) · Zbl 1232.91139
[59] Olah, C., Mordvintsev, A., Schubert, L.: Feature visualization. Distill (2017). https://distill.pub/2017/feature-visualization. Accessed 6 Feb 2020
[60] Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: IEEE European Symposium on Security and Privacy, pp. 372-387 (2016)
[61] Raghunathan, A., Steinhardt, J., Liang, P.: Semidefinite relaxations for certifying robustness to adversarial examples. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, pp. 10,900-10,910. Curran Associates Inc. (2018)
[62] Ryu, M., Chow, Y., Anderson, R., Tjandraatmadja, C., Boutilier, C.: CAQL: Continuous action Q-learning (2019). arxiv:1909.12397
[63] Salman, H., Yang, G., Zhang, H., Hsieh, C.J., Zhang, P.: A convex relaxation barrier to tight robustness verification of neural networks (2019). arxiv:1902.08722
[64] Say, B., Wu, G., Zhou, Y.Q., Sanner, S.: Nonlinear hybrid planning with deep net learned transition models and mixed-integer linear programming. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 750-756 (2017)
[65] Schweidtmann, AM; Mitsos, A., Global deterministic optimization with artificial neural networks embedded, J. Optim. Theory Appl., 180, 925-948 (2019) · Zbl 1407.90263
[66] Serra, T., Ramalingam, S.: Empirical bounds on linear regions of deep rectifier networks (2018). arxiv:1810.03370
[67] Serra, T., Tjandraatmadja, C., Ramalingam, S.: Bounding and counting linear regions of deep neural networks. In: Thirty-Fifth International Conference on Machine Learning (2018)
[68] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. In: International Conference on Learning Representations (2014)
[69] Tawarmalani, M.; Sahinidis, N., Convexification and Global Optimization in Continuous and Mixed-Integer Nonlinear Programming: Theory, Algorithms, Software and Applications (2002), Berlin: Springer, Berlin · Zbl 1031.90022
[70] Tjeng, V., Xiao, K., Tedrake, R.: Verifying neural networks with mixed integer programming. In: International Conference on Learning Representations (2019)
[71] Trespalacios, F.; Grossmann, IE, Improved big-M reformulation for generalized disjunctive programs, Comput. Chem. Eng., 76, 98-103 (2015)
[72] Vielma, JP, Mixed integer linear programming formulation techniques, SIAM Rev., 57, 1, 3-57 (2015) · Zbl 1338.90277
[73] Vielma, JP, Embedding formulations and complexity for unions of polyhedra, Manage. Sci., 64, 10, 4471-4965 (2018)
[74] Vielma, JP, Small and strong formulations for unions of convex sets from the Cayley embedding, Math. Program., 177, 21-53 (2018) · Zbl 1418.90180
[75] Vielma, JP; Nemhauser, G., Modeling disjunctive constraints with a logarithmic number of binary variables and constraints, Math. Program., 128, 1-2, 49-72 (2011) · Zbl 1218.90137
[76] Weibel, C.: Minkowski sums of polytopes: combinatorics and computation. Ph.D. thesis, École Polytechnique Fédérale de Lausanne (2007)
[77] Weiss, G., Stochastic bounds on distributions of optimal value functions with applications to pert, network flows and reliability, Oper. Res., 34, 4, 595-605 (1986) · Zbl 0609.90093
[78] Wong, E., Kolter, J.Z.: Provable defenses against adversarial examples via the convex outer adversarial polytope. In: International Conference on Machine Learning (2018)
[79] Wong, E., Schmidt, F., Metzen, J.H., Kolter, J.Z.: Scaling provable adversarial defenses. In: 32nd Conference on Neural Information Processing Systems (2018)
[80] Wu, G., Say, B., Sanner, S.: Scalable planning with Tensorflow for hybrid nonlinear domains. In: Advances in Neural Information Processing Systems, pp. 6276-6286 (2017)
[81] Xiao, K.Y., Tjeng, V., Shafiullah, N.M., Madry, A.: Training for faster adversarial robustness verification via inducing ReLU stability. In: International Conference on Learning Representations (2019)
[82] Xu, B., Wang, N., Chen, T., Li, M.: Empirical evaluation of rectified activations in convolution network (2015). arxiv:1505.00853
[83] Zeng, H.; Edwards, MD; Liu, G.; Gifford, DK, Convolutional neural network architectures for predicting DNA-protein binding, Bioinformatics, 32, 12, 121-127 (2016)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.