
Parseval proximal neural networks. (English) Zbl 1489.68224

Summary: The aim of this paper is twofold. First, we show that a certain concatenation of a proximity operator with an affine operator is again a proximity operator on a suitable Hilbert space. Second, we use our findings to establish so-called proximal neural networks (PNNs) and stable tight frame proximal neural networks. Let \(\mathcal{H}\) and \(\mathcal{K}\) be real Hilbert spaces, \(b \in \mathcal{K}\), and let \(T \in \mathcal{B} (\mathcal{H},\mathcal{K})\) be a linear operator with closed range and Moore-Penrose inverse \(T^\dagger\). Based on the well-known characterization of proximity operators by Moreau, we prove that for any proximity operator \(\mathrm{Prox}:\mathcal{K}\rightarrow \mathcal{K}\) the operator \(T^\dagger \mathrm{Prox}(T\cdot +b)\) is a proximity operator on \(\mathcal{H}\) equipped with a suitable norm. In particular, it follows for the frequently applied soft shrinkage operator \(\mathrm{Prox}= S_\lambda :\ell_2 \rightarrow \ell_2\) and any frame analysis operator \(T:\mathcal{H}\rightarrow \ell_2\) that the frame shrinkage operator \(T^\dagger S_\lambda T\) is a proximity operator on a suitable Hilbert space. The concatenation of proximity operators on \(\mathbb{R}^d\) equipped with different norms establishes a PNN. If the network arises from tight frame analysis or synthesis operators, then it forms an averaged operator. In particular, it has Lipschitz constant 1 and belongs to the class of so-called Lipschitz networks, which were recently applied to defend against adversarial attacks. Moreover, due to their averaging property, PNNs can be used within so-called plug-and-play algorithms with convergence guarantees. In the case of Parseval frames, we call the networks Parseval proximal neural networks (PPNNs). The involved linear operators then lie in a Stiefel manifold, so corresponding minimization methods can be applied to train such networks. Finally, some proof-of-concept examples demonstrate the performance of PPNNs.
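The layer structure described in the summary admits a compact numerical illustration. The following Python sketch (with illustrative names; not the authors' implementation) builds a single PPNN layer \(x \mapsto T^{\mathrm{T}} S_\lambda (Tx+b)\), where \(T\) has orthonormal columns, i.e., \(T^{\mathrm{T}} T = I_d\), so that \(T\) lies in a Stiefel manifold and \(T^\dagger = T^{\mathrm{T}}\). It also checks empirically the nonexpansiveness (Lipschitz constant at most 1) that the averagedness result guarantees.

# Minimal sketch of one Parseval proximal neural network (PPNN) layer,
# assuming the form x -> T^T S_lambda(T x + b) with T^T T = I
# (orthonormal columns, i.e., T in a Stiefel manifold).
# All function and variable names are illustrative.
import numpy as np

def soft_shrinkage(x, lam):
    # Componentwise proximity operator of lam * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ppnn_layer(x, T, b, lam):
    # T has shape (n, d) with T.T @ T = I_d: a Parseval frame analysis
    # operator whose Moore-Penrose inverse is simply T.T
    return T.T @ soft_shrinkage(T @ x + b, lam)

rng = np.random.default_rng(0)
d, n = 8, 16
# Random Stiefel point: orthonormal columns via a reduced QR decomposition
T, _ = np.linalg.qr(rng.standard_normal((n, d)))
b, lam = rng.standard_normal(n), 0.1

# Each such layer is firmly nonexpansive, hence 1-Lipschitz:
x, y = rng.standard_normal(d), rng.standard_normal(d)
gap = np.linalg.norm(ppnn_layer(x, T, b, lam) - ppnn_layer(y, T, b, lam))
assert gap <= np.linalg.norm(x - y) + 1e-12

A full PPNN concatenates such layers; since each layer is averaged, the composition remains averaged and hence 1-Lipschitz.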

MSC:

68T05 Learning and adaptive systems in artificial intelligence
90C26 Nonconvex programming, global optimization

References:

[1] Absil, P-A; Mahony, R.; Sepulchre, R., Optimization Algorithms on Matrix Manifolds (2008), Princeton and Oxford: Princeton University Press, Princeton and Oxford · Zbl 1147.65043
[2] Anil, C., Lucas, J., Grosse, R.: Sorting out Lipschitz function approximation. In: Chaudhuri, K., Salakhutdinov, R., editors, Proceedings of the 36th International Conference on Machine Learning, vol. 97 of Proceedings of Machine Learning Research, pp. 291-301, Long Beach, California, USA. PMLR (2019)
[3] Arjovsky, M., Shah, A., Bengio, Y.: Unitary evolution recurrent neural networks. In: International Conference on Machine Learning, pp. 1120-1128 (2016)
[4] Bansal, N., Chen, X., Wang, Z.: Can we gain more from orthogonality regularizations in training deep networks? In: Advances in Neural Information Processing Systems, pp. 4261-4271 (2018)
[5] Bauschke, HH; Combettes, PL, Convex Analysis and Monotone Operator Theory in Hilbert Spaces (2011), New York: Springer, New York · Zbl 1218.47001
[6] Beck, A.: First-Order Methods in Optimization, vol. 25 of MOS-SIAM Series on Optimization. SIAM (2017) · Zbl 1384.65033
[7] Bertsekas, DP, Incremental proximal methods for large scale convex optimization, Math. Program., 129, 163-195 (2011) · Zbl 1229.90121 · doi:10.1007/s10107-011-0472-0
[8] Burger, M.; Sawatzky, A.; Steidl, G., First Order Algorithms in Variational Image Processing (2017), New York: Springer, New York · Zbl 1372.65053
[9] Chan, SH; Wang, X.; Elgendy, OA, Plug-and-play ADMM for image restoration: fixed-point convergence and applications, IEEE Trans. Comput. Imaging, 3, 84-98 (2016) · doi:10.1109/TCI.2016.2629286
[10] Chouzenoux, E.; Pesquet, J-C; Repetti, A., Variable metric forward-backward algorithm for minimizing the sum of a differentiable function and a convex function, J. Optim. Theory Appl., 162, 107-132 (2014) · Zbl 1318.90058 · doi:10.1007/s10957-013-0465-7
[11] Christensen, O., An Introduction to Frames and Riesz Bases (2016), New York: Springer, New York · Zbl 1348.42033
[12] Combettes, PL, Monotone operator theory in convex optimization, Math. Program., 170, 1, 177-206 (2018) · Zbl 1471.47033 · doi:10.1007/s10107-018-1303-3
[13] Combettes, P.L., Pesquet, J.-C.: Deep neural network structures solving variational inequalities. Set-Valued and Variational Analysis, pp. 1-28 (2020)
[14] Combettes, PL; Wajs, VR, Signal recovery by proximal forward-backward splitting, Multiscale Model. Simul., 4, 1168-1200 (2005) · Zbl 1179.94031 · doi:10.1137/050626090
[15] Cvetković, Z.; Vetterli, M., Oversampled filter banks, IEEE Trans. Signal Process., 46, 1245-1255 (1998) · doi:10.1109/78.668788
[16] Daubechies, I.; Defrise, M.; De Mol, C., An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Commun. Pure Appl. Math., 57, 11, 1413-1457 (2004) · Zbl 1077.65055 · doi:10.1002/cpa.20042
[17] Dorobantu, V., Stromhaug, P.A., Renteria, J.: DIZZYRNN: reparameterizing recurrent neural networks for norm-preserving backpropagation. arXiv preprint arXiv:1612.04035 (2016)
[18] Elad, M.; Aharon, M., Image denoising via sparse and redundant representations over learned dictionaries, IEEE Trans. Image Process., 15, 12, 3736-3745 (2006) · doi:10.1109/TIP.2006.881969
[19] Frerix, T., Möllenhoff, T., Moeller, M., Cremers, D.: Proximal backpropagation. Technical report, arXiv preprint arXiv:1706.04638 (2018)
[20] Geppert, J.A., Plonka, G.: Frame soft shrinkage operators are proximity operators. Technical report, arXiv preprint arXiv:1910.01820 (2019)
[21] Golub, GH; Van Loan, CF, Matrix Computations (2013), Baltimore: The Johns Hopkins University Press, Baltimore · Zbl 1268.65037
[22] Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: International Conference on Learning Representations (2015)
[23] Gouk, H., Frank, E., Pfahringer, B., Cree, M.: Regularisation of neural networks by enforcing Lipschitz continuity. arXiv:1804.04368 (2018)
[24] Gribonval, R., Nikolova, M.: A characterization of proximity operators. arXiv:1807.04014 (2020)
[25] Harandi, M., Fernando, B.: Generalized backpropagation, étude de cas: orthogonality. CoRR abs/1611.05927 (2016)
[26] Huang, L., Liu, X., Lang, B., Yu, A.W., Wang, Y., Li, B.: Orthogonal weight normalization: solution to optimization over multiple dependent Stiefel manifolds in deep neural networks. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
[27] Huster, T.P., Chiang, C.-Y.J., Chadha, R.: Limitations of the Lipschitz constant as a defense against adversarial examples. In: ECML PKDD 2018 Workshops, pp. 16-29. Springer, New York (2019)
[28] Jing, L., Shen, Y., Dubcek, T., Peurifoy, J., Skirlo, S., LeCun, Y., Tegmark, M., Soljačić, M.: Tunable efficient unitary neural networks (EUNN) and their application to RNNs. In: Proceedings of the 34th International Conference on Machine Learning-Vol. 70, pp. 1733-1741. JMLR.org (2017)
[29] Kobler, E., Klatzer, T., Hammernik, K., Pock, T.: Variational networks: connecting variational methods and deep learning. In: German Conference on Pattern Recognition, pp. 281-293. Springer, New York (2017)
[30] Lerman, G.; Maunu, T., An overview of robust subspace recovery, Proc. IEEE, 106, 8, 1380-1410 (2018) · doi:10.1109/JPROC.2018.2853141
[31] Lezcano-Casado, M., Martínez-Rubio, D.: Cheap orthogonal constraints in neural networks: a simple parametrization of the orthogonal and unitary group. In: Proceedings of the 36th International Conference on Machine Learning, pp. 3794-3803 (2019). arXiv:1901.08428
[32] Mallat, S., A Wavelet Tour of Signal Processing: The Sparse Way (2008), Amsterdam: Elsevier, Amsterdam
[33] Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. In: International Conference on Learning Representations (2018)
[34] Moreau, J-J, Proximité et dualité dans un espace Hilbertien, Bull. Soc. Math. France, 93, 273-299 (1965) · Zbl 0136.12101 · doi:10.24033/bsmf.1625
[35] Neumayer, S.; Nimmer, M.; Setzer, S.; Steidl, G., On the rotational invariant \(l_1\)-norm PCA, Linear Algebra Appl., 587, 243-270 (2019) · Zbl 07191020 · doi:10.1016/j.laa.2019.10.030
[36] Nishimori, Y.; Akaho, S., Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold, Neurocomputing, 67, 106-135 (2005) · doi:10.1016/j.neucom.2004.11.035
[37] Plonka, G.; Steidl, G., A multiscale wavelet-inspired scheme for nonlinear diffusion, Int. J. Wavelets Multiresolut. Inf. Process., 4, 1, 1-21 (2006) · Zbl 1111.65075 · doi:10.1142/S0219691306001063
[38] Reich, S., Weak convergence theorems for nonexpansive mappings in Banach spaces, J. Math. Anal. Appl., 67, 274-276 (1979) · Zbl 0423.47026 · doi:10.1016/0022-247X(79)90024-6
[39] Sedghi, H., Gupta, V., Long, P.M.: The singular values of convolutional layers. In: International Conference on Learning Representations (2019)
[40] Setzer, S., Operator splittings, Bregman methods and frame shrinkage in image processing, Int. J. Comput. Vis., 92, 3, 265-280 (2011) · Zbl 1235.68314 · doi:10.1007/s11263-010-0357-3
[41] Sommerhoff, H., Kolb, A., Moeller, M.: Energy dissipation with plug-and-play priors. In: NeurIPS 2019 Workshop (2019)
[42] Sreehari, S.; Venkatakrishnan, SV; Wohlberg, B., Plug-and-play priors for bright field electron tomography and sparse interpolation, IEEE Trans. Comput. Imaging, 2, 408-423 (2016)
[43] Steidl, G.; Weickert, J.; Brox, T.; Mrázek, P.; Welk, M., On the equivalence of soft wavelet shrinkage, total variation diffusion, total variation regularization, and SIDEs, SIAM J. Numer. Anal., 42, 2, 686-713 (2004) · Zbl 1083.94001 · doi:10.1137/S0036142903422429
[44] Sun, Y.; Wohlberg, B.; Kamilov, U., An online plug-and-play algorithm for regularized image reconstruction, IEEE Trans. Comput. Imaging, 5, 395-408 (2018) · doi:10.1109/TCI.2019.2893568
[45] Teodoro, AM; Bioucas-Dias, JM; Figueiredo, MA, A convergent image fusion algorithm using scene-adapted Gaussian-mixture-based denoising, IEEE Trans. Image Process., 28, 1, 451-463 (2018) · Zbl 1409.94587 · doi:10.1109/TIP.2018.2869727
[46] Tsuzuku, Y.; Sato, I.; Sugiyama, M., Lipschitz-margin training: scalable certification of perturbation invariance for deep neural networks, Adv. Neural Inf. Process. Syst., 31, 6541-6550 (2018)
[47] Vorontsov, E., Trabelsi, C., Kadoury, S., Pal, C.: On orthogonality and learning recurrent networks with long term dependencies. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 3570-3578. JMLR.org (2017)
[48] Wen, Z.; Yin, W., A feasible method for optimization with orthogonality constraints, Math. Program., 142, 1-2, 397-434 (2013) · Zbl 1281.49030 · doi:10.1007/s10107-012-0584-1
[49] Wisdom, S., Powers, T., Hershey, J., Le Roux, J., Atlas, L.: Full-capacity unitary recurrent neural networks. In: Advances in Neural Information Processing Systems, pp. 4880-4888 (2016)