
Function approximation by deep networks. (English) Zbl 1442.62215

Summary: We show that deep networks are better than shallow networks at approximating functions that can be expressed as a composition of functions described by a directed acyclic graph, because the deep networks can be designed to have the same compositional structure, while a shallow network cannot exploit this knowledge. Thus, the blessing of compositionality mitigates the curse of dimensionality. On the other hand, a theorem called good propagation of errors allows one to “lift” theorems about shallow networks to theorems about deep networks with an appropriate choice of norms, smoothness, etc. We illustrate this in three contexts, in which each channel of the deep network computes a spherical polynomial, a non-smooth ReLU network, or another zonal function network closely related to the ReLU network.
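The compositional idea in the summary can be illustrated with a toy example. The sketch below is not the construction used in the paper. It assumes a hypothetical target f(x1, x2, x3, x4) = h(g(x1, x2), g(x3, x4)) on [-1, 1]^4 with g(u, v) = sin(u + v) and h(p, q) = cos(p - q). Each node of this binary-tree DAG is replaced by a one-hidden-layer ReLU network that linearly interpolates the node's univariate profile at n + 1 equally spaced knots. This is a simple stand-in for the spherical-polynomial, ReLU and zonal-function constructions treated in the paper. The names `relu_interpolant`, `deep_net`, `error_bound` and `max_sampled_error` are illustrative only. Under these assumptions, the error bound follows the same pattern as good propagation of errors. The standard linear-interpolation estimate approximates each node to within 2/n^2, and because cos is 1-Lipschitz, the node errors add up to at most 6/n^2 for the whole network.

```python
import math
import random


def relu(t):
    return max(t, 0.0)


def relu_interpolant(phi, a, b, n):
    """Shallow ReLU network with n hidden units that linearly interpolates phi
    at n + 1 equally spaced knots on [a, b]; inputs are clamped to [a, b]."""
    knots = [a + (b - a) * k / n for k in range(n + 1)]
    vals = [phi(t) for t in knots]
    slopes = [(vals[k + 1] - vals[k]) / (knots[k + 1] - knots[k]) for k in range(n)]
    # Outer weight of ReLU(t - knots[k]) is the change of slope at knots[k].
    weights = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n)]

    def net(t):
        t = min(max(t, a), b)
        return vals[0] + sum(w * relu(t - knots[k]) for k, w in enumerate(weights))

    return net


def target(x):
    """Hypothetical compositional target following the DAG
    (x1, x2) -> g, (x3, x4) -> g, (g, g) -> h."""
    x1, x2, x3, x4 = x
    return math.cos(math.sin(x1 + x2) - math.sin(x3 + x4))


def deep_net(n):
    """Deep network with the same DAG: every node is a shallow ReLU net."""
    g_net = relu_interpolant(math.sin, -2.0, 2.0, n)  # x1 + x2 lies in [-2, 2]
    h_net = relu_interpolant(math.cos, -2.0, 2.0, n)  # g_net outputs lie in [-1, 1]

    def f(x):
        x1, x2, x3, x4 = x
        return h_net(g_net(x1 + x2) - g_net(x3 + x4))

    return f


def error_bound(n):
    """Per-node error <= (4/n)^2 / 8 * max|phi''| = 2/n^2, since |sin''|, |cos''| <= 1.
    cos is 1-Lipschitz, so the total error is <= 2/n^2 + 2/n^2 + 2/n^2."""
    return 6.0 / n**2


def max_sampled_error(n, samples=2000, seed=0):
    """Largest error observed on random points (a lower estimate of the sup error)."""
    rng = random.Random(seed)
    f = deep_net(n)
    worst = 0.0
    for _ in range(samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(4)]
        worst = max(worst, abs(target(x) - f(x)))
    return worst


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        print(n, max_sampled_error(n), error_bound(n))
```

Running the script prints, for each n, the largest sampled error next to the bound 6/n^2. The deep network uses 3n ReLU units, and in this toy setting its bound depends only on the approximation rates of the two-variable nodes, not on the 4-dimensional input.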

MSC:

62M45 Neural nets and related approaches to inference from stochastic processes
62R20 Statistics on metric spaces
41A25 Rate of convergence, degree of approximation
68Q32 Computational learning theory
05C90 Applications of graph theory
Full Text: DOI arXiv