×

Limitations of shallow nets approximation. (English) Zbl 1429.68254

Summary: In this paper, we aim at analyzing the approximation abilities of shallow networks in reproducing kernel Hilbert spaces (RKHSs). We prove that there is a probability measure such that the achievable lower bound for approximating by shallow nets can be realized for all functions in balls of reproducing kernel Hilbert space with high probability, which is different with the classical minimax approximation error estimates. This result together with the existing approximation results for deep nets shows the limitations for shallow nets and provides a theoretical explanation on why deep nets perform better than shallow nets.

MSC:

68T07 Artificial neural networks and deep learning
46E22 Hilbert spaces with reproducing kernels (= (proper) functional Hilbert spaces, including de Branges-Rovnyak and other structured spaces)

Software:

AlexNet; darch; ImageNet
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Anastassiou, G., Intelligent systems: Approximation by artificial neural networks (2011), Springer Science & Bussiness Media: Springer Science & Bussiness Media Berlin · Zbl 1243.68002
[2] Aronszajn, N., Theory of reproducing kernels, Transactions of the American Society, 68, 337-404 (1950) · Zbl 0037.20701
[5] Bengio, Y., Learning deep architectures for AI, Foundations and Trends in Machine Learning, 2, 1-127 (2009) · Zbl 1192.68503
[6] Chui, C. K.; Li, X.; Mhaskar, H. N., Neural networks for lozalized approximation, Mathematics of Computation, 63, 607-623 (1994) · Zbl 0806.41020
[7] Chui, C. K.; Li, X.; Mhaskar, H. N., Limitations of the approximation capabilities of neural networks with one hidden layer, Advances in Computational Mathematics, 5, 233-243 (1996) · Zbl 0855.41026
[9] Costarelli, D., Neural network operators: constructive interpolation of multivariate functions, Neural Networks, 67, 28-36 (2015) · Zbl 1396.41019
[10] Costarelli, D.; Vinti, G., Approximation by max-product neural network operators of Kantorovich type, Results in Mathematics, 69, 505-519 (2016) · Zbl 1355.41009
[11] Costarelli, D.; Vinti, G., Max-product neural network and quasi in- terpolation operators activated by sigmoidal functions, Journal of Approximation Theory, 209, 1-22 (2016) · Zbl 1350.41001
[12] Costarelli, D.; Vinti, G., Pointwise and uniform approximation by multivariate neural network operators of the max-product type, Neural Networks, 81, 81-90 (2016) · Zbl 1439.41009
[13] Delalleau, O.; Bengio, Y., Shallow vs. deep sum-product networks, (NIPs (2011)), 666-674
[15] Gripenberg, G., Approximation by neural network with a bounded number of nodes at each level, Journal of Approximation Theory, 122, 260-266 (2003) · Zbl 1019.41018
[16] Guliyev, N.; Ismailov, V., A single hidden layer feedforward network with only one neuron in the hidden layer can approximate any univariate function, Neural Computation, 28, 1289-1304 (2016) · Zbl 1474.68153
[17] Hahm, N.; Hong, B. I., A note on neural network approximation with a sigmoidal function, Applied Mathematical Sciences, 10, 2075-2085 (2016)
[18] Hinton, G. E.; Osindero, S.; Teh, Y. W., A fast learning algorithm for deep belief nets, Neural Computation, 18, 1527-1554 (2006) · Zbl 1106.68094
[19] Iliev, A.; Kyurkchiev, N.; Markov, S., On the approximation of the cut and step functions by logistic and gompertz functions, Biomath, 4, 1510101 (2015) · Zbl 1369.41025
[20] Ismailov, V. E., On the approximation by neural networks with bounded number of neurons in hidden layers, Journal of Mathematical Analysis and Applications, 417, 963-969 (2014) · Zbl 1303.41010
[21] Krizhevsky, A.; Sutskever, I.; Hinton, G. E., Imagenet classification with deep convolutional neural networks, (NIPS (2012)), 1105-2097
[22] Kürková, V.; Sanguineti, M., Can two hidden layers make a difference?, (Tomassini, M.; Antonioni, A.; Daolio, F.; Buesser, P., Adaptive and natural computing algorithms. Adaptive and natural computing algorithms, Lecture notes in computer science, Vol. 7823 (2013), Springer-Verlag: Springer-Verlag New York, Ny, USA), 30-39
[23] LeCun, Y., The unreasonable effectiveness of deep learning, (Seminar (2014), Johns Hopkins University)
[24] Lee, H.; Pham, P.; Largman, Y.; Ng, A. Y., Unsupervised feature learning for audio classification using convolutional deep belief networks, (NIPS (2010)), 469-477
[25] Lin, S. B.; Cao, F. L.; Xu, Z. B., Essential rate for approximation by spherical neural networks, Neural Networks, 24, 752-758 (2011) · Zbl 1267.41002
[26] Lorentz, L.; von Golitschek, M.; Makovoz, Y., Constructive approximation: Advanced problems (1996), Springer: Springer Berlin · Zbl 0910.41001
[27] Maiorov, V., On best approximation by ridge functions, Journal of Approximation Theory, 99, 68-94 (1999) · Zbl 0939.41014
[28] Maiorov, V., On best approximation of classes by radial functions, Journal of Approximation Theory, 120, 36-70 (2003) · Zbl 1013.41013
[29] Maiorov, V., Almost optimal estimates for best approximation by translates on a torus, Constructive Approximation, 21, 337-349 (2005) · Zbl 1076.41013
[30] Maiorov, V., Representation of polynomial by linear combinations of radial basis functions, Constructive Approximation, 37, 283-293 (2013) · Zbl 1264.41023
[31] Maiorov, V.; Meir, R.; Ratsaby, J., On the approximation of functional classes equipped with a uniform measure using ridge functions, Journal of Approximation Theory, 99 (1999), 95-11 · Zbl 0940.41009
[32] Maiorov, V.; Pinkus, A., Lower bounds for approximation by MLP neural networks, Neurocomputing, 25, 81-91 (1999) · Zbl 0931.68093
[33] Mercer, J., Functions of positive and negative type, and their connection with the theory of integral equations, Philosophical Transactions of the Royal Society of London. Series A. Mathematical, Physical and Engineering Science, 209, 415-446 (1909) · JFM 40.0408.02
[34] Mhaskar, H. N., Approximation properties of a multilayered feedforward artificial neural network, Advances in Computational Mathematics, 1, 61-80 (1993) · Zbl 0824.41011
[36] Montúfar, G.; Pascanu, R.; Cho, K.; Bengio, Y., On the number of linear regions of deep nerual networks, (Nips (2013)), 2924-2932
[37] Petrushev, P., Approximation by ridge functions and neural networks, SIAM Journal on Mathematical Analysis, 30, 155-189 (1999) · Zbl 0927.41006
[38] Pinkus, A., Approximation theory of the MLP model in neural networks, Acta Numerica, 8, 143-195 (1999) · Zbl 0959.68109
[41] Rudin, W., Functrional analysis (1973), McGraw-Hill: McGraw-Hill New York · Zbl 0253.46001
[42] Schmidhuber, J., Deep learning in neural networks: an overview, Neural Networks, 61, 85-117 (2015)
[43] Shiryayev, A. N., Probability (1984), Springer-Verlag: Springer-Verlag Berlin · Zbl 0536.60001
[45] Wang, K. Y.; Li, L. Q., Harmonic analysis and approximation on the unit sphere (2000), Science Press: Science Press Beijing
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.