
Neural circuits for learning context-dependent associations of stimuli. (English) Zbl 1434.68533

Summary: Reinforcement learning combined with neural networks provides a powerful framework for solving certain tasks in engineering and cognitive science. Previous research shows that neural networks can automatically extract features and learn hierarchical decision rules. In this work, we investigate reinforcement learning methods for performing a context-dependent association task using two kinds of neural network models (built from continuous firing-rate neurons), as well as a neural circuit gating model. The task allows us to examine how well the different models extract hierarchical decision rules and generalize beyond the examples presented during the training phase. We find that the simple neural circuit gating model, trained using response-based regulation of Hebbian associations, performs at nearly the same level as a reinforcement learning algorithm combined with neural networks trained with more sophisticated back-propagation-of-error methods. A potential explanation is that hierarchical reasoning is the key to performance and the specific learning method is less important.
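For intuition, the following is a minimal Python sketch of a context-dependent association task learned with a reward-modulated ("response-based") Hebbian update. The task layout, the conjunctive context-by-stimulus input coding, the softmax action selection, and the update rule delta-W = eta * reward * response * input are illustrative assumptions only; they are not the authors' exact circuit or learning rule.

import numpy as np

rng = np.random.default_rng(0)

n_contexts, n_stimuli, n_responses = 2, 4, 2
n_inputs = n_contexts * n_stimuli  # one unit per (context, stimulus) conjunction

def correct_response(context, stimulus):
    # Hierarchical rule (assumed for illustration): context 0 maps stimuli 0-1
    # to response 0 and stimuli 2-3 to response 1; context 1 reverses the mapping.
    base = 0 if stimulus < 2 else 1
    return base if context == 0 else 1 - base

W = np.zeros((n_responses, n_inputs))  # association weights
eta = 0.5                              # learning rate

for trial in range(2000):
    context = rng.integers(n_contexts)
    stimulus = rng.integers(n_stimuli)
    x = np.zeros(n_inputs)
    x[context * n_stimuli + stimulus] = 1.0  # conjunctive (gated) input coding

    # Softmax choice over the two responses
    logits = W @ x
    p = np.exp(logits - logits.max())
    p /= p.sum()
    choice = rng.choice(n_responses, p=p)

    reward = 1.0 if choice == correct_response(context, stimulus) else -1.0

    # Reward-modulated Hebbian update on the chosen response only
    y = np.zeros(n_responses)
    y[choice] = 1.0
    W += eta * reward * np.outer(y, x)

# Check whether every (context, stimulus) pair now maps to the correct response
correct = sum(
    int(np.argmax(W[:, c * n_stimuli + s]) == correct_response(c, s))
    for c in range(n_contexts)
    for s in range(n_stimuli)
)
print(f"learned {correct}/{n_inputs} associations correctly")

The point of the sketch is only that a simple reward-gated Hebbian association, given an input coding that already expresses the context-stimulus conjunction, can learn a context-reversed mapping; the paper compares such circuit-level learning against networks trained by back-propagation within a reinforcement learning framework.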

MSC:

68T07 Artificial neural networks and deep learning
68T05 Learning and adaptive systems in artificial intelligence
92B20 Neural networks for/in biological studies, artificial life and related topics

Software:

LSTM

References:

[1] Badre, D.; Frank, M. J., Mechanisms of hierarchical reinforcement learning in cortico-striatal circuits 2: Evidence from fMRI, Cerebral Cortex, 22, 3, 527-536 (2012)
[2] Badre, D.; Kayser, A. S.; D’Esposito, M., Frontal cortex and the discovery of abstract action rules, Neuron, 66, 2, 315-326 (2010)
[3] Bertsekas, D. P., Dynamic programming and optimal control. Vol. I and II (1995), Athena Scientific: Athena Scientific Belmont, MA · Zbl 0904.90170
[4] Bertsekas, D.; Tsitsiklis, J., Neuro-dynamic programming (1996), Athena Scientific: Athena Scientific Belmont, MA · Zbl 0924.68163
[5] Chatham, C. H.; Herd, S. A.; Brant, A. M.; Hazy, T. E.; Miyake, A.; O’Reilly, R., From an executive network to executive control: a computational model of the \(n\)-back task, Journal of Cognitive Neuroscience, 23, 11, 3598-3619 (2011)
[6] Dayan, P.; Abbott, L. F., Theoretical neuroscience. Vol. 10 (2001), MIT Press: MIT Press Cambridge, MA · Zbl 1051.92010
[7] Watkins, C. J.; Dayan, P., Q-learning, Machine Learning, 8, 3-4, 279-292 (1992) · Zbl 0773.68062
[8] Estanjini, R. M.; Li, K.; Paschalidis, I. C., A least squares temporal difference actor-critic algorithm with applications to warehouse management, Naval Research Logistics (NRL), 59, 3-4, 197-211 (2012), URL http://dx.doi.org/10.1002/nav.21481 · Zbl 1407.90334
[9] Gers, F. A.; Schmidhuber, J.; Cummins, F., Learning to forget: Continual prediction with LSTM, Neural Computation, 12, 10, 2451-2471 (2000)
[10] Goodfellow, I.; Bengio, Y.; Courville, A., Deep learning (2016), MIT Press, URL http://www.deeplearningbook.org · Zbl 1373.68009
[11] Graves, A.; Schmidhuber, J., Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Networks, 18, 5, 602-610 (2005)
[12] Grondman, I.; Busoniu, L.; Lopes, G. A.; Babuska, R., A survey of actor-critic reinforcement learning: Standard and natural policy gradients, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42, 6, 1291-1307 (2012)
[13] Hasselmo, M. E., A model of prefrontal cortical mechanisms for goal-directed behavior, Journal of Cognitive Neuroscience, 17, 7, 1115-1129 (2005)
[14] Hasselmo, M. E.; Eichenbaum, H., Hippocampal mechanisms for the context-dependent retrieval of episodes, Neural Networks, 18, 9, 1172-1190 (2005) · Zbl 1085.92005
[15] Hasselmo, M. E.; Stern, C. E., A network model of behavioural performance in a rule learning task, Philosophical Transactions of the Royal Society B: Biological Sciences, 373, Article 20170275 pp. (2018)
[16] Hausknecht, M., & Stone, P. (2015). Deep reinforcement learning in parameterized action space. arXiv preprint arXiv:1511.04143
[17] Hochreiter, S.; Schmidhuber, J., Long short-term memory, Neural Computation, 9, 8, 1735-1780 (1997)
[18] Katz, Y.; Kath, W. L.; Spruston, N.; Hasselmo, M. E., Coincidence detection of place and temporal context in a network model of spiking hippocampal neurons, PLoS Computational Biology, 3, 12, e234 (2007)
[19] Kingma, D., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
[20] Koene, R. A.; Hasselmo, M. E., An integrate-and-fire model of prefrontal cortex neuronal activity during performance of goal-directed decision making, Cerebral Cortex, 15, 12, 1964-1981 (2005)
[21] Konda, V. R.; Tsitsiklis, J. N., On actor-critic algorithms, SIAM Journal on Control and Optimization, 42, 4, 1143-1166 (2003) · Zbl 1049.93095
[22] Kriete, T.; Noelle, D. C.; Cohen, J. D.; O’Reilly, R. C., Indirection and symbol-like processing in the prefrontal cortex and basal ganglia, Proceedings of the National Academy of Sciences, 110, 41, 16390-16395 (2013)
[23] LeCun, Y.; Bengio, Y.; Hinton, G., Deep learning, Nature, 521, 7553, 436-444 (2015)
[24] Levine, S.; Finn, C.; Darrell, T.; Abbeel, P., End-to-end training of deep visuomotor policies, Journal of Machine Learning Research (JMLR), 17, 1, 1334-1373 (2016) · Zbl 1360.68687
[25] Liu, H.; Wu, Y.; Sun, F., Extreme trust region policy optimization for active object recognition, IEEE Transactions on Neural Networks and Learning Systems, 29, 6, 2253-2258 (2018)
[26] Miller, E. K.; Cohen, J. D., An integrative theory of prefrontal cortex function, Annual Review of Neuroscience, 24, 1, 167-202 (2001)
[27] Mnih, V., Badia, A. P., Mirza, M., Graves, A., & Lillicrap, T. P., et al. (2016). Asynchronous methods for deep reinforcement learning. arXiv preprint arXiv:1602.01783. URL http://arxiv.org/abs/1602.01783
[28] Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G., Human-level control through deep reinforcement learning, Nature, 518, 7540, 529-533 (2015), URL http://dx.doi.org/10.1038/nature14236
[29] Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10)
[30] O’Reilly, R. C.; Frank, M. J., Making working memory work: a computational model of learning in the prefrontal cortex and basal ganglia, Neural Computation, 18, 2, 283-328 (2006) · Zbl 1090.92008
[31] O’Reilly, R. C.; Frank, M. J.; Hazy, T. E.; Watz, B., PVLV: the primary value and learned value Pavlovian learning algorithm, Behavioral Neuroscience, 121, 1, 31 (2007)
[32] Pennesi, P.; Paschalidis, I. C., A distributed actor-critic algorithm and applications to mobile sensor network coordination problems, IEEE Transactions on Automatic Control, 55, 2, 492-497 (2010) · Zbl 1368.90026
[33] Peters, J.; Schaal, S., Reinforcement learning of motor skills with policy gradients, Neural Networks, 21, 4, 682-697 (2008)
[34] Poirazi, P.; Brannon, T.; Mel, B. W., Arithmetic of subthreshold synaptic summation in a model CA1 pyramidal cell, Neuron, 37, 6, 977-987 (2003)
[35] Raudies, F.; Zilli, E. A.; Hasselmo, M. E., Deep belief networks learn context dependent behavior, PLoS One, 9, 3 (2014)
[36] Rumelhart, D. E.; Hinton, G. E.; Williams, R. J., Learning representations by back-propagating errors, Nature, 323, 6088, 533-536 (1986), URL http://dx.doi.org/10.1038/323533a0 · Zbl 1369.68284
[37] Rumelhart, D. E.; McClelland, J. L., Parallel distributed processing (1986), MIT Press: MIT Press Cambridge, MA
[38] Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. (2015). Trust region policy optimization. In International conference on machine learning
[39] Sutton, R.; Barto, A., Reinforcement learning (1998), MIT Press: MIT Press Cambridge, MA
[40] Tesauro, G., TD-Gammon, a self-teaching backgammon program, achieves master-level play, Neural Computation, 6, 2, 215-219 (1994)
[41] Tsitsiklis, J. N., Asynchronous stochastic approximation and Q-learning, Machine Learning, 16, 3, 185-202 (1994) · Zbl 0820.68105
[42] Tsitsiklis, J. N.; Van Roy, B., An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, 42, 5, 674-690 (1997) · Zbl 0914.93075
[43] Wallis, J. D.; Anderson, K. C.; Miller, E. K., Single neurons in prefrontal cortex encode abstract rules, Nature, 411, 6840, 953-956 (2001)
[44] Wang, J.; Ding, X.; Lahijanian, M.; Paschalidis, I. C.; Belta, C. A., Temporal logic motion control using actor-critic methods, International Journal of Robotics Research, 34, 10, 1329-1344 (2015)
[45] Wang, J.; Paschalidis, I. C., An actor-critic algorithm with second-order actor and critic, IEEE Transactions on Automatic Control, 62, 6, 2689-2703 (2017) · Zbl 1369.90192
[47] Watkins, C. J.; Dayan, P., Q-learning, Machine Learning, 8, 3-4, 279-292 (1992) · Zbl 0773.68062
[48] Watter, M.; Springenberg, J.; Boedecker, J.; Riedmiller, M., Embed to control: A locally linear latent dynamics model for control from raw images, (Cortes, C.; Lawrence, N. D.; Lee, D. D.; Sugiyama, M.; Garnett, R., Advances in neural information processing systems. Vol. 28 (2015), Curran Associates, Inc.), 2746-2754
[49] Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., & Salakhutdinov, R., et al. (2015). Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044
[50] Xu, X.; Zuo, L.; Huang, Z., Reinforcement learning algorithms with function approximation: Recent advances and applications, Information Sciences, 261, 1-31 (2014) · Zbl 1328.68176
[51] Zilli, E. A.; Hasselmo, M. E., Analyses of Markov decision process structure regarding the possible strategic use of interacting memory systems, Frontiers in Computational Neuroscience, 2, 6 (2008)
[52] Zilli, E. A.; Hasselmo, M. E., The influence of Markov decision process structure on the possible strategic use of working memory and episodic memory, PLoS One, 3, 7, e2756 (2008)
[53] Zilli, E. A.; Hasselmo, M. E., Modeling the role of working memory and episodic memory in behavioral tasks, Hippocampus, 18, 2, 193-209 (2008)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.