
The QLBS Q-Learner goes NuQLear: fitted Q iteration, inverse RL, and option portfolios. (English) Zbl 1420.91463

Summary: The QLBS model is a discrete-time option hedging and pricing model that is based on Dynamic Programming (DP) and Reinforcement Learning (RL). It combines the famous Q-Learning method for RL with the Black-Scholes (-Merton) (BSM) model’s idea of reducing the problem of option pricing and hedging to the problem of optimal rebalancing of a dynamic replicating portfolio for the option, which is made of a stock and cash. Here we expand on several NuQLear (Numerical Q-Learning) topics with the QLBS model. First, we investigate the performance of Fitted Q Iteration for an RL (data-driven) solution to the model, and benchmark it versus a DP (model-based) solution, as well as versus the BSM model. Second, we develop an Inverse Reinforcement Learning (IRL) setting for the model, where we only observe prices and actions (re-hedges) taken by a trader, but not rewards. Third, we outline how the QLBS model can be used for pricing portfolios of options, rather than a single option in isolation, thus providing its own, data-driven and model-independent solution to the (in)famous volatility smile problem of the Black-Scholes model.
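The summary's first topic, Fitted Q Iteration (Ernst et al. [4]), can be illustrated on a toy version of the hedging problem. The sketch below is not the paper's implementation: the simulated GBM market, the quadratic feature map `phi`, the schematic per-step reward, and all parameter values are assumptions made for this example only; the paper works with the exact QLBS rewards and its own basis-function expansion.

```python
# Minimal sketch of backward, batch-mode Fitted Q Iteration on a QLBS-style
# hedging problem. Everything below (market model, features, reward, grids)
# is illustrative, not the paper's specification.
import numpy as np

rng = np.random.default_rng(0)

# Simulated GBM stock paths (assumed market model)
n_paths, n_steps = 5000, 10
S0, mu, sigma, dt, gamma, lam = 100.0, 0.05, 0.2, 0.1, 0.99, 0.001
K = 100.0                                      # strike of a European call
dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))
S = S0 * np.exp(np.cumsum((mu - 0.5 * sigma**2) * dt + sigma * dW, axis=1))
S = np.hstack([np.full((n_paths, 1), S0), S])  # shape (n_paths, n_steps + 1)

actions = np.linspace(0.0, 1.0, 11)            # discretized hedge ratios

def phi(s, a):
    """Quadratic feature map in (moneyness, action); an illustrative choice."""
    x = s / K
    return np.stack([np.ones_like(s), x, x ** 2, a, a * x, a ** 2], axis=-1)

def reward(s_next, s, a):
    """Schematic per-step reward: hedge P&L minus a quadratic risk penalty
    (a stand-in for the QLBS variance term, not the paper's exact reward)."""
    pnl = a * (s_next - s)
    return pnl - lam * pnl ** 2

# Terminal value: minus the option payoff the hedger has to deliver
V_next = -np.maximum(S[:, -1] - K, 0.0)

weights = [None] * n_steps                     # Q_t(s, a) = phi(s, a) @ weights[t]
for t in reversed(range(n_steps)):
    # One-step Bellman targets r + gamma * V_{t+1}(s') for every (path, action)
    X = np.concatenate([phi(S[:, t], np.full(n_paths, a)) for a in actions])
    y = np.concatenate([reward(S[:, t + 1], S[:, t], np.full(n_paths, a)) + gamma * V_next
                        for a in actions])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # regression step of FQI
    weights[t] = w
    # Greedy value at time t, reused as the bootstrap target for step t - 1
    Q_all = np.stack([phi(S[:, t], np.full(n_paths, a)) @ w for a in actions], axis=1)
    V_next = Q_all.max(axis=1)

# Minus the time-0 optimal value gives a rough, data-driven price estimate
print("FQI price estimate:", -V_next.mean())
```

Replacing the schematic reward with the exact QLBS rewards and enriching the feature map would move this sketch toward the RL solution that the summary benchmarks against the DP and BSM results.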

MSC:

91G20 Derivative securities (option pricing, hedging, etc.)
91-08 Computational methods for problems pertaining to game theory, economics, and finance
91G10 Portfolio theory

References:

[1] Black, F. and Scholes, M., The pricing of options and corporate liabilities. J. Political Econ., 1973, 81(3), 637-654. doi: 10.1086/260062 · Zbl 1092.91524
[2] Carr, P., Ellis, K. and Gupta, V., Static hedging of exotic options. J. Finance, 1998, 53(3), 1165-1190. doi: 10.1111/0022-1082.00048
[3] Das, S., Traders, Guns, and Money, 2006 (FT Prentice Hall: Harlow, UK).
[4] Ernst, D., Geurts, P. and Wehenkel, L., Tree-based batch mode reinforcement learning. J. Mach. Learn. Res., 2005, 6, 503-556. · Zbl 1222.68193
[5] Fox, R., Pakman, A. and Tishby, N., Taming the noise in reinforcement learning via soft updates. Available online at: https://arxiv.org/pdf/1512.08562.pdf, 2015.
[6] Halperin, I., Inverse reinforcement learning for marketing. Available online at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3087057, 2017.
[7] Halperin, I., QLBS: Q-Learner in the Black-Scholes (-Merton) worlds. Available online at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3087076, 2017.
[8] Kober, J., Bagnell, J.A. and Peters, J., Reinforcement learning in robotics: A survey. Int. J. Robot. Res., 2013, 32(11), 1238-1278. doi: 10.1177/0278364913495721
[9] Liu, S., Araujo, M., Brunskill, E., Rossetti, R., Barros, J. and Krishnan, R., Understanding sequential decisions via inverse reinforcement learning. IEEE 14th International Conference on Mobile Data Management, Milan, Italy, 2013.
[10] Markowitz, H., Portfolio Selection: Efficient Diversification of Investments, 1959 (John Wiley: New York).
[11] Merton, R., Theory of rational option pricing. Bell J. Econ. Manag. Sci., 1973, 4(1), 141-183. doi: 10.2307/3003143 · Zbl 1257.91043
[12] Murphy, S.A., A generalization error for Q-Learning. J. Mach. Learn. Res., 2005, 6, 1073-1097. · Zbl 1222.68271
[13] Scherrer, R.J., Time variation of a fundamental dimensionless constant. Available online at: http://lanl.arxiv.org/pdf/0903.5321, 2009.
[14] Sutton, R.S. and Barto, A.G., Reinforcement Learning: An Introduction, 2nd ed., 2018 (MIT Press: Cambridge, MA). · Zbl 1407.68009
[15] van Hasselt, H., Double Q-Learning. Advances in Neural Information Processing Systems. Available online at: http://papers.nips.cc/paper/3964-double-q-learning.pdf, 2010.
[16] Watkins, C.J., Learning from delayed rewards. Ph.D. Thesis, King's College, Cambridge, May, 1989.
[17] Watkins, C.J. and Dayan, P., Q-Learning. Mach. Learn., 1992, 8(3-4), 279-292. doi: 10.1007/BF00992698
[18] Wilmott, P., Paul Wilmott on Quantitative Finance, 2000 (John Wiley & Sons Ltd: Chichester). · Zbl 1127.91002
[19] Ziebart, B.D., Maas, A., Bagnell, J.A. and Dey, A.K., Maximum entropy inverse reinforcement learning. AAAI, 2008, 8, 1433-1438.