zbMATH — the first resource for mathematics

Incentivizing exploration with heterogeneous value of money. (English) Zbl 1406.91131
Markakis, Evangelos (ed.) et al., Web and internet economics. 11th international conference, WINE 2015, Amsterdam, The Netherlands, December 9–12, 2015. Proceedings. Berlin: Springer (ISBN 978-3-662-48994-9/pbk; 978-3-662-48995-6/ebook). Lecture Notes in Computer Science 9470, 370-383 (2015).
Summary: Recently, P. Frazier et al. [“Incentivizing exploration”, in: Proceedings of the 15th ACM conference on economics and computation, EC’14. New York, NY: Association for Computing Machinery (ACM). 5–22 (2014; doi:10.1145/2600057.2602897)] proposed a natural model for crowd-sourced exploration of different a priori unknown options: a principal is interested in the long-term welfare of a population of agents who arrive one by one in a multi-armed bandit setting. However, each agent is myopic, so in order to incentivize him to explore options with better long-term prospects, the principal must offer the agent money. Frazier et al. [loc. cit.] showed that a simple class of policies called time-expanded are optimal in the worst case, and characterized their budget-reward tradeoff. The previous work assumed that all agents are equally and uniformly susceptible to financial incentives. In reality, agents may have different utility for money. We therefore extend the model of Frazier et al. [loc. cit.] to allow agents that have heterogeneous and nonlinear utilities for money. The principal is informed of the agent’s tradeoff via a signal that could be more or less informative.
Our main result is to show that a convex program can be used to derive a signal-dependent time-expanded policy which achieves the best possible Lagrangian reward in the worst case. The worst-case guarantee is matched by so-called “diamonds in the rough” instances; the proof that the guarantees match is based on showing that two different convex programs have the same optimal solution for these specific instances.
For the entire collection see [Zbl 1326.68026].
91B16 Utility theory
91A40 Other game-theoretic models
90B40 Search theory
Full Text: DOI
[1] Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: the adversarial multi-armed banditproblem. In: Proceedings of the 36th IEEE Symposium on Foundations of Computer Science, pp. 322–331 (1995) · Zbl 0938.68920 · doi:10.1109/SFCS.1995.492488
[2] Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48–77 (2003) · Zbl 1029.68087 · doi:10.1137/S0097539701398375
[3] Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004) · Zbl 1058.90049 · doi:10.1017/CBO9780511804441
[4] Frazier, P., Kempe, D., Kleinberg, J., Kleinberg, R.: Incentivizing exploration. In: Proceedings of the 16th ACM Conference on Economics and Computation, pp. 5–22 (2014) · doi:10.1145/2600057.2602897
[5] Gittins, J.C.: Multi-Armed Bandit Allocation Indices. Wiley, New York (1989) · Zbl 0699.90068
[6] Gittins, J.C., Glazebrook, K.D., Weber, R.: Multi-Armed Bandit Allocation Indices, 2nd edn. Wiley, New York (2011) · Zbl 1401.90257 · doi:10.1002/9780470980033
[7] Gittins, J.C., Jones, D.M.: A dynamic allocation index for the sequential design of experiments. In: Gani, J. (ed.) Progress in Statistics, pp. 241–266 (1974) · Zbl 0303.62064
[8] Ho, C.J., Slivkins, A., Vaughan, J.W.: Adaptive contract design for crowdsourcing markets: bandit algorithms for repeated principal-agent problems. In: Proceedings of the 16th ACM Conf. on Economics and Computation, pp. 359–376 (2014) · Zbl 1351.68293 · doi:10.1145/2600057.2602880
[9] Katehakis, M.N., Veinott Jr., A.F.: The multi-armed bandit problem: decomposition and computation. Math. Oper. Res. 12(2), 262–268 (1987) · Zbl 0618.90097 · doi:10.1287/moor.12.2.262
[10] Kremer, I., Mansour, Y., Perry, M.: Implementing the ”wisdom of the crowd”. In: Proceedings of the 15th ACM Conf. on Electronic Commerce, pp. 605–606 (2013) · doi:10.1145/2492002.2482542
[11] Lai, T.L., Robbins, H.E.: Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1), 4–22 (1985) · Zbl 0568.62074 · doi:10.1016/0196-8858(85)90002-8
[12] Mansour, Y., Slivkins, A., Syrgkanis, V.: Bayesian incentive-compatible bandit exploration. In: Proceedings of the 17th ACM Conference on Economics and Computation, pp. 565–582 (2015) · doi:10.1145/2764468.2764508
[13] Robbins, H.E.: Some aspects of the sequential design of experiments. Bull. Am. Math. Soc. 58, 527–535 (1952) · Zbl 0049.37009 · doi:10.1090/S0002-9904-1952-09620-8
[14] Singla, A., Krause, A.: Truthful incentives in crowdsourcing tasks using regret minimization mechanisms. In: 22nd International World Wide Web Conference, pp. 1167–1178 (2013) · doi:10.1145/2488388.2488490
[15] Slivkins, A., Wortman Vaughan, J.: Online decision making in crowdsourcing markets: theoretical challenges (position paper). ACM SIGecam Exch. 12(2), 4–23 (2013) · doi:10.1145/2692359.2692364
[16] Spence, M.: Job market signaling. Q. J. Econ. 87, 355–374 (1973) · doi:10.2307/1882010
[17] Whittle, P.: Multi-armed bandits and the Gittins index. J. Roy. Stat. Soc. Ser. B (Methodol.) 42(2), 143–149 (1980) · Zbl 0439.90096
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.