
Two-armed bandit problem for parallel data processing systems. (English. Russian original) Zbl 1322.62062
Probl. Inf. Transm. 48, No. 1, 72-84 (2012); translation from Probl. Peredachi Inf. 48, No. 1, 83-95 (2012).
Summary: We consider an application of the two-armed bandit problem to processing a large number \(N\) of data items when two alternative processing methods are available. We propose a strategy that compares the methods during the initial stages, of which there are at most \(r-1\), and at the final stage applies only the method found to be better. We find asymptotically optimal parameters of the strategy and observe that the minimax risk is of the order of \(N^\alpha\), where \(\alpha = 2^{r-1}/(2^r-1)\). Under parallel processing, the total operation time is determined by the number \(r\) of stages rather than by the number \(N\) of data items.
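To make the exponent in the summary concrete, the following minimal sketch evaluates \(\alpha = 2^{r-1}/(2^r-1)\) for small \(r\). The function name `risk_exponent` is hypothetical and chosen only for illustration; the code computes the formula quoted above and nothing else from the paper.

```python
from fractions import Fraction


def risk_exponent(r: int) -> Fraction:
    """Return alpha = 2^(r-1) / (2^r - 1), the exponent quoted in the summary.

    `risk_exponent` is an illustrative name, not taken from the paper.
    """
    if r < 1:
        raise ValueError("r must be a positive integer")
    return Fraction(2 ** (r - 1), 2 ** r - 1)


# Tabulate the exponent for a few stage counts, exactly and as a decimal.
for r in range(1, 6):
    alpha = risk_exponent(r)
    print(f"r={r}: alpha = {alpha} ~ {float(alpha):.4f}")
```

Since \(\alpha = 1/(2 - 2^{1-r})\), the exponent equals 1 for \(r = 1\), drops to 2/3 for \(r = 2\) and 4/7 for \(r = 3\), and decreases toward 1/2 as \(r\) grows.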
62C05 General considerations in statistical decision theory
91A60 Probabilistic games; gambling