Learning to win process-control games watching game-masters.

*(English)*Zbl 1009.68116Summary: The present paper focuses on some interesting classes of process-control games, where winning essentially means successfully controlling the process. A master for one of these games is an agent who plays a winning strategy. In this paper we investigate situations in which even a complete model (given by a program) of a particular game does not provide enough information to synthesize – even incrementally – a winning strategy. However, if in addition to getting a program, a machine may also watch masters play winning strategies, then the machine is able to incrementally learn a winning strategy for the given game. Studied are successful learning from arbitrary masters and from pedagogically useful selected masters. It is shown that selected masters are strictly more helpful for learning than are arbitrary masters. Both for learning from arbitrary masters and for learning from selected masters, though, there are cases where one can learn programs for winning strategies from masters but not if one is required to learn a program for the master’s strategy itself. Both for learning from arbitrary masters and for learning from selected masters, one can learn strictly more by watching \(m+1\) masters than one can learn by watching only \(m\). Last, a simulation result is presented where the presence of a selected master reduces the complexity from infinitely many semantic mind changes to finitely many syntactic ones.

##### MSC:

68T05 | Learning and adaptive systems in artificial intelligence |

91A26 | Rationality and learning in game theory |

##### Software:

C4.5
Full Text:
DOI

**OpenURL**

##### References:

[1] | Bain, M.; Sammut, C., A framework for behavioural cloning, () |

[2] | Bārzdiņš, J.M., Two theorems on the limiting synthesis of functions, Theory of algorithms and programs, Latvian state university, Riga, (1974), p. 82-88 |

[3] | Büchi, J.R.; Landweber, L.H., Solving sequential conditions by finite-state strategies, Trans. amer. math. soc., 138, 295-311, (1969) · Zbl 0182.02302 |

[4] | Case, J.; Kaufmann, S.; Kinber, E.; Kummer, M., Learning recursive functions from approximations, J. comput. system sci., 55, 183-196, (1997) · Zbl 0880.68107 |

[5] | Case, J.; Smith, C., Comparison of identification criteria for machine inductive inference, Theoret. comput. sci., 25, 193-220, (1983) · Zbl 0524.03025 |

[6] | Cenzer, D.; Remmel, J., Recursively presented games and strategies, Math. soc. sci., 24, 117-139, (1992) · Zbl 0786.90103 |

[7] | Freivalds, R.V.; Wiehagen, R., Inductive inference with additional information, Elektron. informationsverarb. kybernet., 15, 179-185, (1979) · Zbl 0437.03018 |

[8] | Jain, S.; Sharma, A., Learning with the knowledge of an upper bound on program size, Inform. and comput., 102, 118-166, (1993) · Zbl 0769.68109 |

[9] | Jockusch, C.G.; Lewis, A.; Remmel, J.B., II_{1}0-classes and Rado’s selection principle, J. symbolic logic, 56, 684-693, (1991) · Zbl 0744.03046 |

[10] | Kummer, M.; Ott, M., Learning branches and learning to win closed games, Proceedings of the ninth annual conference on computational learning theory, (1996), Assoc. Comput. Mach New York, p. 280-291 |

[11] | Kummer, M.; Stephan, F., On the structure of degrees of inferability, J. comput. system sci., 52, 214-238, (1996) · Zbl 1152.68452 |

[12] | Maler, O.; Pnueli, A.; Sifakis, J., On the synthesis of discrete controllers for timed systems, Proceedings of the annual symposium on the theoretical aspects of computer science, lecture notes in computer science, (1995), Springer-Verlag Berlin/New York, p. 229-242 · Zbl 1379.68227 |

[13] | McNaughton, R., Infinite games played on finite graphs, Ann. pure appl. logic, 65, 149-184, (1993) · Zbl 0798.90151 |

[14] | Michie, D.; Sammut, C., Machine learning from real-time input-output behaviour, Proceedings of the international conference on design to manufacture in modern industry, (1993), p. 363-369 |

[15] | Odifreddi, P., Classical recursion theory, (1989), North-Holland Amsterdam · Zbl 0931.03057 |

[16] | Osherson, D.; Stob, M.; Weinstein, S., Systems that learn, (1986), MIT Press Cambridge |

[17] | Ott, M., Learning strategies for infinite games, (1998), Universität KarlsruheInstitut für Logik, Komplexität und Deduktionssysteme Shaker Verlag |

[18] | Ott, M.; Stephan, F., The complexity of learning branches and strategies from queries, Proceedings of the eighth annual international symposium on algorithms and computation, (1997), Springer-Verlag Berlin/New York, p. 283-292 · Zbl 0886.68113 |

[19] | Ott, M.; Stephan, F., Structural measures for games and process control in the branch learning model, (), 94-108 · Zbl 0945.68150 |

[20] | Pitt, L., Probabilistic inductive inference, J. assoc. comput. Mach., 36, 383-433, (1989) · Zbl 0676.68046 |

[21] | Quinlan, J., C4.5: programs for machine learning, (1992), Morgan Kaufmann San Mateo |

[22] | Rivest, R.L.; Schapire, R.E., Inference of finite automata using homing sequences, Inform. and comput., 103, 299-347, (1993) · Zbl 0786.68082 |

[23] | Sammut, C., Acquiring expert knowledge by learning from recorded behaviours, Japanese knowledge acquisition workshop, (1992) |

[24] | Sammut, C., Automatic construction of reactive control systems using symbolic machine learning, Knowledge engrg. rev., 11, 27-42, (1996) |

[25] | Sammut, C.; Hurst, S.; Kedzier, D.; Michie, D., Learning to fly, Proceedings of the ninth international conference on machine learning, (1992), Morgan Kaufmann San Meteo |

[26] | Smith, C., The power of pluralism for automatic program synthesis, J. assoc. comput. Mach., 29, 1144-1165, (1982) · Zbl 0496.68065 |

[27] | Thomas, W.1995, On the synthesis of strategies in infinite games, inProceedings of the Annual Symposium on the Theoretical Aspects of Computer Science, Lecture Notes in Computer Science, Vol. 900, pp. 1-13, Springer-Verlag, Berlin/New York. |

[28] | Urbanc̆ic̆, T.; Bratko, I., Reconstructing human skill with machine learning, () |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.