- Q-learning
Q-learning is a reinforcement learning technique that works by learning an action-value function that gives the expected utility of taking a given action in a given state and following a fixed policy thereafter. A strength of Q-learning is that it can compare the expected utility of the available actions without requiring a model of the environment. A recent variation called delayed Q-learning has shown substantial improvements, bringing probably approximately correct (PAC) learning bounds to Markov decision processes.
Algorithm
The core of the algorithm is a simple value iteration update. For each state "s" from the state set "S", and for each action "a" from the action set "A", we can calculate an update to its expected discounted reward with the following expression:
: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha_t(s_t, a_t) \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$
where $r_t$ is an observed real reward at time $t$, $\alpha_t(s, a)$ are the learning rates such that $0 \le \alpha_t(s, a) \le 1$, and $\gamma$ is the discount factor such that $0 \le \gamma < 1$.
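As an illustration, the update can be sketched in Python for the tabular case. This is a minimal sketch under stated assumptions rather than a reference implementation: the chain environment in `step`, the epsilon-greedy exploration, the constant learning rate `alpha` (used in place of a per-step, per-pair rate), and all function and parameter names are hypothetical choices made for the example.

```python
import random


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one Q-learning update to the table `q` in place.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state])
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return q[state][action]


def step(state, action, n_states):
    """Hypothetical chain environment (an assumption for this example).

    Action 1 moves right, action 0 moves left. Reaching the last state
    gives reward 1 and ends the episode; every other move gives reward 0.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def train(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table for the chain using epsilon-greedy exploration."""
    rng = random.Random(seed)
    # One row per state, one column per action; all values start at zero.
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                # Exploit, breaking ties between equal values at random.
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state, reward, done = step(state, action, n_states)
            q_update(q, state, action, reward, next_state, alpha, gamma)
            state = next_state
    return q


if __name__ == "__main__":
    for s, row in enumerate(train()):
        print(s, [round(v, 3) for v in row])
```

The agent never consults the transition rule inside `step` when choosing or evaluating actions; it only observes the resulting state and reward, which is the sense in which no model of the environment is required. The terminal state's row is never updated, so its value stays at zero and the final transition's target reduces to the observed reward.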
Implementation
Q-learning at its simplest uses tables to store data. This approach quickly loses viability as the complexity of the system being monitored or controlled increases. One answer to this problem is to use an (adapted) artificial neural network as a function approximator, as demonstrated by Tesauro in his temporal difference learning research on backgammon playing. An adaptation of the standard neural network is required because the required result (from which the error signal is generated) is itself generated at run-time.
See also
* Reinforcement learning
* Temporal difference learning
* SARSA
* Iterated prisoner's dilemma
* Game theory
* Fitted Q iteration algorithm
External links
* [http://knol.google.com/k/christian-eder/q-learning/xfqw1gyel5ga/3# Q-Learning on Google Knols]
* [http://www.cs.rhul.ac.uk/~chrisw/thesis.html Watkins, C.J.C.H. (1989). Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, England.]
* [http://portal.acm.org/citation.cfm?id=1143955 Strehl, Li, Wiewiora, Langford, Littman (2006). PAC model-free reinforcement learning]
* [http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/index.html Q-Learning by examples]
* [http://www.cs.ualberta.ca/%7Esutton/book/the-book.html Reinforcement Learning online book]
* [http://elsy.gdan.pl/index.php Connectionist Q-learning Java Framework]
* [http://sourceforge.net/projects/piqle/ Piqle : a Generic Java Platform for Reinforcement Learning]
* [http://ccl.northwestern.edu/netlogo/models/community/Reinforcement%20Learning%20Maze Online demonstration of Q-learning (bug in a maze)]
* [http://www.research.ibm.com/infoecon/paps/html/ijcai99_qnn/node4.html Q-learning work by Tesauro]
* [http://citeseer.comp.nus.edu.sg/352693.html Q-learning work by Tesauro Citeseer Link]
Wikimedia Foundation. 2010.