Q-learning

Q-learning is a reinforcement learning technique that learns an action-value function, which gives the expected utility of taking a given action in a given state and following a fixed policy thereafter. One strength of Q-learning is that it can compare the expected utility of the available actions without requiring a model of the environment. A variation called delayed Q-learning (Strehl et al., 2006) has shown substantial improvements, bringing PAC (probably approximately correct) bounds to Markov decision processes.

Algorithm

The core of the algorithm is a simple value iteration update. For each state $s$ from the state set $S$, and for each action $a$ from the action set $A$, the estimate of the expected discounted reward is updated with the following expression:

: $Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha_t(s_t,a_t) \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t,a_t) \right]$

where $r_{t+1}$ is the real-valued reward observed after taking action $a_t$ in state $s_t$, $\alpha_t(s, a)$ are the learning rates such that $0 \le \alpha_t(s, a) \le 1$, and $\gamma$ is the discount factor such that $0 \le \gamma < 1$.
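The update rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `q_update`, the dictionary-based table, the two-action toy problem, and the constant values of `alpha` and `gamma` are all hypothetical choices, not part of the original description.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update to the table Q in place; return the new value.

    Q maps (state, action) pairs to estimates. A constant alpha is an
    assumption; the text allows the learning rate to depend on t, s and a.
    """
    # Greedy estimate of the best value obtainable from the next state.
    best_next = max(Q[(s_next, b)] for b in actions)
    # Bracketed term of the update: reward plus discounted best next value,
    # minus the current estimate.
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return Q[(s, a)]

# Toy usage: unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
actions = ["left", "right"]
first = q_update(Q, s=0, a="right", r=1.0, s_next=1, actions=actions)
second = q_update(Q, s=1, a="left", r=0.0, s_next=0, actions=actions)
```

The second call shows how value propagates backwards: state 1 receives no reward itself, but its estimate rises because the best action in state 0 already has a positive value.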

Implementation

Q-learning at its simplest uses tables to store the action values. This approach quickly becomes impractical as the complexity of the system being monitored or controlled grows, because the table needs one entry for every state-action pair. One answer to this problem is to use an (adapted) artificial neural network as a function approximator, as demonstrated by Tesauro in his temporal difference learning research on backgammon. The standard neural network must be adapted because the target output, from which the error signal is generated, is itself computed at run time rather than supplied in advance.
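The idea of a run-time target can be sketched as follows. To keep the example short, a linear approximator is swapped in for the neural network; this is not Tesauro's method, and the names `q_value` and `approx_q_update`, the feature vectors, and the step sizes are hypothetical.

```python
def q_value(weights, features):
    """Linear estimate of Q(s, a): dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))

def approx_q_update(weights, feats_sa, r, next_feats_per_action,
                    alpha=0.1, gamma=0.9):
    """One semi-gradient Q-learning step; returns a new weight list.

    feats_sa is the feature vector of the (state, action) pair just taken;
    next_feats_per_action holds one feature vector per action available
    in the next state.
    """
    # The training target is built at run time from the approximator's
    # own current predictions for the next state.
    target = r + gamma * max(q_value(weights, f) for f in next_feats_per_action)
    error = target - q_value(weights, feats_sa)
    # For a linear model, the gradient of Q with respect to the weights
    # is the feature vector itself.
    return [w + alpha * error * x for w, x in zip(weights, feats_sa)]

# Toy usage with two one-hot features.
weights = [0.0, 0.0]
weights = approx_q_update(weights, [1.0, 0.0], 1.0, [[0.0, 1.0], [1.0, 0.0]])
weights = approx_q_update(weights, [0.0, 1.0], 0.0, [[1.0, 0.0], [0.0, 1.0]])
```

With a neural network, the same target and error would drive backpropagation in place of the single-line weight update.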

See also

* Reinforcement learning
* Temporal difference learning
* SARSA
* Iterated prisoner's dilemma
* Game theory
* Fitted Q iteration algorithm

External links

* [http://knol.google.com/k/christian-eder/q-learning/xfqw1gyel5ga/3# Q-Learning on Google Knols]
* [http://www.cs.rhul.ac.uk/~chrisw/thesis.html Watkins, C.J.C.H. (1989). Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, England.]
* [http://portal.acm.org/citation.cfm?id=1143955 Strehl, Li, Wiewiora, Langford, Littman (2006). PAC model-free reinforcement learning]
* [http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/index.html Q-Learning by examples]
* [http://www.cs.ualberta.ca/%7Esutton/book/the-book.html Reinforcement Learning online book]
* [http://elsy.gdan.pl/index.php Connectionist Q-learning Java Framework]
* [http://sourceforge.net/projects/piqle/ Piqle : a Generic Java Platform for Reinforcement Learning]
* [http://ccl.northwestern.edu/netlogo/models/community/Reinforcement%20Learning%20Maze Online demonstration of Q-learning (bug in a maze)]
* [http://www.research.ibm.com/infoecon/paps/html/ijcai99_qnn/node4.html Q-learning work by Tesauro]
* [http://citeseer.comp.nus.edu.sg/352693.html Q-learning work by Tesauro Citeseer Link]

Wikimedia Foundation. 2010.
