Reinforcement learning refers to the process by which an animal (or agent) learns the reward values of stimuli in the world (through Pavlovian conditioning) and of its own emitted actions (through instrumental conditioning). It turns out that these universal principles of adaptive learning exactly solve the optimization problem of sequential decision-making in Markovian environments. This talk will review how dynamic programming can be implemented as incremental reinforcement learning to solve the system of Bellman optimality equations. A generalization of the optimality equations, based on a state-dependent "temperature" parameter, will also be discussed.
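As a rough illustration of the link between dynamic programming and incremental learning, the sketch below compares value iteration with tabular Q-learning on a toy problem. Everything in it is an assumption made for illustration and is not taken from the talk: the four-state deterministic chain, the reward of 1 for reaching the last state, the discount factor `GAMMA`, the learning rate `alpha`, and the uniformly random exploration policy. Q-learning is used here as one well-known incremental method; the talk may present a different one.

```python
import random

# Hypothetical toy problem: a 4-state chain, actions 0 = left, 1 = right.
N_STATES, ACTIONS, GAMMA = 4, (0, 1), 0.9
TERMINAL = N_STATES - 1


def step(s, a):
    """Deterministic transition; entering the last state yields reward 1."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    return s2, (1.0 if s2 == TERMINAL else 0.0)


def value_iteration(sweeps=100):
    """Dynamic programming: repeated Bellman optimality backups using the model."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s in range(TERMINAL):  # the terminal state keeps value 0
            for a in ACTIONS:
                s2, r = step(s, a)
                q[s][a] = r + GAMMA * max(q[s2])
    return q


def q_learning(episodes=2000, alpha=0.5, seed=0):
    """Incremental learning: the same backup applied to sampled transitions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            a = rng.choice(ACTIONS)  # uniformly random exploration
            s2, r = step(s, a)
            # Move Q(s, a) a fraction alpha toward the sampled Bellman target.
            q[s][a] += alpha * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q


if __name__ == "__main__":
    q_dp, q_rl = value_iteration(), q_learning()
    for s in range(N_STATES):
        print(s, [round(v, 3) for v in q_dp[s]], [round(v, 3) for v in q_rl[s]])
```

In this deterministic setting the two tables should agree closely, since each Q-learning update is a partial step toward the same fixed point that value iteration computes with full knowledge of the transitions.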