We first give a brief overview of reinforcement learning and describe the Q-learning algorithm. We then present two recently developed algorithms: one for regular Markov decision processes (MDPs) and another for constrained MDPs. Finally, we discuss an application to road traffic control, where the problem is to find an optimal order in which to switch traffic lights at junctions, given some information on the state. The results of some preliminary experiments are also presented.
Bio: Shalabh Bhatnagar is currently an Associate Professor in the Computer Science and Automation Department at the Indian Institute of Science, Bangalore. His research interests are in stochastic control and reinforcement learning, stochastic approximation algorithms, communication and wireless networks, and, more recently, traffic signal control.