Reinforcement learning
Preface
This is a hastily written version of the lecture notes used in the “CS6700: Reinforcement learning” course. The portion on the theory of MDPs roughly coincides with Chapter 1 of (Bertsekas 2017) and Chapters 2, 4, 5 and 6 of (Bertsekas and Tsitsiklis 1996). For several topics, (Sutton and Barto 1998) is a useful reference, in particular for obtaining an intuitive understanding. Chapters 6 and 7 of (Bertsekas 2012) are also useful reference material for the advanced topics, such as RL with function approximation.
I would like to thank the students of the Jan–May 2021 batch of CS6700 for their help in typesetting a portion of these notes. Note that these notes still require a major editorial revision as well as a round of proofreading, so the reader should be wary of errors. As an alternative, the textbooks cited above are excellent source material for learning the foundations of RL.
A special thanks to Prof. Aditya Mahajan for providing the Quarto template.
About the course
Course Content
- Markov Decision Processes (MDPs)
  - Finite horizon MDPs
    - General theory
    - DP algorithm
  - Infinite horizon model (1): Stochastic shortest path
    - General theory: Contraction mapping, Bellman equation
    - Computational solution schemes: Value and policy iteration, convergence analysis
  - Infinite horizon model (2): Discounted cost MDPs
    - General theory: Contraction mapping, Bellman equation
    - Classical solution techniques: Value and policy iteration
- Reinforcement Learning
  - Stochastic approximation
    - Introduction and connection to RL
    - Convergence result for contraction mappings
  - Tabular methods
    - Monte Carlo policy evaluation
    - Temporal difference learning
      - TD(0), TD(λ)
      - Convergence analysis
    - Q-learning and its convergence analysis
  - Function approximation
    - Approximate policy evaluation using TD(λ)
    - Least-squares methods: LSTD and LSPI
  - Policy-gradient algorithms
    - Policy gradient theorem
    - Gradient estimation using likelihood ratios
    - Variants (REINFORCE, PPO, etc.)
Reference books
- D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. I, Athena Scientific, 2017
- D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. II, Athena Scientific, 2012
- D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996
- R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 2020