# Lectures on Reinforcement Learning Theory

*2023-05-15*

# Preface

This is a hastily written version of the lecture notes used in the “CS6700: Reinforcement learning” course. The portion on the theory of MDPs roughly coincides with Chapter 1 of (D. P. Bertsekas 2017), and Chapters 2, 4, 5 and 6 of (D. Bertsekas and Tsitsiklis 1996). For several topics, (Sutton and Barto 1998) is a useful reference, particularly for building an intuitive understanding. Chapters 6 and 7 of (D. P. Bertsekas 2012) are also useful reference material for advanced topics such as RL with function approximation.

I would like to thank the students of the Jan-May 2021 batch of CS6700 for their help in typesetting a portion of these notes. Do note that these notes still require a major editorial revision as well as a round of proofreading, so the reader should be wary of errors. As an alternative, the textbooks cited above are excellent source material for learning the foundations of RL.

### References

Bertsekas, D. P. 2012. *Dynamic Programming and Optimal Control, Vol. II, 4th Edition*. Athena Scientific.

Bertsekas, D. P. 2017. *Dynamic Programming and Optimal Control, Vol. I*. Athena Scientific.

Bertsekas, D., and J. N. Tsitsiklis. 1996. *Neuro-Dynamic Programming*. Athena Scientific.

Sutton, R. S., and A. G. Barto. 1998. *Reinforcement Learning: An Introduction*. MIT Press.