This is a hastily written version of the lecture notes used in the “CS6700: Reinforcement learning” course. The portion on the theory of MDPs roughly coincides with Chapter 1 of (D. P. Bertsekas 2017), and Chapters 2, 4, 5 and 6 of (D. Bertsekas and Tsitsiklis 1996). For several topics, (Sutton and Barto 1998) is a useful reference, particularly for building an intuitive understanding. Chapters 6 and 7 of (D. P. Bertsekas 2012) are also useful reference material for the advanced topics, such as RL with function approximation.

I would like to thank the students of the Jan-May 2021 batch of CS6700 for their help in typesetting a portion of these notes. Do note that these notes still require a major editorial revision, as well as a round of proofreading, so the reader should be wary of errors. As an alternative, the textbooks cited above are excellent source material for learning the foundations of RL.


Bertsekas, Dimitri P. 2012. Dynamic Programming and Optimal Control, Vol. II, 4th Edition. Athena Scientific.
———. 2017. Dynamic Programming and Optimal Control, Vol. I. Athena Scientific.
Bertsekas, D., and J. Tsitsiklis. 1996. Neuro-Dynamic Programming. Athena Scientific.
Sutton, R., and A. Barto. 1998. Reinforcement Learning: An Introduction. MIT Press.