CS7011: Topics in Reinforcement Learning
Course information
When: Jan-May 2026
Lectures: Slot J
Where: CS24
Office Hours: Wednesday, 3:30pm to 4:30pm
Pre-requisites
An introductory course in RL at the level of CS6700.
For the lecture notes of CS6700, click here
Course Content
RL foundations
Theory of average cost Markov decision processes (MDPs)
Off-policy evaluation
Least-squares methods
Temporal difference learning and its convergence
Policy-gradient algorithms: Policy gradient theorem; Gradient estimation using likelihood ratios; Proximal policy optimization (a brief sketch appears after this list)
Actor-critic methods (compatible features, essential features, advantage updating)
Advanced topics
Risk-sensitive RL
Distributional RL
Finite time analysis of TD with linear function approximation
Finite time analysis of actor-critic methods
Reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO)
Policy gradient variants: Policy Newton algorithms, Natural policy gradient, etc.
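For quick reference on the policy-gradient items above, the following is a minimal sketch of the policy gradient theorem together with one common likelihood-ratio (REINFORCE-style) gradient estimate. The notation (J(θ), π_θ, d^{π_θ}, Q^{π_θ}, γ) is standard in the literature but is assumed here for illustration and is not fixed by this page.

% Policy gradient theorem (standard episodic form; notation assumed for illustration):
%   J(\theta): expected return of policy \pi_\theta,
%   d^{\pi_\theta}: (discounted) state-occupancy measure,
%   Q^{\pi_\theta}: action-value function of \pi_\theta.
\[
  \nabla_\theta J(\theta)
    \propto \mathbb{E}_{s \sim d^{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s)}
      \bigl[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s,a) \bigr].
\]
% One common likelihood-ratio (REINFORCE-style) estimate from a single episode
% (s_0, a_0, r_0, \ldots, s_T, a_T, r_T) with discount factor \gamma:
\[
  \widehat{\nabla_\theta J}(\theta)
    = \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)
      \sum_{k=t}^{T} \gamma^{\,k-t}\, r_k .
\]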
Grading
Project: 35%
Mid-sem: 25%
Homework: 10%
Papers’ discussion/presentation: 10%
Scribing: 10%
Class participation: 10%
Important Dates
Homework: Feb 28
Mid-term: Mar 17
Papers’ study: Mar 28
Project proposal: Apr 4
Project presentation: Apr 30
Final exam: May 9 (as per academic calendar)
Papers’ study
Students form teams of at most two and read a set of papers on RL. Each team will then be quizzed orally on the content of these papers by the instructor and the TAs.
Course project
The project could be theoretical and/or practical, i.e., it could involve implementation of existing RL algorithms on a sophisticated benchmark (e.g., recommendation systems) and/or original research in the form of a novel RL formulation or an extension of existing theoretical results. Given the limited timeframe of the course, working out a new result or a sufficiently novel extension of an existing result may not be completely feasible; in that case, the project report could take the form of a survey of the existing literature that the students became familiar with while attempting to formulate/solve an RL problem.
For theoretically oriented projects, the report is expected to provide a detailed description of the problem and the accompanying algorithms. For projects geared towards performance evaluation of existing RL algorithms, the report is expected to summarize the findings of an empirical investigation of state-of-the-art RL algorithms in one of the application domains.
The project evaluation will be as follows: 10% for the project proposal, 20% for the final report, and 10% for the presentation/viva.
Textbooks/References
RL foundations
D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. II, Athena Scientific, 2012
D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd edition, MIT Press, 2018
Advanced topics
Prashanth L.A. and Michael Fu, Risk-Sensitive Reinforcement Learning via Policy Gradient Search, Foundations and Trends in Machine Learning, 2022. [pdf] [Book page] [Tutorial]
Marc G. Bellemare, Will Dabney, and Mark Rowland, Distributional Reinforcement Learning, MIT Press, 2023. [Book page]
Prashanth L.A. and Shalabh Bhatnagar, Gradient-Based Algorithms for Zeroth-Order Optimization, Foundations and Trends in Optimization, Vol. 8, No. 1–3, pp. 1–332, 2025. [Book page] [pre-publication copy]
For some of the topics, research papers and survey articles will be used.
