CS7011: Topics in reinforcement learning

Course information

  • When: Jan-May 2026

  • Lectures: Slot J

  • Where: CS24

  • Office Hours: Wednesday, 3:30pm to 4:30pm

Pre-requisites

  • An introductory course in RL at the level of CS6700.

  • For the lecture notes of CS6700, click here

Course Content

RL foundations

  • Theory of average cost Markov decision processes (MDPs)

  • Off-policy evaluation

  • Least-squares methods

  • Temporal difference (TD) learning and its convergence (a minimal TD(0) sketch follows this list)

  • Policy-gradient algorithms: policy gradient theorem; gradient estimation using likelihood ratios

  • Proximal policy optimization

  • Actor-critic methods (compatible features, essential features, advantage updating)
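
To give a flavour of the foundational material, here is a minimal sketch of semi-gradient TD(0) with linear function approximation on a toy random-walk chain. The environment, feature map, step size, and discount factor are illustrative assumptions made for this example and are not part of the course material.

```python
# Minimal sketch: semi-gradient TD(0) with linear function approximation
# on a toy 5-state random walk (all parameters are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(0)

n_states, n_features = 5, 3
features = rng.normal(size=(n_states, n_features))   # fixed feature map phi(s)
theta = np.zeros(n_features)                          # value-function weights
alpha, gamma = 0.1, 0.95                              # step size, discount factor

def step(s):
    """Random walk: move left/right uniformly; reward +1 only at the right end."""
    s_next = min(max(s + rng.choice([-1, 1]), 0), n_states - 1)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next in (0, n_states - 1)
    return s_next, reward, done

for episode in range(2000):
    s, done = n_states // 2, False
    while not done:
        s_next, r, done = step(s)
        v, v_next = features[s] @ theta, features[s_next] @ theta
        td_error = r + (0.0 if done else gamma * v_next) - v
        theta += alpha * td_error * features[s]       # semi-gradient TD(0) update
        s = s_next

print("estimated state values:", features @ theta)
```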

Advanced topics

  • Risk-sensitive RL

  • Distributional RL

  • Finite time analysis of TD with linear function approximation

  • Finite time analysis of actor-critic methods

  • Reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO); a minimal DPO loss sketch follows this list

  • Policy gradient variants: Policy Newton algorithms, Natural policy gradient, etc.
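
As a pointer to the RLHF/DPO topic, the sketch below computes the DPO objective on a single preference pair: the negative log-sigmoid of the scaled difference between the policy-vs-reference log-ratios of the chosen and rejected responses. The log-probabilities are placeholder numbers chosen purely for illustration; in practice they come from the trained policy and a frozen reference model.

```python
# Minimal sketch of the DPO loss on one preference pair (placeholder inputs).
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """-log sigmoid(beta * ((logpi(y_w) - logpi_ref(y_w)) - (logpi(y_l) - logpi_ref(y_l))))."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return np.logaddexp(0.0, -margin)   # numerically stable -log sigmoid(margin)

# Example: the policy prefers the chosen response more strongly than the
# reference model does, so the loss falls below log(2) (~0.693).
print(dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
               ref_logp_chosen=-13.0, ref_logp_rejected=-14.0))
```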

Grading

  • Project: 35%

  • Mid-sem: 25%

  • Homework: 10%

  • Papers’ discussion/presentation: 10%

  • Scribing: 10%

  • Class participation: 10%

Important Dates

  • Homework: Feb 28

  • Mid-term: Mar 17

  • Papers’ study: Mar 28

  • Project proposal: Apr 4

  • Project presentation: Apr 30

  • Final exam: May 9 (as per academic calendar)

Papers’ study

Students form teams of at most two and read a set of papers on RL. Each team will then be quizzed orally on the content of these papers by the instructor and the TAs.

Course project

  • The project may be theoretical and/or practical: it could involve implementing existing RL algorithms on a sophisticated benchmark (e.g., recommendation systems), and/or original research in the form of a novel RL formulation or an extension of existing theoretical results. Given the limited timeframe of the course, working out a new result or a sufficiently novel extension of an existing one may not be completely feasible; in that case, the project report may instead take the form of a survey of the literature the students familiarized themselves with while attempting to formulate or solve an RL problem.

  • For theoretically oriented projects, the report is expected to provide a detailed description of the problem and the accompanying algorithms. For projects geared towards performance evaluation of existing RL algorithms, the report is expected to summarize the findings of an empirical investigation of state-of-the-art RL algorithms in one of the application domains.

The evaluation will be as follows: 10% for the project proposal, 20% for the final report, and 10% for the presentation/viva.

Textbooks/References

RL foundations

  • D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. II, Athena Scientific, 2012.

  • D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

  • R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 2020.

Advanced topics

For some of the topics, research papers and survey articles will be used.