
Reinforcement Learning

Reinforcement learning is a paradigm that models the trial-and-error learning process needed in problem settings where explicit instructive signals are not available. It has roots in operations research, behavioral psychology, and AI. The goal of the course is to introduce the basic mathematical foundations of reinforcement learning, as well as to highlight some recent directions of research.

The sections below list the course materials for Weeks 0 to 12. Each topic has both a YouTube link and a VideoKen link.

Week 0 - Preparatory Material

  • Probability tutorial - 1
  • Probability tutorial - 2
  • Linear algebra tutorial - 1
  • Linear algebra tutorial - 2
  • Assignment 0
  • Solution 0
Week 1 - Introduction to RL and Immediate RL

  • Introduction to RL (YouTube | VideoKen)
  • RL framework and applications (YouTube | VideoKen)
  • Introduction to immediate RL (YouTube | VideoKen)
  • Bandit optimalities (YouTube | VideoKen)
  • Value function based methods (YouTube | VideoKen)
  • Assignment 1
  • Solution 1
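
The "Value function based methods" lecture covers estimating action values from sampled rewards. As a quick, unofficial illustration (not part of the course materials), here is a minimal sample-average, epsilon-greedy bandit sketch; the arm means, step count, and epsilon are made-up toy values:

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Sample-average action-value estimates with epsilon-greedy exploration."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k                # estimated value of each arm
    n = [0] * k                  # pull counts
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                     # explore
        else:
            a = max(range(k), key=lambda i: q[i])    # exploit
        r = rng.gauss(true_means[a], 1.0)            # noisy Gaussian reward
        n[a] += 1
        q[a] += (r - q[a]) / n[a]                    # incremental mean update
    return q, n

q, n = epsilon_greedy_bandit([0.1, 0.5, 0.9])
```

With enough steps, the estimates approach the true means and the best arm is pulled most often.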
Week 2 - Bandit Algorithms

  • UCB 1 (YouTube | VideoKen)
  • Concentration bounds (YouTube | VideoKen)
  • UCB 1 Theorem (YouTube | VideoKen)
  • PAC bounds (YouTube | VideoKen)
  • Median elimination (YouTube | VideoKen)
  • Thompson sampling (YouTube | VideoKen)
  • Assignment 2
  • Solution 2
  • Additional Reads
  • Auer, P.; Cesa-Bianchi, N.; Fischer, P. 2002. Finite-time Analysis of the Multiarmed Bandit Problem.
  • Auer, P.; Ortner, R. 2010. UCB Revisited: Improved Regret Bounds for the Stochastic Multi-Armed Bandit Problem.
  • Even-Dar, E.; Mannor, S.; Mansour, Y. 2006. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems.
  • Tutorial on OFUL (Szepesvari, C.) Part 1 | Part 2 | Part 3
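
The UCB 1 lectures (and the Auer, Cesa-Bianchi & Fischer paper above) build on the rule "pull the arm maximizing empirical mean plus a confidence bonus". A rough sketch on a Bernoulli bandit, with invented arm means and horizon:

```python
import math
import random

def ucb1(true_means, horizon=10000, seed=0):
    """UCB 1 (Auer et al., 2002): pull argmax of mean_i + sqrt(2 ln t / n_i)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts, sums = [0] * k, [0.0] * k

    def pull(arm):
        r = 1.0 if rng.random() < true_means[arm] else 0.0   # Bernoulli reward
        counts[arm] += 1
        sums[arm] += r

    for arm in range(k):                     # play each arm once to initialize
        pull(arm)
    for t in range(k + 1, horizon + 1):
        arm = max(range(k),
                  key=lambda i: sums[i] / counts[i]
                  + math.sqrt(2.0 * math.log(t) / counts[i]))
        pull(arm)
    return counts

counts = ucb1([0.2, 0.5, 0.8])
```

Suboptimal arms accumulate only logarithmically many pulls, which is the content of the UCB 1 theorem.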
Week 3 - Policy Gradient Methods & Introduction to Full RL

  • Policy search (YouTube | VideoKen)
  • REINFORCE (YouTube | VideoKen)
  • Contextual bandits (YouTube | VideoKen)
  • Full RL introduction (YouTube | VideoKen)
  • Returns, value functions & MDPs (YouTube | VideoKen)
  • Assignment 3
  • Solution 3
  • Additional Reads
  • Notes on REINFORCE algorithm
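
As a toy companion to the REINFORCE lectures, here is an unofficial sketch of the score-function gradient with a softmax policy over bandit arms and a running-mean baseline; learning rate, episode count, and arm means are all illustrative:

```python
import math
import random

def reinforce_bandit(true_means, episodes=3000, lr=0.1, seed=0):
    """REINFORCE on a bandit: softmax policy, running-mean reward baseline."""
    rng = random.Random(seed)
    k = len(true_means)
    theta = [0.0] * k                         # one preference per arm
    baseline = 0.0
    for ep in range(1, episodes + 1):
        z = sum(math.exp(t) for t in theta)
        probs = [math.exp(t) / z for t in theta]
        # sample an action from the softmax policy
        u, acc, a = rng.random(), 0.0, k - 1
        for i in range(k - 1):
            acc += probs[i]
            if u < acc:
                a = i
                break
        r = rng.gauss(true_means[a], 1.0)
        baseline += (r - baseline) / ep       # running-mean baseline
        adv = r - baseline
        # grad log pi(a | theta) for softmax: 1{i = a} - probs[i]
        for i in range(k):
            theta[i] += lr * adv * ((1.0 if i == a else 0.0) - probs[i])
    return theta

theta = reinforce_bandit([0.1, 0.5, 0.9])
```

The softmax score function sums to zero across arms, so the preferences keep a zero mean while shifting mass toward the best arm.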
Week 4 - MDP Formulation, Bellman Equations & Optimality Proofs

  • MDP modelling (YouTube | VideoKen)
  • Bellman equation (YouTube | VideoKen)
  • Bellman optimality equation (YouTube | VideoKen)
  • Cauchy sequence & Green's equation (YouTube | VideoKen)
  • Banach fixed point theorem (YouTube | VideoKen)
  • Convergence proof (YouTube | VideoKen)
  • Assignment 4
  • Solution 4
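
The optimality proofs in this week rest on the Bellman optimality operator being a gamma-contraction in the sup norm, so the Banach fixed point theorem applies. A small numerical check of the contraction property on a randomly generated toy MDP (all sizes and values invented):

```python
import random

def bellman_backup(V, P, R, gamma):
    """(TV)(s) = max_a [ R(s, a) + gamma * sum_s' P(s'|s, a) V(s') ]."""
    n = len(V)
    return [max(R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n))
                for a in range(len(R[s])))
            for s in range(n)]

rng = random.Random(0)
n_s, n_a, gamma = 4, 2, 0.9
# random transition kernel (rows normalized) and random rewards
P = []
for s in range(n_s):
    row = []
    for a in range(n_a):
        w = [rng.random() for _ in range(n_s)]
        row.append([x / sum(w) for x in w])
    P.append(row)
R = [[rng.random() for _ in range(n_a)] for _ in range(n_s)]

V1 = [rng.uniform(-5, 5) for _ in range(n_s)]
V2 = [rng.uniform(-5, 5) for _ in range(n_s)]
lhs = max(abs(x - y) for x, y in zip(bellman_backup(V1, P, R, gamma),
                                     bellman_backup(V2, P, R, gamma)))
rhs = gamma * max(abs(x - y) for x, y in zip(V1, V2))
# contraction: ||TV1 - TV2|| <= gamma * ||V1 - V2||
```

The inequality holds for any pair of value vectors, which is what guarantees a unique fixed point V*.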
Week 5 - Dynamic Programming & Monte Carlo Methods

  • LPI convergence (YouTube | VideoKen)
  • Value iteration (YouTube | VideoKen)
  • Policy iteration (YouTube | VideoKen)
  • Dynamic programming (YouTube | VideoKen)
  • Monte Carlo (YouTube | VideoKen)
  • Control in Monte Carlo (YouTube | VideoKen)
  • Assignment 5
  • Solution 5
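
Value iteration, covered this week, simply applies the Bellman optimality backup until the sup-norm change is small. An unofficial sketch on a hypothetical two-state, two-action MDP whose optimal values can be checked by hand:

```python
def value_iteration(P, R, gamma, tol=1e-10):
    """Repeat V <- TV until the sup-norm change drops below tol."""
    V = [0.0] * len(P)
    while True:
        V_new = [max(R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
                     for a in range(len(R[s])))
                 for s in range(len(P))]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new

# Hypothetical MDP: action 0 = stay, action 1 = switch states.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 1.0],    # state 0: staying pays 0, switching pays 1
     [2.0, 0.0]]    # state 1: staying pays 2, switching pays 0
V = value_iteration(P, R, gamma=0.5)   # hand-checked fixed point: V* = (3, 4)
```

With gamma = 0.5, the Bellman equations V(1) = max(2 + 0.5 V(1), 0.5 V(0)) and V(0) = max(0.5 V(0), 1 + 0.5 V(1)) are solved by V* = (3, 4), which the iteration recovers.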
Week 6 - Monte Carlo & Temporal Difference Methods

  • Off Policy MC (YouTube | VideoKen)
  • UCT (YouTube | VideoKen)
  • TD(0) (YouTube | VideoKen)
  • TD(0) control (YouTube | VideoKen)
  • Q-learning (YouTube | VideoKen)
  • Afterstate (YouTube | VideoKen)
  • Assignment 6
  • Solution 6
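
For the Q-learning lecture, here is a minimal tabular sketch (not course code) on a made-up chain world: states 0..4, actions left/right, reward 1 on reaching the terminal right end, everything else invented:

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a chain: move left/right, reward 1 at the right end."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]   # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy behavior policy
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == goal else 0.0
            target = r if s2 == goal else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])   # off-policy max backup
            s = s2
    return Q

Q = q_learning()
```

After a few hundred episodes the greedy policy is "right" in every state, and Q(3, right) converges to the immediate terminal reward of 1.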
Week 7 - Eligibility Traces

  • Eligibility traces (YouTube | VideoKen)
  • Backward view of eligibility traces (YouTube | VideoKen)
  • Eligibility trace control (YouTube | VideoKen)
  • Thompson sampling recap (YouTube | VideoKen)
  • Assignment 7
  • Solution 7
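
Alongside the Thompson sampling recap, an unofficial Beta-Bernoulli sketch may help: sample a mean from each arm's posterior and pull the argmax. Arm means and horizon are toy values:

```python
import random

def thompson_bernoulli(true_means, horizon=5000, seed=0):
    """Beta-Bernoulli Thompson sampling with uniform Beta(1, 1) priors."""
    rng = random.Random(seed)
    k = len(true_means)
    a_post = [1.0] * k              # Beta posterior parameters per arm
    b_post = [1.0] * k
    counts = [0] * k
    for _ in range(horizon):
        draws = [rng.betavariate(a_post[i], b_post[i]) for i in range(k)]
        arm = draws.index(max(draws))
        r = 1 if rng.random() < true_means[arm] else 0   # Bernoulli reward
        a_post[arm] += r            # conjugate posterior update
        b_post[arm] += 1 - r
        counts[arm] += 1
    return counts

counts = thompson_bernoulli([0.2, 0.5, 0.8])
```

Posterior sampling concentrates pulls on the best arm while still exploring arms whose posteriors remain wide.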
Week 8 - Function Approximation

  • Function approximation (YouTube | VideoKen)
  • Linear parameterization (YouTube | VideoKen)
  • State aggregation methods (YouTube | VideoKen)
  • Function approximation & eligibility traces (YouTube | VideoKen)
  • LSTD & LSTDQ (YouTube | VideoKen)
  • LSPI & Fitted Q (YouTube | VideoKen)
  • Assignment 8
  • Solution 8
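
As a bridge into linear parameterization, here is an unofficial semi-gradient TD(0) sketch on the classic 5-state random walk, using one-hot features (the degenerate linear case that reduces to tabular TD). Step size and episode count are invented:

```python
import random

def td0_linear(episodes=5000, alpha=0.1, seed=0):
    """Semi-gradient TD(0) with one-hot features on the 5-state random walk."""
    rng = random.Random(seed)
    w = [0.0] * 5                    # one weight per state (one-hot features)
    for _ in range(episodes):
        s = 2                        # every episode starts in the centre state
        while True:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            if s2 == 5:              # terminate right with reward 1
                w[s] += alpha * (1.0 - w[s])
                break
            if s2 == -1:             # terminate left with reward 0
                w[s] += alpha * (0.0 - w[s])
                break
            w[s] += alpha * (w[s2] - w[s])   # TD(0) update (gamma = 1, r = 0)
            s = s2
    return w

w = td0_linear()   # true values are 1/6, 2/6, ..., 5/6
```

With a constant step size the weights fluctuate around the true values rather than converging exactly, one of the issues the LSTD lectures address.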
Week 9 - DQN, Fitted Q & Policy Gradient Approaches

  • DQN & Fitted Q-iteration (YouTube | VideoKen)
  • Policy gradient approach (YouTube | VideoKen)
  • Actor critic & REINFORCE (YouTube | VideoKen)
  • REINFORCE (cont'd) (YouTube | VideoKen)
  • Policy gradient with function approximation (YouTube | VideoKen)
  • Assignment 9
  • Solution 9
  • Additional Reads
  • Notes on Policy Gradient Algorithms
Week 10 - Hierarchical Reinforcement Learning

  • Hierarchical reinforcement learning (YouTube | VideoKen)
  • Types of optimality (YouTube | VideoKen)
  • Semi-Markov decision processes (YouTube | VideoKen)
  • Options (YouTube | VideoKen)
  • Learning with options (YouTube | VideoKen)
  • Hierarchical abstract machines (YouTube | VideoKen)
  • Assignment 10
  • Solution 10
  • Additional Reads
  • Andrew G. Barto and Sridhar Mahadevan. 2003. Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems 13, 1-2 (January 2003), 41-77. DOI: https://doi.org/10.1023/A:1022140919877
Week 11 - Hierarchical RL: MAXQ

  • MAXQ (YouTube | VideoKen)
  • MAXQ value function decomposition (YouTube | VideoKen)
  • Option discovery (YouTube | VideoKen)
  • Assignment 11
  • Solution 11
  • Additional Reads
  • Dietterich, T. G. 2000. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition.
Week 12 - POMDPs

  • POMDP introduction (YouTube | VideoKen)
  • Solving POMDP (YouTube | VideoKen)
  • Assignment 12
  • Solution 12
  • Additional Reads
  • POMDP Tutorial
  • Tutorial on Predictive State Representations (Singh, S. P.)
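
A core computation in the POMDP lectures is the Bayes belief update after taking action a and observing o. An unofficial sketch on a hypothetical two-state, one-action, two-observation problem (transition and observation probabilities invented):

```python
def belief_update(b, a, o, P, O):
    """Bayes filter for a POMDP: b'(s') is proportional to
    O(o | s', a) * sum_s P(s' | s, a) * b(s)."""
    n = len(b)
    unnorm = [O[s2][a][o] * sum(P[s][a][s2] * b[s] for s in range(n))
              for s2 in range(n)]
    z = sum(unnorm)                 # probability of observing o; assumed > 0
    return [x / z for x in unnorm]

# Hypothetical model: sticky states, 80%-accurate observations.
P = [[[0.9, 0.1]],                  # P[s][a] = next-state distribution
     [[0.1, 0.9]]]
O = [[[0.8, 0.2]],                  # O[s'][a] = observation distribution
     [[0.2, 0.8]]]
b = belief_update([0.5, 0.5], a=0, o=0, P=P, O=O)   # o = 0 favors state 0
```

Starting from a uniform belief, observing o = 0 shifts the belief to (0.8, 0.2); the updated belief is the sufficient statistic that POMDP solution methods plan over.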