
Regret minimization in multi-armed bandits (MAB) 

Lecture 1 
Course organization; motivation for studying bandits 

Lecture 2 
Bandit learning framework; regret definition and an equivalent expression using gaps 
1.1 
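For reference, the regret and its gap decomposition covered in this lecture are usually written as follows (standard notation: K arms with means \mu_a, optimal mean \mu^* = \max_a \mu_a, reward X_t at round t, and T_a(n) pulls of arm a up to round n):

```latex
R_n \;=\; n\,\mu^* \;-\; \mathbb{E}\!\left[\sum_{t=1}^{n} X_t\right]
\;=\; \sum_{a=1}^{K} \Delta_a \,\mathbb{E}\!\left[T_a(n)\right],
\qquad \Delta_a := \mu^* - \mu_a .
```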
Lecture 3 
Explore-then-commit (ETC) algorithm; a brief tour of concentration inequalities: subgaussianity 
1.2 
Lecture 4 
Finite-sample bounds via the Chernoff method; end of the concentration inequalities tour 
1.3 
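The kind of finite-sample bound this tour builds toward can be sanity-checked numerically. The sketch below is my own illustration (not course code): it compares Hoeffding's inequality, obtained via the Chernoff method, against a Monte Carlo estimate of the tail probability for fair coin flips.

```python
import math
import random

def hoeffding_bound(n, eps):
    """Hoeffding's inequality for i.i.d. variables in [0, 1], derived via
    the Chernoff method: P(empirical mean - mu >= eps) <= exp(-2 n eps^2)."""
    return math.exp(-2 * n * eps * eps)

def empirical_tail(n, eps, trials, rng):
    """Monte Carlo estimate of the upper-tail deviation probability for
    the empirical mean of n fair coin flips (true mean mu = 0.5)."""
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < 0.5 for _ in range(n)) / n
        hits += mean - 0.5 >= eps
    return hits / trials
```

The bound is loose but never violated: for n = 50 and eps = 0.2 it gives exp(-4) ≈ 0.018, while the true tail probability is well below that.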
Lecture 5 
Gap-dependent regret bound for ETC 
1.4 
Lecture 6 
Gap-independent regret bound for ETC 
1.4 
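For intuition, a minimal sketch of ETC (illustrative only; the function name and the fixed exploration length m are mine, not the course's):

```python
def explore_then_commit(arms, n, m):
    """Explore-then-commit: pull each arm m times, then commit to the
    empirically best arm for the remaining n - m*K rounds.

    arms -- list of zero-argument callables returning a stochastic reward
    n    -- total horizon (assumed n >= m * len(arms))
    m    -- exploration pulls per arm
    """
    k = len(arms)
    means = [0.0] * k
    total = 0.0
    # Exploration phase: m pulls per arm, tracking empirical means.
    for a in range(k):
        for _ in range(m):
            r = arms[a]()
            means[a] += r / m
            total += r
    # Commit phase: play the empirical best for the rest of the horizon.
    best = max(range(k), key=lambda a: means[a])
    for _ in range(n - m * k):
        total += arms[best]()
    return best, total
```

The regret analysis in these lectures turns on the choice of m: too small and the commit phase may lock onto a suboptimal arm, too large and the exploration phase itself costs linear regret.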
Lecture 7 
UCB algorithm: introduction 
1.5 
Lecture 8 
UCB: gap-dependent regret bound 
1.5 
Lecture 9 
UCB: gap-independent regret bound 
1.5 
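A sketch of the UCB idea from these lectures, in the common UCB1 form (illustrative; the exact confidence-width constant used in the course may differ):

```python
import math

def ucb1(arms, n):
    """UCB1: after initialization, play the arm maximizing
    empirical mean + sqrt(2 ln t / T_a)  (optimism under uncertainty).

    arms -- list of zero-argument callables returning rewards in [0, 1]
    n    -- horizon; returns (pull counts, total reward)
    """
    k = len(arms)
    counts = [0] * k
    means = [0.0] * k
    total = 0.0
    for t in range(1, n + 1):
        if t <= k:
            a = t - 1  # pull each arm once to initialize
        else:
            a = max(range(k),
                    key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = arms[a]()
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental mean update
        total += r
    return counts, total
```

Unlike ETC, no exploration length must be tuned: the shrinking confidence width balances exploration and exploitation automatically, which is what the gap-dependent O(log n) bound formalizes.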
Lecture 11 
A brief tour of information theory: Entropy, joint and conditional entropy, chain rule 
1.6 
Lecture 12 
KL divergence: definition, non-negativity, chain rule 
1.6 
Lecture 13 
Pinsker's inequality, end of information theory tour 
1.6 
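The Bernoulli case of the quantities from this tour can be checked numerically. A small sketch (my own illustration) of KL divergence, total variation, and Pinsker's inequality TV(P, Q) ≤ sqrt(KL(P, Q) / 2):

```python
import math

def kl_bernoulli(p, q):
    """KL divergence KL(Ber(p) || Ber(q)) for p, q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def tv_bernoulli(p, q):
    """Total variation distance between Ber(p) and Ber(q)."""
    return abs(p - q)

# Pinsker's inequality, checked at one point: TV <= sqrt(KL / 2).
p, q = 0.6, 0.4
assert tv_bernoulli(p, q) <= math.sqrt(kl_bernoulli(p, q) / 2)
```

These Bernoulli KL computations are exactly what the regret lower bounds in the following lectures manipulate.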
Lecture 14 
Regret lower bound for 2armed bandits 
1.7 
Lecture 15 
Regret lower bound for K>2 arms 
1.7 
Lecture 16 
Instance-dependent (asymptotic) lower bound 
1.7 
Lecture 17 
A brief tour of Bayesian statistics: Parameter estimation (Bernoulli case) 
1.8 
Lecture 18 
Parameter estimation: the case of the Normal mean 
1.8 
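The Bernoulli case from Lecture 17 reduces to a conjugate Beta update. A minimal sketch (illustrative; function names are mine):

```python
def beta_bernoulli_posterior(alpha, beta, observations):
    """Conjugate Bayesian update: with a Beta(alpha, beta) prior on a
    Bernoulli parameter, each observed success adds 1 to alpha and each
    failure adds 1 to beta."""
    for x in observations:
        if x == 1:
            alpha += 1
        else:
            beta += 1
    return alpha, beta

def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)
```

This conjugacy is what makes the Beta-Bernoulli pairing the standard concrete setting for Thompson sampling in the next lectures.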
Lecture 19 
Bayesian bandit framework and Bayesian regret; Thompson sampling (a.k.a. posterior sampling) 
1.9 
Lecture 19 
Thompson sampling: Bayesian regret bound 
1.9 
Lecture 20 
Thompson sampling: Bayesian regret bound 
1.9 
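A sketch of Thompson sampling for Bernoulli bandits with Beta(1, 1) priors (illustrative only; prior choice and naming are mine):

```python
import random

def thompson_sampling(arms, n, rng=None):
    """Thompson sampling for Bernoulli bandits: sample a mean from each
    arm's Beta posterior, play the argmax, and update that arm's posterior.

    arms -- list of zero-argument callables returning 0/1 rewards
    n    -- horizon; returns per-arm pull counts
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    k = len(arms)
    alpha = [1] * k  # 1 + successes per arm
    beta = [1] * k   # 1 + failures per arm
    counts = [0] * k
    for _ in range(n):
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        a = max(range(k), key=lambda i: samples[i])
        r = arms[a]()
        alpha[a] += r
        beta[a] += 1 - r
        counts[a] += 1
    return counts
```

Note that the arm is chosen by posterior sampling rather than by an explicit confidence bound; the Bayesian regret bound in these lectures exploits exactly this matching of the action distribution to the posterior over the best arm.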
Lecture 21 
Brief introduction to KL-UCB, UCB-V, UCB-Improved, and deviations of regret 

Lecture 22 
Brief introduction to frequentist regret bounds of Thompson sampling 


Best arm identification in MAB 

Lecture 1 
Best arm identification (BAI): the fixed-budget setting; uniform sampling 
2.1 
Lecture 2 
Successive rejects (SR) algorithm 
2.2 
Lecture 3 
Analysis of error probability of SR 
2.2 
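A sketch of Successive Rejects as analyzed in these lectures (illustrative; this follows the standard phase lengths n_k = ceil((n - K) / (log-bar(K) · (K + 1 - k))), but check the course notes for the exact constants used in class):

```python
import math

def successive_rejects(arms, budget):
    """Successive Rejects for fixed-budget best-arm identification.
    In phase k, every surviving arm is pulled until its total count
    reaches n_k; the arm with the lowest empirical mean is then rejected.
    Returns the index of the single surviving arm."""
    k = len(arms)
    # log-bar(K) = 1/2 + sum_{i=2}^{K} 1/i
    log_bar = 0.5 + sum(1.0 / i for i in range(2, k + 1))
    active = list(range(k))
    counts = [0] * k
    means = [0.0] * k
    for phase in range(1, k):
        n_phase = math.ceil((budget - k) / (log_bar * (k + 1 - phase)))
        for a in active:
            while counts[a] < n_phase:
                r = arms[a]()
                counts[a] += 1
                means[a] += (r - means[a]) / counts[a]
        # Reject the empirically worst surviving arm.
        worst = min(active, key=lambda a: means[a])
        active.remove(worst)
    return active[0]
```

The phase lengths are chosen so the total number of pulls never exceeds the budget, while arms that survive longer accumulate more samples.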
Lecture 4 
Lower bound for BAI in the fixed-budget setting 
2.3 
Lecture 5 
Lower bound for BAI in the fixed-budget setting 
2.3 
Lecture 6 
Lower bound for BAI in the fixed-budget setting 
2.3 
Lecture 7 
Lower bound for BAI in the fixed-budget setting 
2.3 
Lecture 8 
The fixed-confidence BAI setting; a naive uniformly exploring algorithm 
2.4 
Lecture 9 
Successive elimination with known gaps 
2.5 
Lecture 9 
Successive elimination with unknown gaps 
2.5 
Lecture 10 
Analysis of successive elimination algorithm 
2.5 
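A sketch of successive elimination in the unknown-gaps case (illustrative; the confidence radius below comes from Hoeffding plus a union bound, and its constants may differ from those used in class):

```python
import math

def successive_elimination(arms, delta, max_rounds=10_000):
    """Successive elimination for fixed-confidence BAI with unknown gaps:
    sample every surviving arm once per round and drop any arm whose upper
    confidence bound falls below the best arm's lower confidence bound.

    arms  -- list of zero-argument callables returning rewards in [0, 1]
    delta -- target error probability; returns the list of surviving arms
    """
    k = len(arms)
    active = list(range(k))
    means = [0.0] * k
    for t in range(1, max_rounds + 1):
        for a in active:
            r = arms[a]()
            means[a] += (r - means[a]) / t  # each active arm has t samples
        # Anytime confidence radius (Hoeffding + union bound over arms, rounds).
        radius = math.sqrt(math.log(4 * k * t * t / delta) / (2 * t))
        best = max(means[a] for a in active)
        active = [a for a in active if means[a] + radius >= best - radius]
        if len(active) == 1:
            break
    return active
```

Because suboptimal arms are eliminated as soon as their intervals separate, the sample complexity adapts to the gaps even though they are unknown in advance.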
Lecture 11 
Lower bound for fixed-confidence BAI 
2.6 
Lecture 12 
Median elimination and a few recent algorithms 


(Stochastic) Linear bandits 

Lecture 1 
Linear bandit formulation; confidence ellipsoid 

Lecture 2 
LinUCB algorithm 

Lecture 3 
Sketch of regret bound 
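A sketch of a single LinUCB decision step (illustrative only: restricted to two dimensions with an explicit matrix inverse to stay dependency-free; here `beta` plays the role of the squared confidence-ellipsoid radius, whose exact form is given in the lectures):

```python
import math

def linucb_step(V, b, actions, beta):
    """One round of LinUCB in 2 dimensions: form the ridge estimate
    theta_hat = V^{-1} b, then play the action maximizing
    <theta_hat, a> + sqrt(beta) * ||a||_{V^{-1}}.

    V       -- 2x2 regularized design matrix (list of two row lists)
    b       -- length-2 vector of reward-weighted features
    actions -- iterable of length-2 action tuples
    """
    # Explicit inverse of the 2x2 design matrix V.
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    Vinv = [[V[1][1] / det, -V[0][1] / det],
            [-V[1][0] / det, V[0][0] / det]]
    theta = [Vinv[0][0] * b[0] + Vinv[0][1] * b[1],
             Vinv[1][0] * b[0] + Vinv[1][1] * b[1]]

    def ucb(a):
        mean = theta[0] * a[0] + theta[1] * a[1]
        # Quadratic form a^T V^{-1} a: the exploration bonus direction.
        quad = (a[0] * (Vinv[0][0] * a[0] + Vinv[0][1] * a[1])
                + a[1] * (Vinv[1][0] * a[0] + Vinv[1][1] * a[1]))
        return mean + math.sqrt(beta) * math.sqrt(quad)

    return max(actions, key=ucb)
```

The `||a||_{V^{-1}}` term is large in directions the design matrix has not yet explored, so the algorithm favors informative actions even when their estimated mean is lower.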


Adversarial bandits 

Lecture 1 
Prediction with expert advice; the halving algorithm and its mistake bound 

Lecture 2 
Weighted majority algorithm, mistake bound 

Lecture 3 
Exponentially weighted majority (EWM) algorithm 

Lecture 4 
Regret bound for EWM algorithm 
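A sketch of the loss-based exponential-weights (Hedge) form of EWM (illustrative; the course's version and learning-rate choice may differ):

```python
import math

def exponential_weights(expert_losses, eta):
    """Exponentially weighted majority (Hedge): keep one weight per expert,
    play the normalized weight distribution, then multiply each weight by
    exp(-eta * loss) once the round's losses are revealed.

    expert_losses -- list of rounds, each a list of per-expert losses in [0, 1]
    eta           -- learning rate; returns the learner's total expected loss
    """
    n_experts = len(expert_losses[0])
    weights = [1.0] * n_experts
    total_loss = 0.0
    for losses in expert_losses:
        z = sum(weights)
        probs = [w / z for w in weights]  # the learner's mixture over experts
        total_loss += sum(p * l for p, l in zip(probs, losses))
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return total_loss
```

The regret bound from this lecture controls `total_loss` against the best single expert in hindsight, up to roughly ln(N)/eta + eta·T/8.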

Lecture 5 
Adversarial bandits: framework
EXP3 algorithm 

Lecture 6 
Regret bound for EXP3 algorithm 
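A sketch of EXP3, which adapts exponential weights to bandit feedback through importance-weighted loss estimates (illustrative; the learning rate and the exact estimator variant follow one common presentation and may differ from the course's):

```python
import math
import random

def exp3(arms, n, eta, rng=None):
    """EXP3: exponential weights over arms, with an importance-weighted
    loss estimate for the single arm observed each round.

    arms -- list of zero-argument callables returning rewards in [0, 1]
    n    -- horizon; returns per-arm pull counts
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    k = len(arms)
    weights = [1.0] * k
    counts = [0] * k
    for _ in range(n):
        z = sum(weights)
        probs = [w / z for w in weights]
        a = rng.choices(range(k), weights=probs)[0]
        r = arms[a]()
        counts[a] += 1
        # Importance-weighted estimate: loss (1 - r) / p_a on the played arm,
        # zero elsewhere, so only the played arm's weight is updated.
        loss_hat = (1 - r) / probs[a]
        weights[a] *= math.exp(-eta * loss_hat)
    return counts
```

Dividing by the playing probability makes the loss estimate unbiased, which is the key step the EXP3 regret bound in this lecture builds on; its variance is also what motivates EXP3.P and EXP3-IX in the next lectures.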

Lecture 7 
EXP3.P algorithm, high-probability regret bound 

Lecture 8 
EXP3-IX algorithm, high-probability regret bound 

Lecture 9 
A survey of other adversarial bandit settings 
