CS6046: Multi-armed bandits
Course information
When: Jan-May 2018
Lectures: Slot D
Where: CS36
Office Hours: Wednesday, 2:30pm to 3:30pm
Course Content
- Exploration-exploitation dilemma in stochastic K-armed bandits
  - Explore-then-commit and epsilon-greedy strategies
  - Upper Confidence Bound (UCB) algorithm (a code sketch follows this list)
  - Thompson sampling (TS) algorithm
  - Regret upper bounds for UCB and TS
  - Lower bounds: (a) information-theoretic bounds on minimax error; (b) instance-dependent regret lower bounds
- Pure exploration in K-armed bandits
  - Uniform sampling
  - Successive rejects algorithm
  - Lower bounds
- Adversarial bandits
  - EXP3, EXP3-IX, EXP4 algorithms
  - Upper bounds on the regret of EXP3 and its variants
  - Regret lower bounds
- Advanced topics
  - Stochastic linear bandits: ellipsoidal confidence sets, UCB, Thompson sampling, and lower bound
  - Adversarial linear bandits: Exp2 with John’s exploration, online mirror descent, lower bound
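For concreteness, here is a minimal sketch of the UCB algorithm listed above, written in Python. The `pull` callback, the UCB1-style exploration bonus, and the Bernoulli arms in the usage lines are illustrative assumptions, not course-mandated choices.

```python
import numpy as np

def ucb1(pull, K, T):
    """UCB1 on a K-armed stochastic bandit with rewards in [0, 1].

    `pull(arm) -> reward` is an assumed environment callback.
    """
    counts = np.zeros(K, dtype=int)  # number of times each arm was pulled
    sums = np.zeros(K)               # cumulative reward of each arm

    for t in range(T):
        if t < K:
            arm = t  # initialization: pull every arm once
        else:
            means = sums / counts
            bonus = np.sqrt(2.0 * np.log(t) / counts)  # exploration bonus
            arm = int(np.argmax(means + bonus))
        counts[arm] += 1
        sums[arm] += pull(arm)
    return sums.sum()

# Illustrative usage on Bernoulli arms with (assumed) means 0.3, 0.5, 0.7:
rng = np.random.default_rng(0)
mu = [0.3, 0.5, 0.7]
total = ucb1(lambda a: float(rng.random() < mu[a]), K=3, T=10_000)
```

The bonus term shrinks as an arm is pulled more often, which is what trades exploration off against exploitation.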
Grading
Project: 50%
Homeworks: 40% (best 2 out of 3)
Paper presentation: 10%
Important Dates
| | Available on | Submission by |
|---|---|---|
| Homework 1 | Jan 27 | Feb 12 |
| Homework 2 | Feb 15 | Mar 2 |
| Homework 3 | Mar 9 | Mar 26 |
| Project proposal | | Apr 2 |
| Project report | | Apr 30 |
Paper presentation
Each student is expected to read a paper from the list here, present the main contributions of the chosen paper, and (hopefully) get the class interested enough to discuss it further. Paper presentations will take place during regular class hours in April.
Course project
The project may be theoretical and/or practical: it could involve implementing existing bandit algorithms on a sophisticated benchmark (e.g., recommendation systems) and/or original research in the form of a novel bandit formulation or an extension of existing theoretical results. Given the limited timeframe of the course, working out a new result or a sufficiently novel extension of an existing one may not be feasible; in that case, the project report may instead become a survey of the literature that the students became familiar with while attempting to formulate or solve a bandit problem.
For theoretically oriented projects, the list of papers given here is a good place to start exploring. For practically oriented ones, some pointers are available here.
For theoretically oriented projects, the report is expected to provide a detailed description of the problem and the accompanying algorithms. For projects geared towards performance evaluation of existing bandit algorithms, the report is expected to summarize the findings of an empirical investigation of state-of-the-art algorithms in one of the application domains.
The project evaluation is split as follows: 15% for the project proposal, 25% for the final report, and 10% for an oral examination by the instructor.
Textbooks/References
Blog posts on bandit theory by Csaba Szepesvári and Tor Lattimore
Multi-armed bandits book (draft) by Alex Slivkins
Regret analysis of stochastic and non-stochastic multi-armed bandit problems by S. Bubeck and N. Cesa-Bianchi
Homeworks
Schedule
Regret minimization in MAB (click here for course notes)

| | Topic | Notes |
|---|---|---|
| Lecture 1 | Course organization; motivation for studying bandits | |
| Lecture 2 | Bandit learning framework; regret definition, equivalent expression using gaps | 1.1 |
| Lecture 3 | Explore-then-commit (ETC) algorithm; a brief tour of concentration inequalities: subgaussianity | 1.2 |
| Lecture 4 | Finite-sample bounds using the Chernoff method; end of concentration inequalities tour | 1.3 |
| Lecture 5 | Gap-dependent regret bound for ETC | 1.4 |
| Lecture 6 | Gap-independent regret bound for ETC | 1.4 |
| Lecture 7 | UCB algorithm: introduction | 1.5 |
| Lecture 8 | UCB: gap-dependent regret bound | 1.5 |
| Lecture 9 | UCB: gap-independent regret bound | 1.5 |
| Lecture 10 | A brief tour of information theory: entropy, joint and conditional entropy, chain rule | 1.6 |
| Lecture 11 | KL-divergence: definition, non-negativity, chain rule | 1.6 |
| Lecture 12 | Pinsker's inequality; end of information theory tour | 1.6 |
| Lecture 13 | Regret lower bound for 2-armed bandits | 1.7 |
| Lecture 14 | Regret lower bound for K > 2 arms | 1.7 |
| Lecture 15 | Instance-dependent (asymptotic) lower bound | 1.7 |
| Lecture 16 | A brief tour of Bayesian statistics: parameter estimation (Bernoulli case) | 1.8 |
| Lecture 17 | Parameter estimation: the case of a normal mean | 1.8 |
| Lecture 18 | Bayesian bandit framework and Bayesian regret; Thompson sampling, a.k.a. posterior sampling | 1.9 |
| Lecture 19 | Thompson sampling: Bayesian regret bound | 1.9 |
| Lecture 20 | Thompson sampling: Bayesian regret bound (contd.) | 1.9 |
| Lecture 21 | Brief introduction to KL-UCB, UCB-V, UCB-Improved, and deviations of regret | |
| Lecture 22 | Brief introduction to frequentist regret bounds for Thompson sampling | |
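To complement Lectures 18-20 in the table above, the following is a minimal sketch of Thompson sampling for Bernoulli rewards with independent Beta(1, 1) priors; the `pull` callback is an assumed environment interface, not part of the course notes.

```python
import numpy as np

def thompson_bernoulli(pull, K, T, seed=0):
    """Thompson sampling on Bernoulli arms with independent Beta(1, 1) priors.

    `pull(arm) -> reward in {0, 1}` is an assumed environment callback.
    """
    rng = np.random.default_rng(seed)
    alpha = np.ones(K)  # 1 + number of observed successes per arm
    beta = np.ones(K)   # 1 + number of observed failures per arm

    for _ in range(T):
        theta = rng.beta(alpha, beta)  # one posterior sample per arm
        arm = int(np.argmax(theta))    # play the arm whose sample is largest
        r = pull(arm)
        alpha[arm] += r                # conjugate Beta posterior update
        beta[arm] += 1 - r
    return alpha, beta                 # posterior parameters after T rounds
```

Sampling from the posterior, rather than maximizing its mean, is what makes the algorithm explore arms that are plausibly, though not certainly, optimal.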
Best arm identification in MAB

| | Topic | Notes |
|---|---|---|
| Lecture 1 | Best arm identification (BAI): fixed-budget setting; uniform sampling | 2.1 |
| Lecture 2 | Successive rejects (SR) algorithm | 2.2 |
| Lecture 3 | Analysis of the error probability of SR | 2.2 |
| Lecture 4 | Lower bound for BAI in the fixed-budget setting | 2.3 |
| Lecture 5 | Lower bound for BAI in the fixed-budget setting (contd.) | 2.3 |
| Lecture 6 | Lower bound for BAI in the fixed-budget setting (contd.) | 2.3 |
| Lecture 7 | Lower bound for BAI in the fixed-budget setting (contd.) | 2.3 |
| Lecture 8 | Fixed-confidence BAI setting; naive uniformly exploring algorithm | 2.4 |
| Lecture 9 | Successive elimination with known and unknown gaps | 2.5 |
| Lecture 10 | Analysis of the successive elimination algorithm | 2.5 |
| Lecture 11 | Lower bound for fixed-confidence BAI | 2.6 |
| Lecture 12 | Median elimination and a few recent algorithms | |
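As an illustration of the successive rejects algorithm analyzed above, here is a minimal sketch under the standard phase schedule; `pull` is an assumed environment callback, and the code assumes the budget exceeds the number of arms.

```python
import math
import numpy as np

def successive_rejects(pull, K, budget):
    """Successive Rejects for fixed-budget best-arm identification.

    Runs K - 1 phases; after each phase the empirically worst surviving
    arm is rejected. `pull(arm) -> reward` is an assumed callback, and
    budget > K is assumed so every phase pulls each arm at least once.
    """
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    n = lambda k: math.ceil((budget - K) / (log_bar * (K + 1 - k)))

    active = list(range(K))
    counts = np.zeros(K)
    sums = np.zeros(K)
    prev = 0
    for k in range(1, K):
        pulls = n(k) - prev          # extra pulls per surviving arm this phase
        prev = n(k)
        for arm in active:
            for _ in range(pulls):
                sums[arm] += pull(arm)
            counts[arm] += pulls
        worst = min(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)         # reject the empirically worst arm
    return active[0]                 # recommended arm
```

The geometric phase lengths spend most of the budget on the arms that survive longest, which is where the hard comparisons are.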
(Stochastic) Linear bandits

| | Topic | Notes |
|---|---|---|
| Lecture 1 | Linear bandit formulation; confidence ellipsoid | |
| Lecture 2 | LinUCB algorithm | |
| Lecture 3 | Sketch of regret bound | |
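The following is a minimal sketch of LinUCB for a fixed finite arm set, in the spirit of the lectures above. The constant confidence radius `beta`, the arm-feature matrix, and the `pull` callback are simplifying assumptions; in the ellipsoid-based analysis the radius grows slowly with t.

```python
import numpy as np

def lin_ucb(arms, pull, T, lam=1.0, beta=2.0):
    """LinUCB on a stochastic linear bandit with a fixed finite arm set.

    `arms` is a (K, d) array of feature vectors; the reward of arm x is
    <theta*, x> plus noise. `pull(x) -> reward` is an assumed callback,
    and `beta` stands in for the confidence-ellipsoid radius.
    """
    K, d = arms.shape
    V = lam * np.eye(d)       # regularized design matrix
    b = np.zeros(d)           # sum of reward-weighted features
    theta_hat = np.zeros(d)

    for _ in range(T):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b  # ridge-regression estimate of theta*
        # confidence width of each arm: ||x||_{V^{-1}} = sqrt(x^T V^{-1} x)
        widths = np.sqrt(np.einsum('kd,de,ke->k', arms, V_inv, arms))
        x = arms[int(np.argmax(arms @ theta_hat + beta * widths))]
        r = pull(x)
        V += np.outer(x, x)   # rank-one design update
        b += r * x
    return theta_hat
```

Optimism here acts through the ellipsoidal width term: arms whose features are poorly covered by past pulls get a larger bonus.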
Adversarial bandits

| | Topic | Notes |
|---|---|---|
| Lecture 1 | Prediction with expert advice; halving algorithm, mistake bound | |
| Lecture 2 | Weighted majority algorithm, mistake bound | |
| Lecture 3 | Exponentially weighted majority (EWM) algorithm | |
| Lecture 4 | Regret bound for the EWM algorithm | |
| Lecture 5 | Adversarial bandit framework; EXP3 algorithm | |
| Lecture 6 | Regret bound for the EXP3 algorithm | |
| Lecture 7 | EXP3.P algorithm, high-probability regret bound | |
| Lecture 8 | EXP3-IX algorithm, high-probability regret bound | |
| Lecture 9 | A survey of other adversarial bandit settings | |
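Finally, a minimal sketch of EXP3 with importance-weighted loss estimates, matching Lectures 5-6 above; the learning rate uses a standard tuning, and `pull` is an assumed callback returning the adversary's loss in [0, 1].

```python
import numpy as np

def exp3(pull, K, T, eta=None, seed=0):
    """EXP3 for adversarial bandits with losses in [0, 1].

    `pull(arm) -> loss` is an assumed environment callback.
    """
    rng = np.random.default_rng(seed)
    if eta is None:
        eta = np.sqrt(2.0 * np.log(K) / (K * T))  # standard learning rate

    L_hat = np.zeros(K)  # cumulative importance-weighted loss estimates
    for _ in range(T):
        w = np.exp(-eta * (L_hat - L_hat.min()))  # shift for numerical stability
        p = w / w.sum()                           # exponential-weights distribution
        arm = int(rng.choice(K, p=p))
        loss = pull(arm)
        L_hat[arm] += loss / p[arm]               # unbiased loss estimate
    return p
```

Dividing the observed loss by the probability of playing the arm keeps the loss estimates unbiased, which is the key step in the regret analysis covered in Lecture 6.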