Risk-Aware Multi-Armed Bandits
Tutorial Description
This tutorial introduces and surveys research results on risk-aware bandits, and outlines some promising avenues for future research within the risk-aware bandits framework. We consider both the regret minimization and the best-arm identification bandit formulations, with the traditional expected-value performance measure replaced by a risk measure. Well-known risk measures to be considered include variance (or higher moments), quantiles or value-at-risk (VaR), conditional value-at-risk (CVaR), utility-based shortfall risk (UBSR), and cumulative prospect theory (CPT).
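As a rough illustration of two of the risk measures named above, the following sketch estimates VaR and CVaR from i.i.d. samples of a single arm. It is a minimal example and is not taken from the tutorial material: the function names `empirical_var` and `empirical_cvar` are hypothetical, the samples are assumed to be losses (larger is worse), and VaR is taken to be the empirical alpha-quantile. Other sign and quantile conventions are also in common use.

```python
import math


def empirical_var(samples, alpha):
    """Empirical VaR at level alpha: the ceil(alpha * n)-th smallest loss.

    Assumption of this sketch: samples are losses, so larger values are worse.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(samples)
    k = math.ceil(alpha * len(ordered))  # 1-based order-statistic index
    return ordered[max(k, 1) - 1]


def empirical_cvar(samples, alpha):
    """Empirical CVaR at level alpha.

    Uses the plug-in form VaR + (mean excess over VaR) / (1 - alpha),
    i.e. roughly the average of the worst (1 - alpha) fraction of losses.
    """
    var = empirical_var(samples, alpha)
    excess = sum(max(x - var, 0.0) for x in samples)
    return var + excess / ((1.0 - alpha) * len(samples))


if __name__ == "__main__":
    # Hypothetical loss samples drawn from one arm.
    losses = [float(i) for i in range(1, 11)]
    print("VaR :", empirical_var(losses, 0.5))
    print("CVaR:", empirical_cvar(losses, 0.5))
```

In a risk-aware bandit, an estimate of this kind would take the place of the sample mean when arms are compared, so that an arm is judged by its tail behaviour and not by its average outcome.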
Tutorial Outline
Tutorial overview
Review of multi-armed bandits
Risk measures
Risk estimation
Risk-aware bandits for regret minimization
Risk-aware bandits for best-arm identification
Slides
Survey article
Vincent Y. F. Tan, Prashanth L.A., and Krishna Jagannathan, A Survey of Risk-Aware Multi-Armed Bandits, International Joint Conference on Artificial Intelligence (IJCAI) (Survey Track), 2022. [arxiv]
Presenters
Krishna Jagannathan and Prashanth L.A.
Target Audience
This tutorial should be of interest to graduate students, researchers and practitioners alike, as it presents both the theory and the practical implementation of risk-aware bandit algorithms.