CS6515: Stochastic optimization

Course information

  • When: Jan-May 2023

  • Lectures: Slot J

  • Where: ESB207B

  • Teaching Assistants: None

Course Content

Module 1: Introduction

  • Examples: Least squares, matrix factorization, SVM, logistic regression, MCMC, nonlinear urns, RL

  • Preliminaries: Selected topics in probability and linear algebra

Module 2: Foundations - Convexity and smoothness

  • Taylor's theorem, minima of smooth functions

  • Convex sets and functions

Module 3: Foundations - Stochastic approximation

  • ODE approach and asymptotic convergence
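
As a taste of the iterations studied in this module, here is a minimal Python sketch (illustrative only, not course material) of a Robbins-Monro scheme x_{n+1} = x_n + a_n (h(x_n) + noise); the choice of h, the noise level, and the step sizes are assumptions made for this example. The iterates track the ODE dx/dt = h(x) and converge to its stable equilibrium.

    import numpy as np

    rng = np.random.default_rng(0)

    def noisy_h(x):
        # h(x) = -(x - 2): the ODE dx/dt = h(x) has a stable equilibrium at x* = 2
        return -(x - 2.0) + rng.normal(scale=0.5)

    x = 0.0
    for n in range(1, 10001):
        a_n = 1.0 / n               # step sizes: sum a_n diverges, sum a_n^2 converges
        x = x + a_n * noisy_h(x)    # stochastic approximation update with noisy measurements

    print(f"final iterate: {x:.3f} (the root of h is 2.0)")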

Module 4: Descent methods

  • Descent directions

  • Convergence and rates

Module 5: Stochastic gradient algorithms

  • Convergence analysis for the convex case

  • Sum of smooth and strongly convex functions

  • Convergence for the general case

  • Noise reduction, e.g., SVRG, SAG, and iterate averaging (see the sketch after this list)

  • Second-order methods
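
As a concrete reference point for this module, the following is a minimal Python sketch (illustrative only, not course material) of the plain stochastic gradient iteration on a least-squares objective, with Polyak-Ruppert iterate averaging as a simple noise-reduction device; the synthetic data, step-size schedule, and iteration count are assumptions made for this example.

    import numpy as np

    # SGD on f(w) = (1/2n) * sum_i (x_i^T w - y_i)^2 using one sample per step,
    # plus a running (Polyak-Ruppert) average of the iterates.
    rng = np.random.default_rng(0)
    n, d = 1000, 5
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + 0.1 * rng.normal(size=n)

    w = np.zeros(d)
    w_avg = np.zeros(d)
    for t in range(1, 50001):
        i = rng.integers(n)                   # pick one data point at random
        g = (X[i] @ w - y[i]) * X[i]          # unbiased estimate of the gradient
        w = w - (0.1 / np.sqrt(t)) * g        # SGD step with a decaying step size
        w_avg += (w - w_avg) / t              # running average of the iterates

    print("error of the last iterate    :", np.linalg.norm(w - w_true))
    print("error of the averaged iterate:", np.linalg.norm(w_avg - w_true))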

Module 6: Zeroth order optimization

  • Gradient estimation (see the sketch after this list)

  • Convergence analysis
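
The sketch below (Python, illustrative only, not course material) shows one common zeroth-order scheme: a two-point gradient estimate built from function evaluations along a random Rademacher perturbation, plugged into a gradient step. The quadratic objective, perturbation size, and step size are assumptions made for this example.

    import numpy as np

    # Zeroth-order optimization: estimate the gradient of f from two function
    # evaluations per step along a random perturbation, then take a gradient step.
    rng = np.random.default_rng(0)

    def f(x):
        return 0.5 * np.sum((x - 1.0) ** 2)   # toy objective with minimizer at x = 1

    d = 10
    x = np.zeros(d)
    delta = 1e-2                              # perturbation size
    for t in range(5000):
        u = rng.choice([-1.0, 1.0], size=d)   # Rademacher perturbation direction
        g_hat = (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta) * u
        x = x - 0.01 * g_hat                  # gradient step using the estimate

    print("distance to the minimizer:", np.linalg.norm(x - 1.0))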

Module 7: Other topics (if time permits)

  • Fixed point iterations, RL, MCMC, co-ordinate descent

Textbooks/References

[1] Stephen J. Wright and Benjamin Recht. Optimization for Data Analysis. Cambridge University Press, 2022.
[2] V. S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Texts and Readings in Mathematics, 2022.
[3] Sebastien Bubeck. Convex Optimization: Algorithms and Complexity. Foundations and Trends in Machine Learning, 2015.
[4] Stephen P. Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

Grading

  • Mid-term exam: 25%

  • Final exam and viva: 20%

  • Project: 25%

  • Paper review: 5%

  • Scribing: 10%

  • Mini-quizzes: 15%

Important Dates

  • Mini-quizzes: Feb 27, Apr 3, Apr 17

  • Mid-term: Mar 9

  • Final: May 9

  • Project: Apr 27

  • Paper review: Apr 10

Paper review

Students form teams of at most two and read a paper on stochastic optimization, chosen either from a list (to be shared) or from outside the list with the instructor's approval. Each team submits a 1-page critical review that identifies the strengths of the paper and areas for improvement. A paraphrase of the paper's content is strongly discouraged.

Course project

  • Students work in teams of at most two.

  • The project may be theoretical and/or practical: it may involve implementing existing stochastic optimization algorithms, and/or original research in the form of a novel stochastic optimization algorithm or an extension of existing theoretical results for such algorithms. Given the limited timeframe of the course, working out a new result or a sufficiently novel extension of an existing one may not be feasible; in that case, the project report may instead survey the existing literature the students studied while attempting to formulate or solve a stochastic optimization problem.

  • For theoretically oriented projects, the report is expected to provide a detailed description of the problem, the accompanying algorithms, and their convergence analysis. For projects geared towards performance evaluation of existing stochastic optimization algorithms, the report is expected to summarize the findings of the empirical investigation, with enough implementation detail to reproduce the results.

Evaluation is as follows: 5% for the project proposal, 10% for the final report, and 10% for a project presentation/viva conducted by the instructor.

Schedule

Lecture number | Topics covered | Reference
Lecture 1 | Motivating examples: Linear models in ML | Chapter 1 of [1] and my notes here
Lecture 2 | Motivating examples: Max-likelihood estimation | My notes here
Lecture 3 | Motivating examples: Logistic regression | My notes here
Lecture 4 | Motivating examples: Zeroth-order optimization | Slides here
Lecture 5 | Gradient estimation using the simultaneous perturbation method | Slides here
Lecture 6 | A unified gradient estimate and its bias/variance analysis | Slides here
Lecture 7 | Foundations of smooth optimization: local/global optima; necessary conditions for local minima | Chapter 2 of [1]
Lecture 8 | Taylor's theorem, smooth functions | Chapter 2 of [1]
Lecture 9 | Sufficient conditions for local minima | Chapter 2 of [1]
Lecture 10 | Convex sets and functions | Chapter 2 of [1] and Chapters 2 and 3 of [4]
Lecture 11 | Characterization of convex functions | Chapter 2 of [1] and Chapters 2 and 3 of [4]
Lecture 12 | Strongly convex functions | Chapter 2 of [1]
Mini-quiz 1
Lecture 13 | Convergence of random variables | Lec 28 of the EE5110 course on probability by Prof. Krishna Jagannathan, click here
Lecture 14 | Convergence of random variables; conditional expectation | Sec 3.7 of Grimmett and Stirzaker, Probability and Random Processes
Lecture 15 | Conditional expectation | Sec 4.6 of Grimmett and Stirzaker, Probability and Random Processes
Lecture 16 | Martingales: introduction | Sec 7.7 of Grimmett and Stirzaker, Probability and Random Processes
Lecture 17 | Martingales: applications from stochastic approximation |
Lecture 18 | Martingales: maximal inequality | Sec 7.8 of Grimmett and Stirzaker, Probability and Random Processes
Lecture 19 | Problem-solving session on smoothness, convexity, and topics in probability |
Mid-term exam
Lecture 20 | Discussion of mini-quiz 1 and mid-term problems |
Lecture 21 | Martingales: convergence theorem | Sec 7.8 of Grimmett and Stirzaker, Probability and Random Processes
Lecture 22 | Stochastic approximation: introduction | Chapter 1 of [2]
Lecture 23 | Stochastic approximation: asymptotic convergence | Chapter 2 of [2]
Mini-quiz 2
Lecture 24 | Stochastic approximation: applications |
Lecture 25 | Discussion of mini-quiz 2; non-asymptotics for smooth optimization |
Lecture 26 | Gradient descent: analysis for a smooth objective | Chapter 3 of [1]
Lecture 27 | Gradient descent: analysis for a convex/strongly convex objective | Chapter 3 of [1]
Lecture 28 | Stochastic gradient algorithm: non-asymptotic analysis with unbiased gradient estimates | Ghadimi S, Lan G (2013) Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23:2341-2368
Mini-quiz 3
Lecture 29 | Stochastic gradient algorithm: non-asymptotic analysis with biased gradient estimates | Bhavsar N, Prashanth LA (2022) Non-asymptotic bounds for stochastic optimization with biased noisy gradient oracles. IEEE Transactions on Automatic Control 68:1628-1641
Lecture 30 | Stochastic gradient algorithm: non-asymptotic analysis in the strongly convex case | Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Review 60(2):223-311
Lecture 31 | Variance reduction: SAG, SVRG | Gower RM, Schmidt M, Bach F, Richtarik P (2020) Variance-reduced methods for machine learning. Proceedings of the IEEE 108(11):1968-1983