CS6515: Stochastic optimization
Course information
When: Jan-May 2023
Lectures: Slot J
Where: ESB207B
Teaching Assistants: None
Course Content
Module 1: Introduction
Examples: Least squares, matrix factorization, SVM, logistic regression, MCMC, nonlinear urns, RL
Preliminaries: Selected topics in probability and linear algebra
Module 2: Foundations - Convexity and smoothness
Taylor's theorem, minima of smooth functions
Convex sets and functions
Module 3: Foundations - Stochastic approximation
ODE approach and asymptotic convergence
Module 4: Descent methods
Descent directions
Convergence and rates
Module 5: Stochastic gradient algorithms (see the illustrative sketch after this content list)
Convergence analysis for convex case
Sum of smooth and strongly convex functions
Convergence for the general case
Noise reduction, e.g. SVRG, SAG and iterate averaging
Second-order methods
Module 6: Zeroth order optimization
Gradient estimation
Convergence analysis
Module 7: Other topics (if time permits)
Fixed point iterations, RL, MCMC, co-ordinate descent
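As a rough illustration of the stochastic gradient algorithm listed under Module 5 (not part of the official course material), the minimal Python sketch below applies it to the least-squares example from Module 1. The synthetic data, the 1/k step-size choice, and the iteration count are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a stochastic gradient iteration for least squares:
# minimize f(x) = (1/2n) * sum_i (a_i^T x - b_i)^2.
# All constants below are illustrative choices, not from the course.

rng = np.random.default_rng(0)
n, d = 1000, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

x = np.zeros(d)
for k in range(1, 20001):
    i = rng.integers(n)                    # sample one data point uniformly
    grad_i = (A[i] @ x - b[i]) * A[i]      # unbiased estimate of the full gradient
    x -= (1.0 / k) * grad_i                # diminishing step size, as in SA theory

print("estimation error:", np.linalg.norm(x - x_true))
```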
Textbooks/References
[1] Stephen J. Wright and Benjamin Recht. Optimization for Data Analysis. Cambridge University Press, 2022.
[2] V. S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Texts and Readings in Mathematics, 2022.
[3] Sebastien Bubeck. Convex Optimization: Algorithms and Complexity. Foundations and Trends in Machine Learning, 2015.
[4] Stephen P. Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
Grading
Mid-term exam: 25%
Final exam and viva: 20%
Project: 25%
Paper review: 5%
Scribing: 10%
Mini-quizzes: 15%
Important Dates
Mini-quizzes: Feb 27, Apr 3, Apr 17
Mid-term: Mar 9
Final: May 9
Project: Apr 27
Paper review: Apr 10
Paper review
Students form teams of at most two and read a paper on stochastic optimization, either from a list (to be shared) or a paper outside the list with the instructor's approval. Each team submits a 1-page critical review that identifies the strengths of the paper studied and areas for improvement. A paraphrase of the paper's content is strongly discouraged.
Course project
The students work in teams of size at most two.
The project may be theoretical and/or practical: it could involve implementing existing stochastic optimization algorithms and/or original research, in the form of either a novel stochastic optimization algorithm or an extension of existing theoretical results for such algorithms. Given the limited timeframe of the course, working out a new result or a sufficiently novel extension of an existing result may not be entirely feasible; in that case, the project report could become a survey of the existing literature that the students familiarized themselves with while attempting to formulate/solve a stochastic optimization problem.
For theoretically oriented projects, the report is expected to provide a detailed description of the problem, the accompanying algorithms and the convergence analysis. For projects geared towards performance evaluation of existing stochastic optimization algorithms, the report is expected to summarize the findings of the empirical investigation, with sufficient implementation details to help reproduce the results.
The evaluation is as follows: 5% for the project proposal, 10% for the final report, and 10% for a project presentation/viva conducted by the instructor.
Schedule
Lecture number | Topics Covered | Reference |
---|---|---|
Lecture 1 | Motivating examples: Linear models in ML | Chapter 1 of [1] and my notes here |
Lecture 2 | Motivating examples: Max-likelihood estimation | My notes here |
Lecture 3 | Motivating examples: Logistic regression | My notes here |
Lecture 4 | Motivating examples: Zeroth-order optimization | Slides here |
Lecture 5 | Gradient estimation using simultaneous perturbation method (see the sketch after the schedule) | Slides here |
Lecture 6 | A unified gradient estimate and its bias/variance analysis | Slides here |
Lecture 7 | Foundations of smooth optimization: local/global optima; necessary conditions for local minima | Chapter 2 of [1] |
Lecture 8 | Taylor's theorem, smooth functions | Chapter 2 of [1] |
Lecture 9 | Sufficient conditions for local minima | Chapter 2 of [1] |
Lecture 10 | Convex sets and functions | Chapter 2 of [1] and Chapters 2,3 of [4] |
Lecture 11 | Characterization of convex functions | Chapter 2 of [1] and Chapters 2,3 of [4] |
Lecture 12 | Strongly convex functions | Chapter 2 of [1] |
Mini-quiz 1 | ||
Lecture 13 | Convergence of random variables | Lec 28 of EE5110 course on probability by Prof. Krishna Jagannathan, click here |
Lecture 14 | Convergence of random variables; conditional expectation | Sec 3.7 of Grimmett and Stirzaker's book: Probability and random processes
Lecture 15 | Conditional expectation | Sec 4.6 of Grimmett and Stirzaker's book: Probability and random processes |
Lecture 16 | Martingales: Introduction | Sec 7.7 of Grimmett and Stirzaker's book: Probability and random processes |
Lecture 17 | Martingales: Applications from stochastic approximation | |
Lecture 18 | Martingales: maximal inequality | Sec 7.8 of Grimmett and Stirzaker's book: Probability and random processes |
Lecture 19 | Problem solving session on smoothness, convexity, and topics in probability | |
Mid-term exam | ||
Lecture 20 | Discussion of mini-quiz 1 and mid-term problems | |
Lecture 21 | Martingales: convergence theorem | Sec 7.8 of Grimmett and Stirzaker's book: Probability and random processes |
Lecture 22 | Stochastic approximation: introduction | Chapter 1 of [2] |
Lecture 23 | Stochastic approximation: Asymptotic convergence | Chapter 2 of [2] |
Mini-quiz 2 | ||
Lecture 24 | Stochastic approximation: Applications | |
Lecture 25 | Discussion of mini-quiz 2; non-asymptotics for smooth optimization |
Lecture 26 | Gradient descent: Analysis for a smooth objective | Chapter 3 of [1] |
Lecture 27 | Gradient descent: Analysis for a convex/strongly convex objective | Chapter 3 of [1] |
Lecture 28 | Stochastic gradient algorithm: Non-asymptotic analysis with unbiased gradient estimates | Ghadimi S, Lan G (2013) Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23:2341–2368 |
Mini-quiz 3 | ||
Lecture 29 | Stochastic gradient algorithm: Non-asymptotic analysis with biased gradient estimates | Bhavsar N, Prashanth LA (2022) Non-asymptotic bounds for stochastic optimization with biased noisy gradient oracles. IEEE Transactions on Automatic Control, 68:1628-1641 |
Lecture 30 | Stochastic gradient algorithm: Non-asymptotic analysis in the strongly convex case | Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Review 60(2):223–311 |
Lecture 31 | Variance reduction: SAG, SVRG | Gower, R. M., Schmidt, M., Bach, F., & Richtarik, P. (2020). Variance-reduced methods for machine learning. Proceedings of the IEEE, 108(11), 1968-1983 |
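The schedule above only names the gradient estimation technique covered in Lectures 4-6 (Module 6). As a rough illustration, and not taken from the lecture slides, here is a minimal sketch of a two-point simultaneous-perturbation gradient estimate driving a stochastic gradient iteration. The test function, noise level, perturbation constant, and step sizes are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch of zeroth-order optimization with a simultaneous-perturbation
# gradient estimate. The quadratic test function and all constants are
# illustrative assumptions; the lecture slides may use different choices.

rng = np.random.default_rng(1)

def f(x):
    # noisy zeroth-order oracle for f(x) = ||x||^2 / 2
    return 0.5 * np.dot(x, x) + 0.01 * rng.normal()

def sp_gradient(f, x, delta=1e-2):
    """Two-point simultaneous-perturbation estimate of grad f(x)."""
    d = rng.choice([-1.0, 1.0], size=x.shape)          # Rademacher perturbation
    # dividing by each +/-1 component equals multiplying by it
    return (f(x + delta * d) - f(x - delta * d)) / (2 * delta) * d

x = np.ones(10)
for k in range(1, 5001):
    x -= (1.0 / k) * sp_gradient(f, x)                 # SGD with the estimated gradient

print("||x|| after 5000 iterations:", np.linalg.norm(x))
```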