EE736: Introduction to Stochastic Optimization
Course information
When: Jan-May 2025
Lectures: Slot 9
Where: EEG 401 (Collaborative Classroom)
Teaching Assistants: Keshav Kumar (23D1602), Sneha Oram (23M2159)
Course Content
Module 1: Introduction
- Machine learning
Training problems such as linear regression, logistic regression
- Zeroth-order optimization
Simulation optimization, gradient estimation using the simultaneous perturbation method
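For a concrete picture of the simultaneous perturbation idea, here is a minimal Python sketch of a two-measurement gradient estimate; the quadratic test function, the perturbation constant c and the helper name spsa_gradient are illustrative assumptions, not taken from the textbook.

```python
import numpy as np

def spsa_gradient(f, x, c=1e-2, rng=None):
    """Two-measurement simultaneous perturbation estimate of the gradient of f at x.

    f : callable returning a (possibly noisy) scalar function value
    c : perturbation constant (an illustrative default, not a tuned value)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Rademacher (+/- 1) perturbation direction, one entry per coordinate.
    delta = rng.choice([-1.0, 1.0], size=x.shape)
    # Only two function evaluations are needed, regardless of the dimension of x.
    y_plus, y_minus = f(x + c * delta), f(x - c * delta)
    # Coordinate-wise estimate: (f(x + c*delta) - f(x - c*delta)) / (2 * c * delta_i).
    return (y_plus - y_minus) / (2.0 * c * delta)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Noisy zeroth-order oracle for a quadratic with minimizer at the all-ones vector.
    f = lambda z: np.sum((z - 1.0) ** 2) + 0.01 * rng.standard_normal()
    x = np.zeros(3)
    est = np.mean([spsa_gradient(f, x, rng=rng) for _ in range(1000)], axis=0)
    print(est)  # averages to roughly the true gradient 2*(x - 1) = [-2, -2, -2]
```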
Module 2: Foundations
- Convexity and smoothness
Taylor's theorem, minima of smooth functions, convex and strongly convex functions
- Martingales
definition, examples, convergence
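For quick reference, the martingale property central to this module can be stated in one line, together with the partial-sum example that shows up in stochastic gradient analyses; the notation below is generic rather than the textbook's.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Martingale property: integrability plus the conditional-expectation identity.
A sequence $(M_n)$ adapted to a filtration $(\mathcal{F}_n)$ is a martingale if
\[
  \mathbb{E}\bigl[\,|M_n|\,\bigr] < \infty
  \qquad\text{and}\qquad
  \mathbb{E}\bigl[\,M_{n+1} \mid \mathcal{F}_n\,\bigr] = M_n \quad \text{for all } n.
\]
% Example: partial sums of conditionally zero-mean noise (e.g., gradient noise).
If $\mathbb{E}[\xi_{k+1} \mid \mathcal{F}_k] = 0$ for all $k$, then
$M_n = \sum_{k=1}^{n} \xi_k$ is a martingale.
\end{document}
```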
Module 3: Stochastic approximation
- Applications
first and zeroth-order optimization using gradient methods, stochastic fixed point iterations
- Asymptotic convergence
Ordinary differential equations (ODE) approach, Kushner-Clark lemma
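As an illustration of the stochastic fixed point iterations studied in this module, here is a minimal Robbins-Monro style sketch; the step-size constant, the iteration count and the mean-estimation example are assumptions made for this snippet, not material from [1] or [3].

```python
import numpy as np

def stochastic_approximation(noisy_h, x0, n_iter=10_000, a=1.0, rng=None):
    """Robbins-Monro style iteration x_{n+1} = x_n + a_n * noisy_h(x_n).

    Step sizes a_n = a/(n+1) satisfy sum a_n = inf and sum a_n^2 < inf,
    the standard conditions under which the iterates track the ODE dx/dt = h(x).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = float(x0)
    for n in range(n_iter):
        a_n = a / (n + 1)
        x = x + a_n * noisy_h(x, rng)
    return x

if __name__ == "__main__":
    # Example: estimate a mean as the fixed point of h(x) = E[Z] - x,
    # observed only through noisy samples Z - x.
    noisy_h = lambda x, rng: rng.normal(loc=3.0, scale=1.0) - x
    print(stochastic_approximation(noisy_h, x0=0.0))  # converges to about 3.0
```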
Module 4: Deterministic optimization
- Descent methods
Descent directions, gradient descent
- Convergence rates
for non-convex, convex and strongly-convex functions
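The descent-method material above can be illustrated with a few lines of gradient descent on a strongly convex quadratic, where a constant step size of 1/L gives the linear (geometric) rate covered in this module; the matrix, step size and iteration count below are illustrative assumptions.

```python
import numpy as np

def gradient_descent(grad, x0, step, n_iter=100):
    """Plain gradient descent x_{k+1} = x_k - step * grad(x_k)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        x = x - step * grad(x)   # -grad(x) is a descent direction
    return x

if __name__ == "__main__":
    # f(x) = 0.5 * x^T A x with eigenvalues in [mu, L] = [1, 10]; minimizer is 0.
    A = np.diag([1.0, 10.0])
    grad = lambda x: A @ x
    x_final = gradient_descent(grad, x0=[5.0, 5.0], step=1.0 / 10.0)
    print(np.linalg.norm(x_final))  # close to 0: geometric decrease of the error
```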
Module 5: Stochastic gradient algorithms
- Non-asymptotic bounds
with unbiased gradients for different function classes
- Biased gradient oracle
non-asymptotic bounds
- Noise reduction
SVRG (sketched below), SAG and iterate averaging
- Second-order methods
Cubic-regularized Newton, perturbed GD
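A minimal SVRG sketch on a least-squares objective, as referenced under the noise-reduction item above; the step size, number of epochs and data-generation details are illustrative assumptions rather than prescriptions from [1].

```python
import numpy as np

def svrg_least_squares(A, b, step=0.01, epochs=30, rng=None):
    """SVRG sketch for f(x) = (1/2n) * ||A x - b||^2 (illustrative parameters).

    Each epoch stores a snapshot and its full gradient, then runs an inner loop
    of variance-reduced stochastic gradient steps.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    x = np.zeros(d)
    grad_i = lambda w, i: A[i] * (A[i] @ w - b[i])      # gradient of the i-th term
    for _ in range(epochs):
        x_snap = x.copy()
        full_grad = A.T @ (A @ x_snap - b) / n           # full gradient at the snapshot
        for _ in range(n):                               # inner loop of length n
            i = rng.integers(n)
            # Variance-reduced gradient: unbiased, with variance shrinking
            # as both x and x_snap approach the minimizer.
            g = grad_i(x, i) - grad_i(x_snap, i) + full_grad
            x = x - step * g
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 5))
    x_true = rng.standard_normal(5)
    b = A @ x_true
    print(np.linalg.norm(svrg_least_squares(A, b, rng=rng) - x_true))  # small
```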
Module 6: Other topics (if time permits)
Policy gradient reinforcement learning
Textbook
[1] Prashanth L. A. and Shalabh Bhatnagar, Gradient-based algorithms for zeroth-order optimization, Foundations and Trends® in Optimization, 2024 (forthcoming).
Additional references:
[2] Stephen J. Wright and Benjamin Recht. Optimization for Data Analysis. Cambridge University Press, 2022.
[3] V.S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Texts and Readings in Mathematics, 2022.
Lecture notes
Grading
Mid-term exam: 25%
Final exam: 25%
Project: 20%
Paper review: 5%
Class participation: 5%
Mini-quizzes: 15% (Best 2 out of 3)
Book proofreading: 5%
Paper review
Students form teams of at most two and read a paper on stochastic optimization, chosen either from a list (to be shared) or, with the instructor's approval, from outside the list. Each team submits a one-page critical review that identifies the strengths of the paper studied and the areas where it could be improved. A mere paraphrase of the paper's content is strongly discouraged.
Course project
The students work in teams of size at most two.
The project may be theoretical and/or practical: it could involve implementing existing stochastic optimization algorithms and/or original research, in the form of either a novel stochastic optimization algorithm or an extension of existing theoretical results for such algorithms. Given the limited timeframe of the course, working out a new result or a sufficiently novel extension of an existing one may not be entirely feasible; in that case, the project report may instead survey the existing literature that the students familiarized themselves with while attempting to formulate or solve a stochastic optimization problem.
For theoretically oriented projects, the report is expected to provide a detailed description of the problem, the accompanying algorithms, and the convergence analysis. For projects geared towards performance evaluation of existing stochastic optimization algorithms, the report is expected to summarize the findings of the empirical investigation, with sufficient implementation detail to allow the results to be reproduced.
The evaluation is split as follows: 10% for the final report and 10% for a project presentation/viva conducted by the instructor.
Quizzes and Exams
Schedule
Lecture number | Topics Covered | Reference |
---|---|---|
Lecture 1 | Introduction to zeroth-order optimization | Chapter 1 of [1], slides here |
Lecture 2 | Gradient estimation in a zeroth-order setting | Chapter 3 of [1] |
Lecture 3 | A unified gradient estimate and its bias/variance analysis | Chapter 3 of [1] |
Lecture 4 | First-order optimization: Linear regression, logistic regression | Chapter 1 of [2] |
Lecture 5 | Crash course in optimization | Appendix D of [1] |
Lecture 6 | Introduction to stochastic gradient (SG) algorithm | |
Lecture 7 | Martingales: introduction, applications | Appendix B of [1] |
Problem-solving session followed by mini-quiz 1 | ||
Lecture 8 | Convergence of martingales | Appendix B of [1] |
Lecture 9 | Introduction to stochastic approximation | Chapter 2 of [1] and chapter 2 of [3] |
Lecture 10 | Kushner-Clark lemma, SG with unbiased gradients: asymptotic convergence | Sections 2.3 and 4.1 of [1] |
Lecture 11 | SG algorithm with biased gradients: asymptotic convergence | Section 4.1 of [1] |
Lecture 13 | Gaussian smoothing, strongly convex functions | Sections 3.3 and 5.4 of [1] |
Mid-term exam | ||
Lecture 14 | Bounds for gradient descent in the noiseless regime: non-convex and convex cases | Chapter 3 of [2] |
Lecture 15 | Bounds for gradient descent in the noiseless regime: strongly-convex case; Non-asymptotics for SG algorithm: non-convex + unbiased gradients | Chapter 3 of [2]; Chapter 5 of [1]
Lecture 16 | Non-asymptotics for SG algorithm: non-convex + biased gradients; Applications with biased function measurements | Section 5.1 of [1]
Lecture 17 | Non-asymptotics for SG algorithm: non-convex + biased gradients | Section 5.1 of [1] |
Lecture 18 | Non-asymptotics for SG with unbiased gradients: convex and strongly convex cases | Sections 5.2 and 5.3 of [1] |
Lecture 19 | Non-asymptotics for SG with biased gradients: convex and strongly convex cases | Sections 5.2 and 5.3 of [1] |
Mini-quiz 2 | ||
Lecture 20 | Introduction to finite-horizon MDPs; DP algorithm | Click here |
Lecture 21 | Introduction to discounted MDPs; Value iteration | Click here |
Lecture 22 | Policy iteration; Introduction to reinforcement learning | Click here |
Lecture 23 | TD-learning | Click here |
Lecture 24 | Q-learning; REINFORCE policy gradient algorithm | Click here |
End-term exam |