EE736: Introduction to Stochastic Optimization
Course information
When: Jan-May 2025
Lectures: Slot 9
Where: EEG 401 (Collaborative Classroom)
Teaching Assistants: Keshav Kumar (23D1602), Sneha Oram (23M2159)
Course Content
Module 1: Introduction
- Machine learning
Training problems such as linear regression, logistic regression
- Zeroth-order optimization
Simulation optimization, gradient estimation using the simultaneous perturbation method
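For a concrete picture of the simultaneous perturbation idea, here is a minimal Python sketch of a two-measurement gradient estimate; the quadratic test function, the perturbation constant c and the helper name spsa_gradient are illustrative assumptions, not taken from the textbook.

```python
import numpy as np

def spsa_gradient(f, x, c=1e-2, rng=None):
    """Two-measurement simultaneous perturbation estimate of the gradient of f at x.

    f : callable returning a (possibly noisy) scalar function value
    c : perturbation constant (an illustrative default, not a tuned value)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Rademacher (+/- 1) perturbation direction, one entry per coordinate.
    delta = rng.choice([-1.0, 1.0], size=x.shape)
    # Only two function evaluations are needed, regardless of the dimension of x.
    y_plus, y_minus = f(x + c * delta), f(x - c * delta)
    # Coordinate-wise estimate: (f(x + c*delta) - f(x - c*delta)) / (2 * c * delta_i).
    return (y_plus - y_minus) / (2.0 * c * delta)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Noisy zeroth-order oracle for a quadratic with minimizer at the all-ones vector.
    f = lambda z: np.sum((z - 1.0) ** 2) + 0.01 * rng.standard_normal()
    x = np.zeros(3)
    est = np.mean([spsa_gradient(f, x, rng=rng) for _ in range(1000)], axis=0)
    print(est)  # averages to roughly the true gradient 2*(x - 1) = [-2, -2, -2]
```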
Module 2: Foundations
- Convexity and smoothness
Taylor's theorem, minima of smooth functions, convex and strongly convex functions
- Martingales
definition, examples, convergence
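For quick reference, the martingale property central to this module can be stated in one line, together with the partial-sum example that shows up in stochastic gradient analyses; the notation below is generic rather than the textbook's.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Martingale property: integrability plus the conditional-expectation identity.
A sequence $(M_n)$ adapted to a filtration $(\mathcal{F}_n)$ is a martingale if
\[
  \mathbb{E}\bigl[\,|M_n|\,\bigr] < \infty
  \qquad\text{and}\qquad
  \mathbb{E}\bigl[\,M_{n+1} \mid \mathcal{F}_n\,\bigr] = M_n \quad \text{for all } n.
\]
% Example: partial sums of conditionally zero-mean noise (e.g., gradient noise).
If $\mathbb{E}[\xi_{k+1} \mid \mathcal{F}_k] = 0$ for all $k$, then
$M_n = \sum_{k=1}^{n} \xi_k$ is a martingale.
\end{document}
```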
Module 3: Stochastic approximation
- Applications
first and zeroth-order optimization using gradient methods, stochastic fixed point iterations
- Asymptotic convergence
Ordinary differential equations (ODE) approach, Kushner-Clark lemma
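As an illustration of the stochastic fixed point iterations studied in this module, here is a minimal Robbins-Monro style sketch; the step-size constant, the iteration count and the mean-estimation example are assumptions made for this snippet, not material from [1] or [3].

```python
import numpy as np

def stochastic_approximation(noisy_h, x0, n_iter=10_000, a=1.0, rng=None):
    """Robbins-Monro style iteration x_{n+1} = x_n + a_n * noisy_h(x_n).

    Step sizes a_n = a/(n+1) satisfy sum a_n = inf and sum a_n^2 < inf,
    the standard conditions under which the iterates track the ODE dx/dt = h(x).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = float(x0)
    for n in range(n_iter):
        a_n = a / (n + 1)
        x = x + a_n * noisy_h(x, rng)
    return x

if __name__ == "__main__":
    # Example: estimate a mean as the fixed point of h(x) = E[Z] - x,
    # observed only through noisy samples Z - x.
    noisy_h = lambda x, rng: rng.normal(loc=3.0, scale=1.0) - x
    print(stochastic_approximation(noisy_h, x0=0.0))  # converges to about 3.0
```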
Module 4: Deterministic optimization
- Descent methods
Descent directions, gradient descent
- Convergence rates
for non-convex, convex and strongly-convex functions
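The descent-method material above can be illustrated with a few lines of gradient descent on a strongly convex quadratic, where a constant step size of 1/L gives the linear (geometric) rate covered in this module; the matrix, step size and iteration count below are illustrative assumptions.

```python
import numpy as np

def gradient_descent(grad, x0, step, n_iter=100):
    """Plain gradient descent x_{k+1} = x_k - step * grad(x_k)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        x = x - step * grad(x)   # -grad(x) is a descent direction
    return x

if __name__ == "__main__":
    # f(x) = 0.5 * x^T A x with eigenvalues in [mu, L] = [1, 10]; minimizer is 0.
    A = np.diag([1.0, 10.0])
    grad = lambda x: A @ x
    x_final = gradient_descent(grad, x0=[5.0, 5.0], step=1.0 / 10.0)
    print(np.linalg.norm(x_final))  # close to 0: geometric decrease of the error
```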
Module 5: Stochastic gradient algorithms
- Non-asymptotic bounds
with unbiased gradients for different function classes
- Biased gradient oracle
non-asymptotic bounds
- Noise reduction
SVRG (sketched below), SAG and iterate averaging
- Second-order methods
Cubic-regularized Newton, perturbed GD
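A minimal SVRG sketch on a least-squares objective, as referenced under the noise-reduction item above; the step size, number of epochs and data-generation details are illustrative assumptions rather than prescriptions from [1].

```python
import numpy as np

def svrg_least_squares(A, b, step=0.01, epochs=30, rng=None):
    """SVRG sketch for f(x) = (1/2n) * ||A x - b||^2 (illustrative parameters).

    Each epoch stores a snapshot and its full gradient, then runs an inner loop
    of variance-reduced stochastic gradient steps.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    x = np.zeros(d)
    grad_i = lambda w, i: A[i] * (A[i] @ w - b[i])      # gradient of the i-th term
    for _ in range(epochs):
        x_snap = x.copy()
        full_grad = A.T @ (A @ x_snap - b) / n           # full gradient at the snapshot
        for _ in range(n):                               # inner loop of length n
            i = rng.integers(n)
            # Variance-reduced gradient: unbiased, with variance shrinking
            # as both x and x_snap approach the minimizer.
            g = grad_i(x, i) - grad_i(x_snap, i) + full_grad
            x = x - step * g
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 5))
    x_true = rng.standard_normal(5)
    b = A @ x_true
    print(np.linalg.norm(svrg_least_squares(A, b, rng=rng) - x_true))  # small
```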
Module 6: Other topics (if time permits)
Policy gradient reinforcement learning
Textbook
[1] Prashanth L. A. and Shalabh Bhatnagar, Gradient-based algorithms for zeroth-order optimization, Foundations and Trends® in Optimization, 2024 (forthcoming).
Additional references:
[2] Stephen J. Wright and Benjamin Recht. Optimization for Data Analysis. Cambridge University Press, 2022.
[3] V.S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Texts and Readings in Mathematics, 2022.
Lecture notes
Grading
Mid-term exam: 25%
Final exam: 25%
Project: 20%
Paper review: 5%
Class participation: 5%
Mini-quizzes: 15% (Best 2 out of 3)
Book proofreading: 5%
Paper review
Students form teams of at most two and read a paper on stochastic optimization, chosen either from a list (to be shared) or, with the instructor's approval, from outside the list. Each team submits a one-page critical review that identifies the strengths of the paper studied and the areas where it could be improved. A mere paraphrase of the paper's content is strongly discouraged.
Course project
The students work in teams of size at most two.
The project may be theoretical and/or practical: it could involve implementing existing stochastic optimization algorithms and/or original research, in the form of either a novel stochastic optimization algorithm or an extension of existing theoretical results for such algorithms. Given the limited timeframe of the course, working out a new result or a sufficiently novel extension of an existing one may not be entirely feasible; in that case, the project report may instead survey the existing literature that the students familiarized themselves with while attempting to formulate or solve a stochastic optimization problem.
For theoretically oriented projects, the report is expected to provide a detailed description of the problem, the accompanying algorithms, and the convergence analysis. For projects geared towards performance evaluation of existing stochastic optimization algorithms, the report is expected to summarize the findings of the empirical investigation, with sufficient implementation detail to allow the results to be reproduced.
The evaluation is split as follows: 10% for the final report and 10% for a project presentation/viva conducted by the instructor.
Quizzes and Exams
Schedule
Lecture number | Topics Covered | Reference |
---|---|---|
Lecture 1 | Introduction to zeroth-order optimization | Chapter 1 of [1], slides here |
Lecture 2 | Gradient estimation in a zeroth-order setting | Chapter 3 of [1] |
Lecture 3 | A unified gradient estimate and its bias/variance analysis | Chapter 3 of [1] |
Lecture 4 | First-order optimization: Linear regression, logistic regression | Chapter 1 of [2] |
Lecture 5 | Crash course in optimization | Appendix D of [1] |
Lecture 6 | Introduction to stochastic gradient (SG) algorithm | |
Lecture 7 | Martingales: introduction, applications | Appendix B of [1] |
Problem-solving session followed by mini-quiz 1 | ||
Lecture 8 | Convergence of martingales | Appendix B of [1] |
Lecture 9 | Introduction to stochastic approximation | Chapter 2 of [1] and chapter 2 of [3] |
Lecture 10 | Kushner-Clark lemma, SG with unbiased gradients: asymptotic convergence | Sections 2.3 and 4.1 of [1] |
Lecture 11 | SG algorithm with biased gradients: asymptotic convergence | Section 4.1 of [1] |
Lecture 13 | Gaussian smoothing, strongly convex functions | Sections 3.3 and 5.4 of [1] |
Mid-term exam | ||
Lecture 14 | Bounds for gradient descent in the noiseless regime: non-convex and convex cases | Chapter 3 of [2] |
Lecture 15 | Bounds for gradient descent in the noiseless regime: strongly-convex case; Non-asymptotics for SG algorithm: non-convex + unbiased gradients | Chapter 3 of [2]; Chapter 5 of [1]
Lecture 16 | Non-asymptotics for SG algorithm: non-convex + biased gradients; Applications with biased function measurements | Section 5.1 of [1]
Lecture 17 | Non-asymptotics for SG algorithm: non-convex + biased gradients | Section 5.1 of [1] |
Lecture 18 | Non-asymptotics for SG with unbiased gradients: convex and strongly convex cases | Sections 5.2 and 5.3 of [1] |
Lecture 19 | Non-asymptotics for SG with biased gradients: convex and strongly convex cases | Sections 5.2 and 5.3 of [1] |
Mini-quiz 2 | ||
Lecture 20 | Introduction to finite-horizon MDPs; DP algorithm | Click here |
Lecture 21 | Introduction to discounted MDPs; Value iteration | Click here |
Lecture 22 | Policy iteration; Introduction to reinforcement learning | Click here |
Lecture 23 | TD-learning | Click here |
Lecture 24 | Q-learning; REINFORCE policy gradient algorithm | Click here |
End-term exam |