EE736: Introduction to Stochastic Optimization

Course information

  • When: Jan-May 2025

  • Lectures: Slot 9

  • Where: EEG 401 (Collaborative Classroom)

  • Teaching Assistants: Keshav Kumar (23D1602), Sneha Oram (23M2159)

Course Content

Module 1: Introduction

Machine learning

Training problems such as linear regression and logistic regression

Zeroth-order optimization

Simulation optimization, gradient estimation using the simultaneous perturbation method
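
To give a concrete flavour of the simultaneous perturbation method mentioned above, here is a minimal Python sketch of a two-measurement gradient estimate; the function name spsa_gradient, the perturbation size delta, and the noisy quadratic in the usage lines are illustrative assumptions rather than course material.

    import numpy as np

    def spsa_gradient(f, x, delta=1e-2, rng=None):
        """Two-measurement simultaneous perturbation estimate of the gradient of f at x."""
        rng = np.random.default_rng() if rng is None else rng
        d = rng.choice([-1.0, 1.0], size=x.shape)   # random Rademacher perturbation direction
        # Only two (possibly noisy) function evaluations, regardless of the dimension of x.
        return (f(x + delta * d) - f(x - delta * d)) / (2.0 * delta) * (1.0 / d)

    # Toy usage: noisy quadratic whose true gradient at x is 2 * x.
    rng = np.random.default_rng(0)
    f = lambda x: float(np.sum(x ** 2)) + 0.01 * rng.standard_normal()
    print(spsa_gradient(f, np.array([1.0, -2.0, 0.5]), rng=rng))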

Module 2: Foundations

Convexity and smoothness

Taylor's theorem, minima of smooth functions, convex and strongly convex functions

Martingales

Definition, examples, convergence

Module 3: Stochastic approximation

Applications

First- and zeroth-order optimization using gradient methods, stochastic fixed-point iterations

Asymptotic convergence

Ordinary differential equation (ODE) approach, Kushner-Clark lemma
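
To make the stochastic fixed-point iterations referred to above concrete, the following is a minimal Robbins-Monro-style sketch in Python; the step-size choice a_n = 1/(n+1) and the toy root-finding problem are illustrative assumptions.

    import numpy as np

    def robbins_monro(h_noisy, x0, n_iters=5000):
        """Stochastic fixed-point iteration x_{n+1} = x_n + a_n * h_noisy(x_n),
        with step sizes a_n = 1/(n+1) (square-summable but not summable)."""
        x = x0
        for n in range(n_iters):
            a_n = 1.0 / (n + 1)
            x = x + a_n * h_noisy(x)
        return x

    # Toy usage: locate the root of h(x) = 1 - x from noisy evaluations of h.
    rng = np.random.default_rng(0)
    h_noisy = lambda x: (1.0 - x) + 0.1 * rng.standard_normal()
    print(robbins_monro(h_noisy, x0=0.0))   # close to 1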

Module 4: Deterministic optimization

Descent methods

Descent directions, gradient descent

Convergence rates

For non-convex, convex, and strongly convex functions

Module 5: Stochastic gradient algorithms

Non-asymptotic bounds

With unbiased gradients for different function classes

Biased gradient oracle

Non-asymptotic bounds

Noise reduction

SVRG, SAG and iterate averaging

Second-order methods

Cubic-regularized Newton, perturbed GD
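
As a preview of the noise-reduction methods listed in this module, here is a minimal SVRG sketch for a least-squares problem; the function name, step size, and epoch counts are illustrative assumptions, not recommended settings.

    import numpy as np

    def svrg_least_squares(A, b, eta=0.02, epochs=20, rng=None):
        """Sketch of SVRG on the least-squares objective (1/(2n)) * ||A w - b||^2."""
        rng = np.random.default_rng() if rng is None else rng
        n, d = A.shape
        w = np.zeros(d)
        for _ in range(epochs):
            w_snap = w.copy()                               # snapshot iterate
            full_grad = A.T @ (A @ w_snap - b) / n          # full gradient at the snapshot
            for _ in range(n):
                i = rng.integers(n)
                g_w = A[i] * (A[i] @ w - b[i])              # per-sample gradient at w
                g_snap = A[i] * (A[i] @ w_snap - b[i])      # same sample at the snapshot
                w = w - eta * (g_w - g_snap + full_grad)    # variance-reduced update
        return w

    # Toy usage: recover w_star from noisy linear measurements.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 5))
    w_star = rng.standard_normal(5)
    b = A @ w_star + 0.01 * rng.standard_normal(200)
    print(np.linalg.norm(svrg_least_squares(A, b, rng=rng) - w_star))   # small recovery error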

Module 6: Other topics (if time permits)

  • Policy gradient reinforcement learning

Textbook

[1] Prashanth L. A. and Shalabh Bhatnagar, Gradient-based algorithms for zeroth-order optimization, Foundations and Trends® in Optimization, 2024 (forthcoming).

Additional references:
[2] Stephen J. Wright and Benjamin Recht, Optimization for Data Analysis, Cambridge University Press, 2022.
[3] V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Texts and Readings in Mathematics, 2022.

Lecture notes

Grading

  • Mid-term exam: 25%

  • Final exam: 25%

  • Project: 20%

  • Paper review: 5%

  • Class participation: 5%

  • Mini-quizzes: 15% (Best 2 out of 3)

  • Book proofreading: 5%

Paper review

Students form teams of at most two and read a paper on stochastic optimization, either from a list (to be shared) or a paper outside the list with the instructor's approval. Each team submits a one-page critical review that identifies the strengths and areas for improvement of the paper studied. Merely paraphrasing the paper's content is strongly discouraged.

Course project

  • Students work in teams of at most two.

  • The project may be theoretical and/or practical: it could involve implementing existing stochastic optimization algorithms and/or original research, in the form of either a novel stochastic optimization algorithm or an extension of existing theoretical results for such algorithms. Given the limited timeframe of the course, working out a new result or a sufficiently novel extension of an existing one may not be completely feasible; in that case, the project report may instead survey the existing literature the students familiarized themselves with while attempting to formulate or solve a stochastic optimization problem.

  • For theoretically oriented projects, the report is expected to provide a detailed description of the problem, the accompanying algorithms, and their convergence analysis. For projects geared towards performance evaluation of existing stochastic optimization algorithms, the report is expected to summarize the findings of the empirical investigation, with enough implementation detail to reproduce the results.

Evaluation is split as follows: 10% for the final report and 10% for a project presentation/viva conducted by the instructor.

Quizzes and Exams

Schedule

Lecture | Topics Covered | Reference
Lecture 1 | Introduction to zeroth-order optimization | Chapter 1 of [1], slides here
Lecture 2 | Gradient estimation in a zeroth-order setting | Chapter 3 of [1]
Lecture 3 | A unified gradient estimate and its bias/variance analysis | Chapter 3 of [1]
Lecture 4 | First-order optimization: linear regression, logistic regression | Chapter 1 of [2]
Lecture 5 | Crash course in optimization | Appendix D of [1]
Lecture 6 | Introduction to stochastic gradient (SG) algorithm |
Lecture 7 | Martingales: introduction, applications | Appendix B of [1]
Problem-solving session followed by mini-quiz 1
Lecture 8 | Convergence of martingales | Appendix B of [1]
Lecture 9 | Introduction to stochastic approximation | Chapter 2 of [1] and Chapter 2 of [3]
Lecture 10 | Kushner-Clark lemma; SG with unbiased gradients: asymptotic convergence | Sections 2.3 and 4.1 of [1]
Lecture 11 | SG algorithm with biased gradients: asymptotic convergence | Section 4.1 of [1]
Lecture 13 | Gaussian smoothing, strongly convex functions | Sections 3.3 and 5.4 of [1]
Mid-term exam
Lecture 14 | Bounds for gradient descent in the noiseless regime: non-convex and convex cases | Chapter 3 of [2]
Lecture 15 | Bounds for gradient descent in the noiseless regime: strongly convex case; non-asymptotics for SG algorithm: non-convex + unbiased gradients | Chapter 3 of [2]; Chapter 5 of [1]
Lecture 16 | Non-asymptotics for SG algorithm: non-convex + biased gradients; applications with biased function measurements | Section 5.1 of [1]
Lecture 17 | Non-asymptotics for SG algorithm: non-convex + biased gradients | Section 5.1 of [1]
Lecture 18 | Non-asymptotics for SG with unbiased gradients: convex and strongly convex cases | Sections 5.2 and 5.3 of [1]
Lecture 19 | Non-asymptotics for SG with biased gradients: convex and strongly convex cases | Sections 5.2 and 5.3 of [1]
Mini-quiz 2
Lecture 20 | Introduction to finite-horizon MDPs; DP algorithm | Click here
Lecture 21 | Introduction to discounted MDPs; value iteration | Click here
Lecture 22 | Policy iteration; introduction to reinforcement learning | Click here
Lecture 23 | TD-learning | Click here
Lecture 24 | Q-learning; REINFORCE policy gradient algorithm | Click here
End-term exam