CS6515: Stochastic optimization
Course information
When: Jan–May 2023
Lectures: Slot J
Where: ESB207B
Teaching Assistants: None
Course Content
Module 1: Introduction
Examples: Least squares, matrix factorization, SVM, logistic regression, MCMC, nonlinear urns, RL
Preliminaries: Selected topics in probability and linear algebra
Module 2: Foundations – Convexity and smoothness
Taylor's theorem, minima of smooth functions
Convex sets and functions
Module 3: Foundations – Stochastic approximation
ODE approach and asymptotic convergence
Module 4: Descent methods
Descent directions
Convergence and rates
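As a concrete instance of the descent methods covered in this module, here is a minimal sketch of fixed-stepsize gradient descent on a smooth quadratic; the test function, stepsize, and iteration count are illustrative choices, not from the course material:

```python
import numpy as np

def gradient_descent(grad, x0, step=0.1, iters=100):
    """Fixed-stepsize gradient descent: x_{k+1} = x_k - step * grad(x_k)."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# Minimize f(x) = ||x - 1||^2 / 2, whose gradient is (x - 1);
# the minimizer is the all-ones vector.
x_star = gradient_descent(lambda x: x - 1.0, x0=np.zeros(3))
```

For a smooth, strongly convex objective such as this one, each iteration contracts the distance to the minimizer by a constant factor, which is the linear rate established in the convergence analysis.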
Module 5: Stochastic gradient algorithms
Convergence analysis for convex case
Sum of smooth and strongly convex functions
Convergence for the general case
Noise reduction, e.g., SVRG, SAG, and iterate averaging
Second-order methods
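The basic stochastic gradient algorithm studied in this module can be sketched as follows, on a synthetic least-squares problem; the problem sizes, stepsize, and iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: minimize (1/2n) * sum_i (a_i^T x - b_i)^2.
n, d = 200, 3
A = rng.normal(size=(n, d))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true  # noiseless targets, so the minimum is attained at x_true

def sgd(A, b, step=0.05, iters=3000):
    """SGD: sample one data point per iteration and step along its gradient,
    an unbiased estimate of the full gradient."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        i = rng.integers(len(b))
        g = (A[i] @ x - b[i]) * A[i]
        x -= step * g
    return x

x_hat = sgd(A, b)
```

Each per-sample gradient costs O(d) instead of the O(nd) of full gradient descent; the price is gradient noise, which the noise-reduction methods listed above (SVRG, SAG, iterate averaging) are designed to control.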
Module 6: Zeroth-order optimization
Gradient estimation
Convergence analysis
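The simultaneous-perturbation gradient estimate covered in this module can be sketched as follows: all coordinates are perturbed at once along a random Rademacher direction, so only two function evaluations are needed regardless of the dimension. The quadratic test function and perturbation size below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def spsa_gradient(f, x, delta=1e-3):
    """Simultaneous-perturbation gradient estimate: perturb all coordinates
    at once with a random Rademacher direction d, using two evaluations of f.
    (With d_i in {-1, +1}, dividing by d_i equals multiplying by d_i.)"""
    d = rng.choice([-1.0, 1.0], size=x.shape)
    return (f(x + delta * d) - f(x - delta * d)) / (2 * delta) * d

# For the quadratic f(x) = ||x||^2, the true gradient at x is 2x.
x = np.array([1.0, -2.0])
g_hat = spsa_gradient(lambda z: z @ z, x)
```

A single estimate is noisy, but averaging over the random directions recovers the true gradient up to an O(delta^2) bias, which is the starting point of the bias/variance analysis.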
Module 7: Other topics (if time permits)
Fixed point iterations, RL, MCMC, coordinate descent
Textbooks/References
[1] Stephen J. Wright and Benjamin Recht. Optimization for Data Analysis. Cambridge University Press, 2022.
[2] V. S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Texts and Readings in Mathematics, 2022.
[3] Sébastien Bubeck. Convex Optimization: Algorithms and Complexity. Foundations and Trends in Machine Learning, 2015.
[4] Stephen P. Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
Grading
Midterm exam: 25%
Final exam and viva: 20%
Project: 25%
Paper review: 5%
Scribing: 10%
Mini-quizzes: 15%
Important Dates
Mini-quizzes: Feb 27, Apr 3, Apr 17
Midterm: Mar 9
Final: May 9
Project: Apr 27
Paper review: Apr 10
Paper review
Students form a team of size at most two and read a paper on stochastic optimization, either from a list (to be shared) or a paper outside the list with the instructor's approval. Each team submits a 1-page critical review that identifies the strengths and areas for improvement of the paper studied. A paraphrase of the paper's content is strongly discouraged.
Course project
The students work in teams of size at most two.
The project could be theoretical and/or practical, i.e., it could involve implementation of existing stochastic optimization algorithms and/or original research in the form of either a novel stochastic optimization algorithm or an extension of existing theoretical results for such algorithms. Given the limited timeframe of the course, working out a new result or a sufficiently novel extension of an existing one may not be completely feasible; in that case, the project report could turn into a survey of the existing literature that the students familiarized themselves with while attempting to formulate/solve a stochastic optimization problem.
For theoretically oriented projects, the report is expected to provide a detailed description of the problem, the accompanying algorithms, and their convergence analysis. For projects geared towards performance evaluation of existing stochastic optimization algorithms, the report is expected to summarize the findings of the empirical investigation, with sufficient implementation details to help reproduce the results.
The evaluation is as follows: 5% for the project proposal, 10% for the final report, and 10% for a project presentation/viva conducted by the instructor.
Schedule
Lecture number | Topics Covered | Reference

Lecture 1 | Motivating examples: Linear models in ML | Chapter 1 of [1] and my notes here
Lecture 2 | Motivating examples: Max-likelihood estimation | My notes here
Lecture 3 | Motivating examples: Logistic regression | My notes here
Lecture 4 | Motivating examples: Zeroth-order optimization | Slides here
Lecture 5 | Gradient estimation using the simultaneous perturbation method | Slides here
Lecture 6 | A unified gradient estimate and its bias/variance analysis | Slides here
Lecture 7 | Foundations of smooth optimization: local/global optima; necessary conditions for local minima | Chapter 2 of [1]
Lecture 8 | Taylor's theorem, smooth functions | Chapter 2 of [1]
Lecture 9 | Sufficient conditions for local minima | Chapter 2 of [1]
Lecture 10 | Convex sets and functions | Chapter 2 of [1] and Chapters 2, 3 of [4]
Lecture 11 | Characterization of convex functions | Chapter 2 of [1] and Chapters 2, 3 of [4]
Lecture 12 | Strongly convex functions | Chapter 2 of [1]
Mini-quiz 1 | |
Lecture 13 | Convergence of random variables | Lec 28 of the EE5110 course on probability by Prof. Krishna Jagannathan, click here
Lecture 14 | Convergence of random variables; conditional expectation | Sec 3.7 of Grimmett and Stirzaker's book Probability and Random Processes
Lecture 15 | Conditional expectation | Sec 4.6 of Grimmett and Stirzaker's book Probability and Random Processes
Lecture 16 | Martingales: Introduction | Sec 7.7 of Grimmett and Stirzaker's book Probability and Random Processes
Lecture 17 | Martingales: Applications from stochastic approximation |
Lecture 18 | Martingales: Maximal inequality | Sec 7.8 of Grimmett and Stirzaker's book Probability and Random Processes
Lecture 19 | Problem solving session on smoothness, convexity, and topics in probability |
Midterm exam | |
Lecture 20 | Discussion of mini-quiz 1 and midterm problems |
Lecture 21 | Martingales: Convergence theorem | Sec 7.8 of Grimmett and Stirzaker's book Probability and Random Processes
Lecture 22 | Stochastic approximation: Introduction | Chapter 1 of [2]
Lecture 23 | Stochastic approximation: Asymptotic convergence | Chapter 2 of [2]
Mini-quiz 2 | |
Lecture 24 | Stochastic approximation: Applications |
Lecture 25 | Discussion of mini-quiz 2; non-asymptotics for smooth optimization |
Lecture 26 | Gradient descent: Analysis for a smooth objective | Chapter 3 of [1]
Lecture 27 | Gradient descent: Analysis for a convex/strongly convex objective | Chapter 3 of [1]
Lecture 28 | Stochastic gradient algorithm: Non-asymptotic analysis with unbiased gradient estimates | Ghadimi S, Lan G (2013) Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23:2341–2368
Mini-quiz 3 | |
Lecture 29 | Stochastic gradient algorithm: Non-asymptotic analysis with biased gradient estimates | Bhavsar N, Prashanth LA (2022) Non-asymptotic bounds for stochastic optimization with biased noisy gradient oracles. IEEE Transactions on Automatic Control 68:1628–1641
Lecture 30 | Stochastic gradient algorithm: Non-asymptotic analysis in the strongly convex case | Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Review 60(2):223–311
Lecture 31 | Variance reduction: SAG, SVRG | Gower RM, Schmidt M, Bach F, Richtárik P (2020) Variance-reduced methods for machine learning. Proceedings of the IEEE 108(11):1968–1983