CS5691: Pattern Recognition and Machine Learning
Course Information
When: Jan-May 2019
Lectures: Slot E
Where: CS26
Teaching Assistants: Ajay Kumar Pandey, Anubha Pandey, Subhrajit Makur, Omji Gupta, Tapdiya Aarti Shrikishan, Nirav Narharibhai Bhavsar
Course Content
Bayes decision theory and Bayes classifier
Maximum likelihood and Bayesian parameter estimation
Non-parametric density estimation
Linear models
Linear least-squares regression, logistic regression, regularized least squares, bias-variance tradeoff, Perceptron
Statistical learning theory
PAC learning, empirical risk minimization, uniform convergence and VC-dimension
Support vector machines and kernel methods
Ensemble methods
Bagging, Boosting
Multilayer neural networks
Feedforward networks, backpropagation
Mixture densities and EM algorithm
Clustering
K-means, spectral clustering
Dimensionality reduction
Singular value decomposition, principal component analysis
Grading
Mid-term: 20%
Final exam: 30%
Quizzes: 15% (Best 3 out of 4)
Programming Assignments: 20% (10% for each assignment)
Programming Contest: 15%
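As a quick illustration of the scheme above, here is a minimal sketch (not an official grading script; the function name and interface are assumptions made for the example) of how a course total could be computed, taking each component score as a fraction in [0, 1]:

```python
def course_total(midterm, final, quizzes, assignments, contest):
    """Illustrative aggregation of the stated weights (assumption:
    every input score is a fraction in [0, 1]).
    quizzes: 4 quiz scores (best 3 count); assignments: 2 scores."""
    best3 = sorted(quizzes, reverse=True)[:3]   # best 3 out of 4 quizzes
    return (20 * midterm                        # mid-term: 20%
            + 30 * final                        # final exam: 30%
            + 15 * sum(best3) / 3               # quizzes: 15%
            + 10 * sum(assignments)             # assignments: 10% each
            + 15 * contest)                     # programming contest: 15%

# Example: the weakest quiz (0.8) is dropped; prints 99.5.
print(course_total(1.0, 1.0, [1.0, 0.9, 1.0, 0.8], [1.0, 1.0], 1.0))
```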
Important Dates
Event | On |
---|---|
Quiz 1 | Feb 1 |
Quiz 2 | Feb 22 |
Mid-term | Mar 8 |
Quiz 3 | Mar 22 |
Quiz 4 | Apr 12 |
Assessment | Available on | Submission by |
---|---|---|
Assignment 1 | Feb 8 | Mar 1 |
Assignment 2 | Mar 15 | Apr 12 |
Programming Contest | Apr 26 | |
Textbooks
Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification, John Wiley, 2001
Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Additional References
Shai Shalev-Shwartz and Shai Ben-David, Understanding Machine Learning, Cambridge Univ. Press, 2014
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, An Introduction to Statistical Learning with Applications in R, Springer, 2014
Schedule
Lecture number | Topics Covered | Reference |
---|---|---|
Lecture 1 | Review of probability: Conditional probability, Bayes theorem | [GS] Sec 1.4, 1.5, 3.1, 3.2 |
Lecture 2 | Review of probability: Joint distribution, conditional distribution and expectation | [GS] Sec 3.6, 3.7 |
Lecture 3 | Review of probability: Bivariate normal distribution | [GS] Sec 4.5, 4.6 |
Lecture 4 | Review of probability: Change of variable, bivariate normal as an example | [GS] Sec 4.7 |
Lecture 5 | Linear algebra: review | [St] |
Lecture 6 | Multivariate normal distribution | [GS] Sec 4.9 |
Lecture 7 | Introduction to classification and regression, Bayes classifier | [DHS] Chapter 2 |
Lecture 8 | Bayes classifier, optimality in the two-class case | [DHS] Chapter 2 |
Lecture 9 | Adaptation of Bayes classifier to (i) handle loss functions, (ii) multi-class case | [DHS] Chapter 2 |
Lecture 10 | Bayes classifier: case when class-conditional densities are normal; brief introduction to discriminant functions | [DHS] Chapter 2 |
Lecture 11 | Parametric density estimation, mean-square error | [DHS] Chapter 3 |
Lecture 12 | Maximum likelihood (ML) estimation: introduction, and a discrete example | [DHS] Chapter 3 |
Lecture 13 | ML estimation: Gaussian case | [DHS] Chapter 3 |
Lecture 14 | Bayesian parameter estimation: Bernoulli case | [DHS] Chapter 3 |
Lecture 15 | Bayesian parameter estimation: Multinomial and Gaussian cases | [DHS] Chapter 3 |
Lecture 16 | Linear Regression: introduction | |
Lecture 17 | Linear Regression: A geometric viewpoint | [St] Chapter on orthogonality |
Lecture 18 | Maximum likelihood and sum of squares | [B] Sec 3.1.1 |
Lecture 19 | Bias-variance tradeoff | [B] Sec 3.2, and Borkar's article |
Lecture 20 | Bias-variance tradeoff | [B] Sec 3.2 |
Lecture 21 | Polynomial regression and regularization | [B] Sec 1.1 |
Lecture 22 | A crash course on unconstrained optimization: first-order optimality conditions | [BT] Sec 3.2 |
Lecture 23 | A tour of convexity | |
Lecture 24 | Gradient descent, LMS algorithm for least squares | [Si] Chapter 3 |
Lecture 25 | Perceptron algorithm | [Si] Chapter 1 |
Lecture 26 | Convergence analysis of Perceptron | [Si] Chapter 1 |
Lecture 27 | Linear models for classification: Probabilistic generative models | [B] Sec 4.2 |
Lecture 28 | Probabilistic discriminative models: logistic regression | [B] Sec 4.3 |
Lecture 29 | Iterative re-weighted least squares | [B] Sec 4.3 |
Lecture 30 | Introduction to SVMs | [Sa] Lec 31 |
Lecture 31 | SVM: Linearly separable case | [Sa] Lec 31, 32 |
Lecture 32 | SVM: Non-linearly separable case | [SS] Chapter 7 |
Lecture 33 | The Kernel trick | [Sa] Lec 33 |
Lecture 34 | Kernel regression | [Sa] Lec 34 |
Lecture 35 | Neural networks: Introduction | [Sa] Lec 25 |
Lecture 36 | Backpropagation | [Sa] Lec 26 |
Lecture 37 | Gaussian mixture models and ML estimation | [B] Sec 9.2 |
Lecture 38 | EM algorithm | [B] Sec 9.3 |
Lecture 39 | EM algorithm; K-means clustering | [B] Sec 9.1 |
Lecture 40 | Connection between K-means and EM; introduction to PCA | [B] Sec 9.3 |
Lecture 41 | PCA: Minimum error formulation | [B] Sec 12.1 |
Lecture 42 | PCA: Maximum variance formulation; SVD | [B] Sec 12.1 |
Lecture 43 | Decision trees | [B] Sec 14.4 |
Lecture 44 | Decision trees | [B] Sec 14.4 |
Lecture 45 | PAC-learning | [KV] Chapter 1 |
Lecture 46 | Empirical risk minimization: Consistency | [Sa] Lec 20 |
Lecture 47 | Empirical risk minimization: Uniform convergence, and VC-dimension | [Sa] Lec 21 |
References:
[GS] Geoffrey Grimmett and David Stirzaker, Probability and Random Processes
[St] Gilbert Strang, Linear Algebra and Its Applications
[DHS] Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification
[B] Christopher M. Bishop, Pattern Recognition and Machine Learning
[BT] Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming
[Si] Simon Haykin, Neural Networks and Learning Machines
[SS] Bernhard Schölkopf and Alexander J. Smola, Learning with Kernels
[KV] Michael Kearns and Umesh Vazirani, An Introduction to Computational Learning Theory
[Sa] P. S. Sastry, Lectures on Pattern Recognition