CS5691: Pattern Recognition and Machine Learning

Course information

Course Content

  • Bayes decision theory and Bayes classifier

  • Maximum likelihood and Bayesian parameter estimation

  • Non-parametric density estimation

  • Linear models

    • Linear least-squares regression, logistic regression, regularized least squares, bias-variance tradeoff, Perceptron

  • Statistical learning theory

    • PAC learning, empirical risk minimization, uniform convergence and VC-dimension

  • Support vector machines and kernel methods

  • Ensemble methods

    • Bagging, Boosting

  • Multilayer neural networks

    • Feedforward networks, backpropagation

  • Mixture densities and EM algorithm

  • Clustering

    • K-means, spectral clustering

  • Dimensionality reduction

    • Singular value decomposition, principal component analysis

Grading

  • Mid-term: 20%

  • Final exam: 30%

  • Quizzes: 15% (Best 3 out of 4)

  • Programming Assignments: 20% (10% for each assignment)

  • Programming Contest: 15%
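As a sketch, the weights above combine as follows (the function name and score scales are illustrative, not part of the course description; note that only the best 3 of the 4 quizzes count):

```python
def course_total(midterm, final, quizzes, assignments, contest):
    """Course total out of 100 under the stated weights.

    midterm, final, contest: scores out of 100.
    quizzes: four quiz scores out of 100; only the best 3 count.
    assignments: two assignment scores out of 10 marks each.
    """
    best3 = sorted(quizzes, reverse=True)[:3]
    quiz_part = 15 * (sum(best3) / 3) / 100   # quizzes: 15%, best 3 of 4
    return (0.20 * midterm        # mid-term: 20%
            + 0.30 * final        # final exam: 30%
            + quiz_part
            + sum(assignments)    # 10 marks per assignment, 20 total
            + 0.15 * contest)     # programming contest: 15%
```

A perfect score in every component gives `course_total(100, 100, [100]*4, [10, 10], 100) == 100`, confirming the weights sum to 100%.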

Important Dates

Quizzes and exams    On
Quiz 1               Feb 1
Quiz 2               Feb 22
Midsem               Mar 8
Quiz 3               Mar 22
Quiz 4               Apr 12

Assignments          Available on   Submission by
Assignment 1         Feb 8          Mar 1
Assignment 2         Mar 15         Apr 12
Programming Contest  —              Apr 26

Textbooks

  • Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification, John Wiley, 2001

  • Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006

Additional References

  • Shai Shalev-Shwartz and Shai Ben-David, Understanding Machine Learning, Cambridge Univ. Press, 2014

  • Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, An Introduction to Statistical Learning with Applications in R, Springer, 2014

Schedule

Lecture number Topics Covered Reference
Lecture 1 Review of probability: Conditional probability, Bayes theorem [GS] Sec 1.4, 1.5, 3.1, 3.2
Lecture 2 Review of probability: Joint distribution, conditional distribution and expectation [GS] Sec 3.6, 3.7
Lecture 3 Review of probability: Bivariate normal distribution [GS] Sec 4.5, 4.6
Lecture 4 Review of probability: Change of variable, bivariate normal as an example [GS] Sec 4.7
Lecture 5 Linear algebra: review [St]
Lecture 6 Multivariate normal distribution [GS] Sec 4.9
Lecture 7 Introduction to classification and regression, Bayes classifier [DHS] Chapter 2
Lecture 8 Bayes classifier, optimality in the two-class case [DHS] Chapter 2
Lecture 9 Adaptation of Bayes classifier to (i) handle loss functions, (ii) multi-class case [DHS] Chapter 2
Lecture 10 Bayes classifier: case when class conditional densities are normal; brief introduction to discriminant functions [DHS] Chapter 2
Lecture 11 Parametric density estimation, mean-square error [DHS] Chapter 3
Lecture 12 Maximum likelihood (ML) estimation: introduction, and a discrete example [DHS] Chapter 3
Lecture 13 ML estimation: Gaussian case [DHS] Chapter 3
Lecture 14 Bayesian parameter estimation: Bernoulli case [DHS] Chapter 3
Lecture 15 Bayesian parameter estimation: Multinomial and Gaussian cases [DHS] Chapter 3
Lecture 16 Linear Regression: introduction
Lecture 17 Linear regression: a geometric viewpoint [St] Chapter on orthogonality
Lecture 18 Maximum likelihood and sum of squares [B] Section 3.1.1
Lecture 19 Bias-variance tradeoff [B] Sec 3.2, and Borkar's article
Lecture 20 Bias-variance tradeoff [B] Sec 3.2
Lecture 21 Polynomial regression and regularization [B] Sec 1.1
Lecture 22 A crash course on unconstrained optimization: first-order optimality conditions [BT] Sec 3.2
Lecture 23 A tour of convexity
Lecture 24 Gradient descent, LMS algorithm for least squares [Si] Chapter 3
Lecture 25 Perceptron algorithm [Si] Chapter 1
Lecture 26 Convergence analysis of Perceptron [Si] Chapter 1
Lecture 27 Linear models for classification: probabilistic generative models [B] Sec 4.2
Lecture 28 Probabilistic discriminative models: logistic regression [B] Sec 4.3
Lecture 29 Iterative re-weighted least squares [B] Sec 4.3
Lecture 30 Introduction to SVMs [Sa] Lec 31
Lecture 31 SVM: Linearly separable case [Sa] Lec 31, 32
Lecture 32 SVM: Non-linearly separable case [SS] Chapter 7
Lecture 33 The Kernel trick [Sa] Lec 33
Lecture 34 Kernel regression [Sa] Lec 34
Lecture 35 Neural networks: Introduction [Sa] Lec 25
Lecture 36 Backpropagation [Sa] Lec 26
Lecture 37 Gaussian mixture models and ML estimation [B] Sec 9.2
Lecture 38 EM algorithm [B] Sec 9.3
Lecture 39 EM algorithm; K-means clustering [B] Sec 9.1
Lecture 40 Connection between K-means and EM; introduction to PCA [B] Sec 9.3
Lecture 41 PCA: Minimum error formulation [B] Sec 12.1
Lecture 42 PCA: Maximum variance formulation; SVD [B] Sec 12.1
Lecture 43 Decision trees [B] Sec 14.4
Lecture 44 Decision trees [B] Sec 14.4
Lecture 45 PAC-learning [KV] Chapter 1
Lecture 46 Empirical risk minimization: Consistency [Sa] Lec 20
Lecture 47 Empirical risk minimization: Uniform convergence, and VC-dimension [Sa] Lec 21

References:
[GS] Geoffrey Grimmett and David Stirzaker, Probability and Random Processes
[St] Gilbert Strang, Linear Algebra and Its Applications
[DHS] Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification
[B] Christopher M. Bishop, Pattern Recognition and Machine Learning
[BT] Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming
[Si] Simon Haykin, Neural Networks and Learning Machines
[SS] Bernhard Schölkopf and Alexander J. Smola, Learning with Kernels
[KV] Michael Kearns and Umesh Vazirani, An Introduction to Computational Learning Theory
[Sa] P. S. Sastry, Lectures on Pattern Recognition