CS5691: Pattern Recognition and Machine Learning
Course Content
Bayes decision theory and Bayes classifier
Maximum likelihood and Bayesian parameter estimation
Nonparametric density estimation
Linear models
Linear least-squares regression, logistic regression, regularized least squares, bias-variance tradeoff, perceptron
Statistical learning theory
Support vector machines and kernel methods
Ensemble Methods
Multilayer neural networks
Mixture densities and EM algorithm
Clustering
Dimensionality reduction
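As a quick taste of the first topic, here is a minimal sketch of a two-class Bayes classifier with univariate Gaussian class-conditional densities. The means, variances, and priors below are illustrative assumptions, not course material:

```python
import numpy as np

# Minimal two-class Bayes classifier with univariate Gaussian
# class-conditional densities. All parameter values below are
# illustrative assumptions.

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def bayes_classify(x, means, variances, priors):
    """Pick the class maximizing prior * likelihood (the Bayes rule)."""
    posteriors = [p * gaussian_pdf(x, m, v)
                  for m, v, p in zip(means, variances, priors)]
    return int(np.argmax(posteriors))

# Class 0 ~ N(0, 1), class 1 ~ N(4, 1), equal priors:
# the Bayes decision boundary is the midpoint x = 2.
means, variances, priors = [0.0, 4.0], [1.0, 1.0], [0.5, 0.5]
print(bayes_classify(1.0, means, variances, priors))  # 0
print(bayes_classify(3.0, means, variances, priors))  # 1
```

With equal priors and equal variances, this rule reduces to assigning x to the nearer class mean, which is why the boundary sits at the midpoint.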
Important Dates

Quizzes and exams:
Quiz 1: Feb 1
Quiz 2: Feb 22
Midsem: Mar 8
Quiz 3: Mar 22
Quiz 4: Apr 12

Assignments:
Assignment 1: available Feb 8, submission by Mar 1
Assignment 2: available Mar 15, submission by Apr 12
Programming Contest: submission by Apr 26
Textbooks
Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification, John Wiley, 2001
Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Additional References
Shai Shalev-Shwartz and Shai Ben-David, Understanding Machine Learning, Cambridge Univ. Press, 2014
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, An Introduction to Statistical Learning with Applications in R, Springer, 2014
Schedule
Lecture topics, with references in parentheses:

Lecture 1: Review of probability: Conditional probability, Bayes theorem ([GS] Sec 1.4, 1.5, 3.1, 3.2)
Lecture 2: Review of probability: Joint distribution, conditional distribution and expectation ([GS] Sec 3.6, 3.7)
Lecture 3: Review of probability: Bivariate normal distribution ([GS] Sec 4.5, 4.6)
Lecture 4: Review of probability: Change of variable, bivariate normal as an example ([GS] Sec 4.7)
Lecture 5: Linear algebra review ([St])
Lecture 6: Multivariate normal distribution ([GS] Sec 4.9)
Lecture 7: Introduction to classification and regression, Bayes classifier ([DHS] Chapter 2)
Lecture 8: Bayes classifier, optimality in the two-class case ([DHS] Chapter 2)
Lecture 9: Adaptation of the Bayes classifier to (i) handle loss functions, (ii) the multiclass case ([DHS] Chapter 2)
Lecture 10: Bayes classifier when the class-conditional densities are normal; brief introduction to discriminant functions ([DHS] Chapter 2)
Lecture 11: Parametric density estimation, mean-square error ([DHS] Chapter 3)
Lecture 12: Maximum likelihood (ML) estimation: introduction and a discrete example ([DHS] Chapter 3)
Lecture 13: ML estimation: Gaussian case ([DHS] Chapter 3)
Lecture 14: Bayesian parameter estimation: Bernoulli case ([DHS] Chapter 3)
Lecture 15: Bayesian parameter estimation: Multinomial and Gaussian cases ([DHS] Chapter 3)
Lecture 16: Linear regression: introduction
Lecture 17: Linear regression: a geometric viewpoint ([St] chapter on orthogonality)
Lecture 18: Maximum likelihood and sum of squares ([B] Sec 3.1.1)
Lecture 19: Bias-variance tradeoff ([B] Sec 3.2, and Borkar's article)
Lecture 20: Bias-variance tradeoff ([B] Sec 3.2)
Lecture 21: Polynomial regression and regularization ([B] Sec 1.1)
Lecture 22: A crash course on unconstrained optimization: first-order optimality conditions ([BT] Sec 3.2)
Lecture 23: A tour of convexity
Lecture 24: Gradient descent, LMS algorithm for least squares ([Si] Chapter 3)
Lecture 25: Perceptron algorithm ([Si] Chapter 1)
Lecture 26: Convergence analysis of the perceptron ([Si] Chapter 1)
Lecture 27: Linear models for classification: probabilistic generative models ([B] Sec 4.2)
Lecture 28: Probabilistic discriminative models: logistic regression ([B] Sec 4.3)
Lecture 29: Iterative reweighted least squares ([B] Sec 4.3)
Lecture 30: Introduction to SVMs ([Sa] Lec 31)
Lecture 31: SVM: linearly separable case ([Sa] Lec 31, 32)
Lecture 32: SVM: non-linearly separable case ([SS] Chapter 7)
Lecture 33: The kernel trick ([Sa] Lec 33)
Lecture 34: Kernel regression ([Sa] Lec 34)
Lecture 35: Neural networks: introduction ([Sa] Lec 25)
Lecture 36: Backpropagation ([Sa] Lec 26)
Lecture 37: Gaussian mixture models and ML estimation ([B] Sec 9.2)
Lecture 38: EM algorithm ([B] Sec 9.3)
Lecture 39: EM algorithm; K-means clustering ([B] Sec 9.1)
Lecture 40: Connection between K-means and EM; introduction to PCA ([B] Sec 9.3)
Lecture 41: PCA: minimum error formulation ([B] Sec 12.1)
Lecture 42: PCA: maximum variance formulation; SVD ([B] Sec 12.1)
Lecture 43: Decision trees ([B] Sec 14.4)
Lecture 44: Decision trees ([B] Sec 14.4)
Lecture 45: PAC learning ([KV] Chapter 1)
Lecture 46: Empirical risk minimization: consistency ([Sa] Lec 20)
Lecture 47: Empirical risk minimization: uniform convergence and VC dimension ([Sa] Lec 21)
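To illustrate the connection between K-means and EM covered in Lectures 39 and 40, here is a bare-bones K-means loop, the hard-assignment analogue of EM for Gaussian mixtures. The two-blob synthetic data and the initialization are illustrative assumptions:

```python
import numpy as np

# A bare-bones K-means loop: the hard-assignment analogue of the EM
# algorithm for Gaussian mixtures. Synthetic data and initialization
# are illustrative assumptions.

def kmeans(X, init_centers, iters=20):
    centers = init_centers.astype(float).copy()
    for _ in range(iters):
        # "E-like" step: hard-assign each point to its nearest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # "M-like" step: move each center to the mean of its points.
        centers = np.array([X[labels == j].mean(axis=0)
                            for j in range(len(centers))])
    return centers, labels

# Two well-separated blobs around (0, 0) and (10, 10),
# initialized with one point drawn from each blob.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(20, 2)),
               rng.normal(10, 1, size=(20, 2))])
centers, labels = kmeans(X, init_centers=X[[0, 20]])
```

Replacing the hard assignment with posterior responsibilities, and the plain means with responsibility-weighted means, recovers the EM updates for a Gaussian mixture.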
References:
[GS] Geoffrey Grimmett and David Stirzaker, Probability and Random Processes
[St] Gilbert Strang, Linear Algebra and Its Applications
[DHS] Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification
[B] Christopher M. Bishop, Pattern Recognition and Machine Learning
[BT] Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming
[Si] Simon Haykin, Neural Networks and Learning Machines
[SS] Bernhard Schölkopf and Alexander J. Smola, Learning with Kernels
[KV] Michael Kearns and Umesh Vazirani, An Introduction to Computational Learning Theory
[Sa] P. S. Sastry, Lectures on Pattern Recognition
