CS5691: Pattern Recognition and Machine Learning

Course information

Course Content

  • Bayes decision theory and Bayes classifier

  • Maximum likelihood and Bayesian parameter estimation

  • Non-parametric density estimation

  • Linear models

    • Linear least-squares regression, logistic regression, regularized least squares, bias-variance tradeoff, Perceptron

  • Statistical learning theory

    • PAC learning, empirical risk minimization, uniform convergence and VC-dimension

  • Support vector machines and kernel methods

  • Ensemble methods

    • Bagging, Boosting

  • Multilayer neural networks

    • Feedforward networks, backpropagation

  • Mixture densities and EM algorithm

  • Clustering

    • K-means, spectral clustering

  • Dimensionality reduction

    • Singular value decomposition, principal component analysis

Grading

  • Mid-term: 20%

  • Final exam: 30%

  • Quizzes: 15% (Best 3 out of 4)

  • Programming Assignments: 20% (10% for each assignment)

  • Programming Contest: 15%
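As a sketch, the weights above combine as follows (the function name and score scales are illustrative, not part of the course description; note that only the best 3 of the 4 quizzes count):

```python
def course_total(midterm, final, quizzes, assignments, contest):
    """Course total out of 100 under the stated weights.

    midterm, final, contest: scores out of 100.
    quizzes: four quiz scores out of 100; only the best 3 count.
    assignments: two assignment scores out of 10 marks each.
    """
    best3 = sorted(quizzes, reverse=True)[:3]
    quiz_part = 15 * (sum(best3) / 3) / 100   # quizzes: 15%, best 3 of 4
    return (0.20 * midterm        # mid-term: 20%
            + 0.30 * final        # final exam: 30%
            + quiz_part
            + sum(assignments)    # 10 marks per assignment, 20 total
            + 0.15 * contest)     # programming contest: 15%
```

A perfect score in every component gives `course_total(100, 100, [100]*4, [10, 10], 100) == 100`, confirming the weights sum to 100%.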

Important Dates

Quizzes and exams    On
Quiz 1               Feb 1
Quiz 2               Feb 22
Midsem               Mar 8
Quiz 3               Mar 22
Quiz 4               Apr 12

Assignments          Available on   Submission by
Assignment 1         Feb 8          Mar 1
Assignment 2         Mar 15         Apr 12
Programming Contest  —              Apr 26

Textbooks

  • Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification, John Wiley, 2001

  • Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006

Additional References

  • Shai Shalev-Shwartz and Shai Ben-David, Understanding Machine Learning, Cambridge Univ. Press, 2014

  • Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, An Introduction to Statistical Learning with Applications in R, Springer, 2014

Schedule

Lecture number Topics Covered Reference
Lecture 1 Review of probability: Conditional probability, Bayes theorem [GS] Sec 1.4, 1.5, 3.1, 3.2
Lecture 2 Review of probability: Joint distribution, conditional distribution and expectation [GS] Sec 3.6, 3.7
Lecture 3 Review of probability: Bivariate normal distribution [GS] Sec 4.5, 4.6
Lecture 4 Review of probability: Change of variable, bivariate normal as an example [GS] Sec 4.7
Lecture 5 Linear algebra: review [St]
Lecture 6 Multivariate normal distribution [GS] Sec 4.9
Lecture 7 Introduction to classification and regression, Bayes classifier [DHS] Chapter 2
Lecture 8 Bayes classifier, optimality in the two-class case [DHS] Chapter 2
Lecture 9 Adaptation of Bayes classifier to (i) handle loss functions, (ii) multi-class case [DHS] Chapter 2
Lecture 10 Bayes classifier: case when class conditional densities are normal; brief introduction to discriminant functions [DHS] Chapter 2
Lecture 11 Parametric density estimation, mean-square error [DHS] Chapter 3
Lecture 12 Maximum likelihood (ML) estimation: introduction, and a discrete example [DHS] Chapter 3
Lecture 13 ML estimation: Gaussian case [DHS] Chapter 3
Lecture 14 Bayesian parameter estimation: Bernoulli case [DHS] Chapter 3
Lecture 15 Bayesian parameter estimation: Multinomial and Gaussian cases [DHS] Chapter 3
Lecture 16 Linear Regression: introduction
Lecture 17 Linear regression: a geometric viewpoint [St] Chapter on orthogonality
Lecture 18 Maximum likelihood and sum of squares [B] Section 3.1.1
Lecture 19 Bias-variance tradeoff [B] Sec 3.2, and Borkar's article
Lecture 20 Bias-variance tradeoff [B] Sec 3.2
Lecture 21 Polynomial regression and regularization [B] Sec 1.1
Lecture 22 A crash course on unconstrained optimization: first-order optimality conditions [BT] Sec 3.2
Lecture 23 A tour of convexity
Lecture 24 Gradient descent, LMS algorithm for least squares [Si] Chapter 3
Lecture 25 Perceptron algorithm [Si] Chapter 1
Lecture 26 Convergence analysis of Perceptron [Si] Chapter 1
Lecture 27 Linear models for classification: probabilistic generative models [B] Sec 4.2
Lecture 28 Probabilistic discriminative models: logistic regression [B] Sec 4.3
Lecture 29 Iterative re-weighted least squares [B] Sec 4.3
Lecture 30 Introduction to SVMs [Sa] Lec 31
Lecture 31 SVM: Linearly separable case [Sa] Lec 31, 32
Lecture 32 SVM: Non-linearly separable case [SS] Chapter 7
Lecture 33 The Kernel trick [Sa] Lec 33
Lecture 34 Kernel regression [Sa] Lec 34
Lecture 35 Neural networks: Introduction [Sa] Lec 25
Lecture 36 Backpropagation [Sa] Lec 26
Lecture 37 Gaussian mixture models and ML estimation [B] Sec 9.2
Lecture 38 EM algorithm [B] Sec 9.3
Lecture 39 EM algorithm; K-means clustering [B] Sec 9.1
Lecture 40 Connection between K-means and EM; introduction to PCA [B] Sec 9.3
Lecture 41 PCA: Minimum error formulation [B] Sec 12.1
Lecture 42 PCA: Maximum variance formulation; SVD [B] Sec 12.1
Lecture 43 Decision trees [B] Sec 14.4
Lecture 44 Decision trees [B] Sec 14.4
Lecture 45 PAC-learning [KV] Chapter 1
Lecture 46 Empirical risk minimization: Consistency [Sa] Lec 20
Lecture 47 Empirical risk minimization: Uniform convergence, and VC-dimension [Sa] Lec 21

References:
[GS] Geoffrey Grimmett and David Stirzaker, Probability and Random Processes
[St] Gilbert Strang, Linear Algebra and Its Applications
[DHS] Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification
[B] Christopher M. Bishop, Pattern Recognition and Machine Learning
[BT] Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming
[Si] Simon Haykin, Neural Networks and Learning Machines
[SS] Bernhard Schölkopf and Alexander J. Smola, Learning with Kernels
[KV] Michael Kearns and Umesh Vazirani, An Introduction to Computational Learning Theory
[Sa] P. S. Sastry, Lectures on Pattern Recognition