SC4/SM8 Advanced Topics in Statistical Machine Learning
Term: Hilary Term 2018, Jan 15 - Mar 9
Lectures: Tue 2pm, Thu 4pm
Part C Class Tutor: Leonard Hasenclever
Part C Teaching Assistant: Sam Davenport
Part C Classes: Fri 1:30-3 and 3-4:30 in weeks 3,5,7,8, LG.04
Part C Problem Sheet Deadlines: Mon 10am in weeks 3,5,7,8
MSc Classes: Mon 11-12, weeks 3,5,7,8, LG.01
Part C Revision Classes: TT week 2: Thu 3pm (SC4 2017 exam), LG.01
  TT week 4: Mon 3pm (SC4 2016 Q1c,Q2bc; SM4 2017 Q1bc,Q2), LG.01

Course Materials

The course materials will consist of notes, slides, and Jupyter notebooks. Notes are not exhaustive and should be used in conjunction with the slides. All materials may be updated during the course and are thus best read on screen. Please email me any typos or corrections.

If you are taking or have taken this course, please fill in the feedback form.

Lecture Notes

  • pdf - last updated on 10 Feb 2018 (corrections to Section 10).

Slides

Notebooks

Problem Sheets

Aims and objectives:

Machine learning is widely used across scientific and engineering disciplines to find interesting patterns in large datasets and to devise complex models and prediction tools. This course introduces several widely used machine learning techniques and describes their underpinning statistical principles and properties. It covers both unsupervised and supervised learning, treating several advanced topics in detail, including some state-of-the-art techniques. The course also covers computational considerations of machine learning algorithms and how they scale to large datasets.

Prerequisites:

A8 Probability and A9 Statistics.
Some material from this year’s syllabus of SB2b Statistical Machine Learning will be used, namely PCA and the basics of clustering (taught mainly in the first three lectures of SB2b, which also runs in HT2018). SB2b is not a prerequisite, however, and background notes will be provided.

Synopsis:

Unsupervised and supervised learning basics.
Loss functions. Empirical risk minimization.
Convex optimization and support vector machines.
Kernel methods and reproducing kernel Hilbert spaces. Representer theorem. Representation of probabilities in RKHS.
Nonlinear dimensionality reduction: kernel PCA, spectral clustering.
Probabilistic and Bayesian machine learning: mixture modelling, information-theoretic fundamentals, EM algorithm, probabilistic PCA.
Laplace approximation. Variational Bayes. Topic modelling.
Collaborative filtering models, probabilistic matrix factorization.
Gaussian processes for regression and classification. Bayesian optimization.

Textbooks and Background Reading

  • Bishop, Pattern Recognition and Machine Learning, Springer.
  • Murphy, Machine Learning: A Probabilistic Perspective, MIT Press.
  • Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Springer (ebook available).
  • Shalev-Shwartz and Ben-David, Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press.

Background Review Aids:

Software

R

Python

Knowledge of Python is not required for this course, but some descriptive examples in lectures may be done in Python. Students interested in further Python training are referred to the free University IT online courses.