| SC4/SM8 Advanced Topics in Statistical Machine Learning | |
|---|---|
| Term | Hilary Term 2019, Jan 14 - Mar 8 |
| Lectures | Weeks 1-6, 8: Tue 3pm & Thu 4pm, LG.01; week 7: Thu 4pm & Fri 4pm, LG.01 (note the time change in week 7) |
| Part C / OMMS Classes | Class Sign-Up (via Weblearn) |
| Set 1: Tue 9am (3, 5, 7, TT1), LG.05 | class tutor: Jean-Francois Ton; TA: Charline Le Lan |
| Set 2: Tue 10.30am (3, 5, 7, TT1), LG.05 | class tutor: Jean-Francois Ton; TA: Charline Le Lan |
| Set 3: Tue 9am (3, 5, 7, TT1), LG.04 | class tutor: Tomas Vaskevicius; TA: Tomas Vaskevicius |
| Part C Problem Sheet Deadlines | Fri noon, weeks 2, 4, 6 (Sheets 1-3; no hand-in for Sheet 4) |
| MSc Classes | Fri 10am (3, 5, 7, TT1), LG.01 |
| Part C Revision Classes | Tue 2pm, TT week 3, SC4 2017 exam |
| | Tue 2pm, TT week 5, SC4 2018 exam |
| Part C Consultation Sessions (with class tutors) | Mon 10am, TT week 3, LG.05 |
| | Mon 10am, TT week 5, LG.05 |
Announcements
- Part C revision classes will be at Tue 2pm in weeks 3 and 5 of Trinity Term.
- There will be no lecture at Tue 3pm in week 7. Lectures will be held at Thu 4pm (Feb 28th) and Fri 4pm (Mar 1st) that week.
- Solutions to problem sheets 1-3 will be made available on the course website by the end of HT week 8. Solutions to problem sheet 4 will be made available during TT week 0.
Course Materials
The course materials will appear here before the course starts. They consist of notes, slides, and Jupyter notebooks. Notes may not be exhaustive and should be used in conjunction with the slides. All materials may be updated during the course and are thus best read on screen. Please email me any typos or corrections.
Lecture Notes
- pdf - (last updated on 11/02/2019: fixed typos in Section 8.5).
Slides
- Chapter 1: Review of Fundamentals
- Chapter 2: Support Vector Machines
- Chapter 3: Kernel Methods
- Chapter 4: Similarity Graphs and Laplacians
- Chapter 5: Latent Variable Models and EM Algorithm
- Chapter 6: Collaborative Filtering
- Chapter 7: Bayesian Learning
- Chapter 8: Variational Methods
- Chapter 9: Gaussian Processes
- Chapter 10: Bayesian Optimization
Problem Sheets
- Problem Sheet 1 - due Jan 25th, noon.
- Problem Sheet 2 - due Feb 8th, noon.
- Problem Sheet 3 - due Feb 22nd, noon.
- Problem Sheet 4 - no submissions.
Notebooks
- Kernel Ridge Regression: ipynb, html
- Spectral Clustering: ipynb, html
- Mixtures: ipynb, html
- CF Movielens: ipynb, html
- CF Parliament: ipynb, html
Aims and objectives:
Machine learning is widely used across many scientific and engineering disciplines to find interesting patterns in large datasets and to devise complex models and prediction tools. This course introduces several widely used machine learning techniques and describes their underpinning statistical principles and properties. It covers both unsupervised and supervised learning, and treats several advanced topics in detail, including some state-of-the-art techniques. The course also covers computational considerations of machine learning algorithms and how they scale to large datasets.
Prerequisites:
A8 Probability and A9 Statistics.
Some material from this year's syllabus of SB2.2 Statistical Machine Learning (PCA and the basics of clustering, taught mainly in the first three lectures of SB2.2, which also runs in HT 2019) will be used, but SB2.2 is not a prerequisite and background notes will be provided.
Synopsis:
Review of unsupervised and supervised learning.
Duality in convex optimization and support vector machines.
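Duality can be illustrated concretely: for a kernel SVM with the bias term omitted, the equality constraint in the dual drops out, leaving a box-constrained quadratic programme that projected gradient ascent can solve. A minimal NumPy sketch on synthetic data (the bias-free simplification, all names, and the parameter values are illustrative choices, not the course's notation):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated blobs with labels -1 / +1.
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])

# Soft-margin dual (bias omitted, so the constraint sum_i alpha_i y_i = 0
# disappears): maximise  sum_i alpha_i - (1/2) alpha^T Q alpha
# subject to 0 <= alpha_i <= C, where Q_ij = y_i y_j k(x_i, x_j).
K = X @ X.T                                # linear kernel Gram matrix
Q = (y[:, None] * y[None, :]) * K
C, lr = 1.0, 1e-3
alpha = np.zeros(len(y))
for _ in range(2000):
    # Projected gradient ascent: ascend the dual, clip into the box [0, C].
    alpha = np.clip(alpha + lr * (1 - Q @ alpha), 0.0, C)

# Decision function f(x) = sum_i alpha_i y_i k(x_i, x); the points with
# alpha_i > 0 are the support vectors.
train_pred = np.sign(K @ (alpha * y))
```

On these separable blobs the recovered classifier labels all training points correctly, and only a few dual variables end up non-zero, illustrating sparsity in the support vectors.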
Kernel methods and reproducing kernel Hilbert spaces. Representer theorem. Representation of probabilities in RKHS.
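By the representer theorem, the regularized empirical-risk minimizer in an RKHS is a finite expansion f(.) = sum_i alpha_i k(x_i, .); with squared loss this is kernel ridge regression, and alpha has a closed form. A short NumPy sketch (the RBF kernel, synthetic data, and parameter values are illustrative choices):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2).
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

# Representer theorem: the minimiser is f(.) = sum_i alpha_i k(x_i, .),
# and for squared loss alpha solves (K + lam * n * I) alpha = y.
lam = 1e-3
n = len(y)
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * n * np.eye(n), y)

def predict(Xnew):
    # Evaluate the kernel expansion at new inputs.
    return rbf_kernel(Xnew, X) @ alpha
```

The fitted function tracks sin(x) closely on the training interval; the dual coefficients alpha play the role the representer theorem guarantees.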
Kernel PCA. Spectral clustering. Manifold regularization.
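A minimal spectral-clustering sketch along these lines: build a similarity graph, form its Laplacian, and read the partition off the eigenvector of the second-smallest eigenvalue (the Fiedler vector). The synthetic data, Gaussian similarity, and bandwidth below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two well-separated 2-D blobs; the ordering is only used to check the result.
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])

# Fully connected similarity graph with Gaussian weights.
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
W = np.exp(-sq / (2 * 0.5 ** 2))
D = np.diag(W.sum(1))
L = D - W                       # unnormalised graph Laplacian

# For a graph with two loosely connected components, the eigenvector of the
# second-smallest eigenvalue is nearly piecewise constant on the components,
# so thresholding it at 0 recovers the clusters.
vals, vecs = np.linalg.eigh(L)
labels = (vecs[:, 1] > 0).astype(int)
```

The same recipe extends to k clusters by embedding points into the first k eigenvectors and running k-means in that space.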
Probabilistic and Bayesian machine learning: latent variable models, variational free energy, EM algorithm, mixtures, probabilistic PCA.
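The EM algorithm for a two-component Gaussian mixture fits this pattern: the E-step computes responsibilities from the current parameters, and the M-step re-estimates the parameters from the responsibilities. A small one-dimensional NumPy sketch (synthetic data and illustrative initial values):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic data from two Gaussians with means -2 and 2, unit variance.
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])

# Initial guesses for mixture weights, means, and variances.
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

for _ in range(50):
    # E-step: responsibilities r[i, k] proportional to w_k N(x_i | mu_k, var_k).
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    r = dens / dens.sum(1, keepdims=True)
    # M-step: re-estimate parameters from the responsibilities.
    nk = r.sum(0)
    w = nk / len(x)
    mu = (r * x[:, None]).sum(0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(0) / nk
```

Each iteration increases the data log-likelihood; on this well-separated data the estimated means converge close to the true values -2 and 2.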
Laplace approximation. Variational Bayes and latent Dirichlet allocation.
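The Laplace approximation replaces an intractable posterior with a Gaussian centred at the posterior mode, with variance given by the negative inverse Hessian of the log posterior at the mode. A tiny worked example for a coin-bias posterior (the Beta setup and the numbers are illustrative):

```python
# Posterior of a coin bias theta under a uniform prior, after observing
# h = 30 heads in n = 50 tosses: log p(theta | data) is Beta(31, 21) up
# to an additive constant.
h, n = 30, 50

def grad(t):
    # First derivative of the log posterior.
    return h / t - (n - h) / (1 - t)

def hess(t):
    # Second derivative of the log posterior.
    return -h / t ** 2 - (n - h) / (1 - t) ** 2

# Find the mode by Newton's method, then approximate the posterior by
# a Gaussian N(mode, -1 / hess(mode)): the Laplace approximation.
t = 0.5
for _ in range(10):
    t -= grad(t) / hess(t)
mode, laplace_var = t, -1.0 / hess(t)
```

Here the mode is h/n = 0.6 and the Laplace variance 0.0048, close to the exact Beta(31, 21) variance of about 0.0045, which is typical when the posterior is unimodal and roughly symmetric.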
Collaborative filtering models, probabilistic matrix factorization.
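Probabilistic matrix factorization can be sketched as a MAP estimate: observed ratings are modelled as inner products of user and item factors with Gaussian noise and Gaussian priors, which yields a regularized squared-error objective, minimized here by gradient descent. All sizes, step sizes, and the synthetic low-rank data below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n_users, n_items, k = 30, 20, 3
# Ground-truth low-rank ratings; about 30% of entries are held out.
R = rng.normal(size=(n_users, k)) @ rng.normal(size=(n_items, k)).T
mask = rng.random(R.shape) < 0.7           # True where a rating is observed

# MAP objective: sum over observed entries of (R_ij - u_i . v_j)^2
# plus lam * (||U||_F^2 + ||V||_F^2), minimised by gradient descent.
U = 0.1 * rng.normal(size=(n_users, k))
V = 0.1 * rng.normal(size=(n_items, k))
lam, lr = 0.01, 0.02
for _ in range(1000):
    E = mask * (U @ V.T - R)               # residuals on observed entries only
    U = U - lr * (E @ V + lam * U)
    E = mask * (U @ V.T - R)
    V = V - lr * (E.T @ U + lam * V)

train_rmse = np.sqrt(((mask * (U @ V.T - R)) ** 2).sum() / mask.sum())
test_rmse = np.sqrt((((~mask) * (U @ V.T - R)) ** 2).sum() / (~mask).sum())
```

Because the true matrix is exactly rank 3, the factorization fits the observed entries closely and also predicts the held-out entries well, which is the point of the low-rank model.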
Gaussian processes for regression and classification. Bayesian optimization.
Textbooks and Background Reading
Recommended textbooks:
- Bishop, Pattern Recognition and Machine Learning, Springer.
- Murphy, Machine Learning: A Probabilistic Perspective, MIT Press.
- Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Springer. ebook
- Shalev-Shwartz and Ben-David, Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press.
Background Review Aids:
- Matrix and Gaussian identities - short useful reference for machine learning.
- Linear Algebra Review and Reference - useful selection for machine learning.
- Video reviews on Linear Algebra by Zico Kolter
- Video reviews on Multivariate Calculus and SVD by Aaditya Ramdas
- The Matrix Cookbook - extensive reference.
Software
R
Python
Knowledge of Python is not required for this course, but some descriptive examples in lectures may be done in Python. Students interested in further Python training are referred to the free University IT online courses.