SC4/SM4 Data Mining and Machine Learning | |
---|---|
Term: | Hilary Term 2017, Jan 16 - Mar 10 |
Lectures: | HT weeks 1-8: Tue 2pm, Thu 12pm, LG.01 |
MSc Classes: | HT weeks 3,5,7,8: Mon 11am, LG.01 |
MSc Practicals: | HT week 5: Fri 2-4pm; HT week 8: Fri 2-4pm (group assessed) |
Part C Class Tutors: | Jovana Mitrovic and Leonard Hasenclever |
Part C Classes: | HT weeks 3,5,7,8: Wed 2:30-4pm, Wed 5-6:30pm, Fri 4:30-6pm |
Part C Problem Sheet Deadlines: | HT weeks 3,5,7,8: Mon 10am |
Part C Revision Classes: | TT week 3: Thu 11am, TT week 4: Thu 4pm, LG.01 |
Course Materials
The course materials will consist of slides, summary notes and Jupyter notebooks. Summary notes are not exhaustive and should be used in conjunction with the slides. All materials are frequently updated and are thus best read on screen. Please email me any typos or corrections.
- Dimensionality Reduction: slides, notes, notebook 1: ipynb html, notebook 2: ipynb html
- Clustering: slides, notes, notebook 1: ipynb html, notebook 2: ipynb html
- Latent Variable Models and EM Algorithm: slides, notes, notebook: ipynb html
- Collaborative Filtering and Biclustering: notes, notebook 1: ipynb html, notebook 2: ipynb html
- Supervised Learning Basics: slides, notes
- Kernel Methods: slides, notes, notebook: ipynb html
- Bayesian Learning: slides, notes
- Gaussian Processes: slides, notes
Revision
Much of the material was covered in the earlier courses Statistical Data Mining and Statistical Data Mining and Machine Learning, so past exam questions from those courses remain relevant.
- MSc Paper (II) questions on Statistical Data Mining/Statistical Data Mining and Machine Learning.
- Part C questions for many past years – Paper SC4 (which was called MS1b up to 2014).
- Some specific questions (to be covered in Part C revision classes in TT weeks 3 and 4):
  - Part C 2016 Q3, Q2 (without (a-ii))
  - Part C 2015 Q2 (d), Q3 (c)
  - Part C 2014 Q1 (a,c,d), Q3 (c)
  - MSc 2016 Q6
MSc Practicals
- Week 5 Practical: pdf, parliament data, sample solution: ipynb html
- Week 8 Practical: pdf, Kaggle challenge, sample solution: html
Textbooks and Background Reading
Recommended textbooks:
- Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Springer. ebook
- Bishop, Pattern Recognition and Machine Learning, Springer.
- Murphy, Machine Learning: A Probabilistic Perspective, MIT Press.
- Shalev-Shwartz and Ben-David, Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press.
Background Review Aids:
- Matrix and Gaussian identities - short useful reference for machine learning.
- Linear Algebra Review and Reference - useful selection for machine learning.
- Video reviews on Linear Algebra by Zico Kolter
- Video reviews on Multivariate Calculus and SVD by Aaditya Ramdas
- The Matrix Cookbook - extensive reference.