Course lecturer: Prof Dino Sejdinovic. The course and all course materials were designed by Prof Jonathan Marchini.
Course synopsis
The aim of the Data Analysis course
is to introduce students to the theory and practice of
unsupervised learning.
Unsupervised learning can be described as finding structure in
datasets, and has applications in many areas such as finance,
retail, medical imaging, sports performance analysis, genetics,
medicine, studies of the environment and social networks.
Unsupervised learning methods are important parts of
Computational Statistics,
Machine Learning,
Artificial Intelligence, and
Big Data.
Raw dataset: a 300 x 8686 matrix of gene expression measurements from Pollen et al. (2014), Nature Biotechnology 32, 1053-1058. Viewing the raw data, it is very difficult to see any clear structure or similarity between the samples.
3D projection and clustering: Principal Components Analysis (PCA) has been applied to the dataset to uncover structure. A clustering method (k-means) has then been applied to group the observations into distinct clusters. Students will learn the theory and practical skills needed to reproduce this analysis.
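As a rough illustration of the two steps described above (projection with PCA, then clustering with k-means), here is a minimal sketch in Python using only NumPy. It is not the course's own code and does not use the Pollen et al. dataset: the data are synthetic, and the function names `pca_project` and `kmeans`, the group sizes and the random seeds are all assumptions made for this example.

```python
import numpy as np


def pca_project(X, n_components=3):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)  # centre each variable (column)
    # SVD of the centred data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # scores, shape (n_samples, n_components)


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm for k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids at k distinct, randomly chosen observations.
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)  # assign each point to its nearest centroid
        # Recompute each centroid as the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stabilised
        centroids = new_centroids
    return labels, centroids


# Synthetic stand-in for a real dataset (assumption: 3 groups of 100
# observations on 50 variables; these are NOT gene expression measurements).
rng = np.random.default_rng(1)
n_per_group, n_vars, k = 100, 50, 3
group_means = rng.normal(scale=5.0, size=(k, n_vars))
X = np.vstack([m + rng.normal(size=(n_per_group, n_vars)) for m in group_means])

Z = pca_project(X, n_components=3)  # 3D projection of the observations
labels, centroids = kmeans(Z, k=k)  # cluster the projected observations
print(Z.shape)                      # (300, 3)
print(np.bincount(labels, minlength=k))  # number of observations per cluster
```

k-means can converge to a local optimum that depends on the initial centroids, so in practice it is usually run from several random starts and the solution with the smallest within-cluster sum of squares is kept.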
This course leads on to several more advanced courses in later years, which students should consider if they wish to learn more about Statistical Data Analysis, Machine Learning, Big Data and Artificial Intelligence.