Course description: The aim of this course is to introduce the essential tools of unsupervised learning and dimensionality reduction. These tools are increasingly used to preprocess large databases into human-readable information. We will present the most relevant dimensionality reduction algorithms for linear data manifolds, curved manifolds, and manifolds with arbitrarily complex topologies. We will then introduce a selection of approaches for estimating the probability density and the intrinsic dimension of the data manifold. Finally, we will introduce unsupervised classification and clustering. We will briefly touch upon the mathematical and algorithmic foundations of the methods, highlighting their strengths and limitations. The self-directed solution of data analysis exercises is an essential part of the course.
Syllabus:
1. Introduction: choosing the features and the metric
2. Lab 1
3. Dimensionality reduction and manifold learning
a. Linear methods: principal component analysis and multidimensional scaling
b. Curved manifolds: Isomap, kernel PCA and sketch-map
c. Lab 2
d. Diffusion maps and stochastic neighbor embedding
e. Characterizing the embedding manifold: the intrinsic dimension
f. Lab 3
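As a taste of the linear methods in part 3a, here is a minimal PCA sketch (an illustrative example, not course material): the data are centered, and the eigendecomposition of the covariance matrix gives the directions of maximal variance onto which the data are projected. The toy dataset below is an assumption for illustration.

```python
import numpy as np

def pca(X, n_components=2):
    """Project X (n_samples x n_features) onto its top principal components."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # sort descending by variance
    components = eigvecs[:, order[:n_components]]
    return Xc @ components

# toy data: points near a straight line in 3D, plus small noise
rng = np.random.default_rng(0)
t = rng.normal(size=(100, 1))
X = np.hstack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(100, 3))
Y = pca(X, n_components=1)  # a 1D embedding captures almost all the variance
```

For such an almost-linear manifold a single component suffices; for the curved manifolds of parts 3b and 3d, linear projections like this one fail, which motivates the nonlinear methods.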
4. Estimating the probability density
a. Parametric density estimators
b. Non-parametric estimators: histograms, kernel density estimators and the k-nearest-neighbor estimator
c. Adaptive density estimators
d. Lab 4
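A minimal sketch of a Gaussian kernel density estimator, one of the non-parametric estimators of part 4b: the density at a query point is the average of Gaussian kernels centered on the samples. The bandwidth h and the toy data are assumptions for illustration; choosing h well is exactly what the adaptive estimators of part 4c address.

```python
import numpy as np

def kde_gaussian(x_query, samples, h):
    """1D Gaussian KDE: average of kernels of bandwidth h centered on each sample."""
    z = (x_query[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / h

# toy data: samples from a standard normal, density estimated on a grid
rng = np.random.default_rng(1)
samples = rng.normal(loc=0.0, scale=1.0, size=2000)
grid = np.linspace(-4, 4, 81)
density = kde_gaussian(grid, samples, h=0.3)
```

The estimate integrates to (approximately) one over the grid and peaks near the true mode at zero.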
5. Clustering
a. Partitioning schemes: k-means, k-medoids and k-centers
b. Hierarchical and spectral clustering
c. Lab 5
d. Density-based clustering
e. Clustering techniques exploiting kinetic information
f. Lab 6
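As a sketch of the partitioning schemes of part 5a, here is a bare-bones Lloyd's k-means: alternate between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points. The random initialization and the toy two-blob dataset are assumptions for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random data points as init
    for _ in range(n_iter):
        # distance of every point to every centroid
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                    # assign to nearest centroid
        for j in range(k):
            if np.any(labels == j):                  # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# toy data: two well-separated 2D blobs
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(5.0, 0.3, (50, 2))])
labels, centers = kmeans(X, k=2)
```

On such well-separated blobs the partition recovers the two groups; k-means struggles on the non-convex, density-varying clusters targeted by the density-based methods of part 5d.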