Both the k-means algorithm and the EM algorithm belong to the category of unsupervised learning. I have explained in this post what unsupervised learning is.
The k-means algorithm (aka Lloyd’s algorithm) is simple and straightforward. Given a dataset X, the basic idea is to alternate between the following two steps until the objective is minimized:
Step 1: Given K points that represent the centers of K clusters (thus K classes), assign each data point to the class whose center is closest to it.
Step 2: Now each class has a set of data points, so recompute the center/mean of each class based on the data points assigned to it.
Here the objective can be defined as the sum of squared distances between each data point and its closest cluster center.
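In symbols (writing μₖ for the center of cluster k and x₁, …, x_N for the data points), this objective is:

```latex
J = \sum_{n=1}^{N} \min_{k \in \{1,\dots,K\}} \lVert x_n - \mu_k \rVert^2
```

Step 1 minimizes J over the assignments with the centers fixed, and Step 2 minimizes it over the centers with the assignments fixed.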
Following the algorithm above, this objective keeps decreasing until it converges to a local minimum.
(A simple example code can be found here.)
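The two steps above can be sketched in a few lines of NumPy (the function name and arguments here are illustrative, and the random initialization is just one common choice):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Lloyd's algorithm: a minimal sketch, not a production implementation."""
    rng = np.random.default_rng(seed)
    # Initialize the K centers as K randomly chosen data points.
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Step 1: assign each point to the class with the closest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: recompute each center as the mean of its assigned points.
        new_centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_centers, centers):  # assignments stopped changing
            break
        centers = new_centers
    return centers, labels
```

Note that this sketch does not handle the corner case where a cluster ends up empty; real implementations typically re-seed such clusters.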
“The expectation maximization algorithm, or EM algorithm, is a general technique for finding maximum likelihood solutions for probabilistic models having latent variables.” (Chris Bishop, Pattern Recognition and Machine Learning)
EM (expectation maximization) is a generalization of the k-means algorithm. In EM we introduce hidden variables (also called latent variables), which represent variables that we do not observe. In clustering problems, the hidden variables usually represent the class assignments.
Similar to k-means, EM alternates between two steps until convergence:
E-step: calculate the expectation of the hidden variables, given the current parameters.
M-step: compute the maximum likelihood estimates of the parameters of the assumed/given model, given the expected hidden variables.
Understanding EM and its convergence requires some statistics and calculus, so it is helpful to look at one of the most common uses of EM: fitting a Gaussian Mixture Model.
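To make the E-step and M-step concrete, here is a minimal sketch of EM for a one-dimensional Gaussian mixture (the function name and the quantile-based initialization are my own illustrative choices, not part of the standard algorithm):

```python
import numpy as np

def em_gmm_1d(x, K, n_iters=100):
    """EM for a 1-D Gaussian mixture: a minimal sketch."""
    # Initialize mixing weights, means, and variances.
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # spread initial means over the data
    var = np.full(K, x.var())
    for _ in range(n_iters):
        # E-step: responsibilities r[n, k] = P(class k | x_n), i.e. the
        # expected values of the hidden class-assignment variables.
        pdf = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = pi * pdf
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: maximum likelihood updates of the parameters,
        # weighting each data point by its responsibilities.
        Nk = resp.sum(axis=0)
        pi = Nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / Nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return pi, mu, var
```

If the responsibilities are hardened to 0/1 assignments and the variances are shared and fixed, the updates reduce to the two k-means steps above, which is one way to see that EM generalizes k-means.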