Comparison of the K-means algorithm with the EM algorithm for Gaussian mixtures shows that there is a close similarity. Whereas the K-means algorithm performs a hard assignment of data points to clusters, in which each data point is associated uniquely with one cluster, the EM algorithm makes a soft assignment based on the posterior probabilities. In fact, we can derive the K-means algorithm as a particular limit of EM for Gaussian mixtures as follows.
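As a quick illustration of hard versus soft assignment, here is a minimal NumPy sketch; the data, centers, mixing coefficients, and variance are all made-up placeholders rather than anything from the text:

```python
import numpy as np

# Illustration only: made-up data, two arbitrary centers, and a shared
# spherical variance (all values here are assumptions for the sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                  # data points, shape (N, D)
mu = np.array([[-1.0, 0.0], [1.0, 0.0]])       # cluster centers, shape (K, D)
pi = np.array([0.5, 0.5])                      # mixing coefficients
var = 0.5                                      # shared spherical variance

# Squared distance from every point to every center, shape (N, K).
sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)

# K-means: hard assignment -- each point belongs to exactly one cluster.
hard = sq_dist.argmin(axis=1)                  # shape (N,), integer cluster index

# EM for a spherical Gaussian mixture: soft assignment via posterior
# responsibilities, proportional to pi_k * exp(-||x_n - mu_k||^2 / (2 var)).
log_w = np.log(pi) - 0.5 * sq_dist / var       # up to a constant shared by all k
resp = np.exp(log_w - np.logaddexp.reduce(log_w, axis=1, keepdims=True))
```

Each row of `resp` sums to one, so a point can contribute fractionally to several clusters, whereas `hard` commits it to exactly one.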
The goal of the EM algorithm is to find maximum likelihood solutions for models having latent variables, where the likelihood is obtained by marginalizing the joint distribution over the latent variables $Z$:
$$ p(X|\theta) = \sum_Z p(X, Z|\theta) $$
The log likelihood function is then given by
$$ L(\theta) = \ln p(X|\theta) = \ln\bigg\{ \sum_Z p(X, Z|\theta)\bigg\} $$
The summation over the latent variables appears inside the logarithm, which is what makes direct maximization difficult and motivates the two-step EM procedure.
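For a Gaussian mixture, the sum over $Z$ can be carried out in closed form, giving $L(\theta) = \sum_n \ln \sum_k \pi_k \mathcal{N}(x_n|\mu_k, \Sigma_k)$. A minimal sketch of evaluating this quantity (assuming SciPy is available; all parameter values below are placeholders):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pis, mus, covs):
    """Evaluate ln p(X|theta) = sum_n ln sum_k pi_k N(x_n | mu_k, Sigma_k)."""
    # Weighted per-component densities, stacked into an (N, K) matrix.
    densities = np.column_stack([
        pi_k * multivariate_normal.pdf(X, mean=mu_k, cov=cov_k)
        for pi_k, mu_k, cov_k in zip(pis, mus, covs)
    ])
    return np.log(densities.sum(axis=1)).sum()

# Placeholder parameters for a two-component mixture in 2-D.
X = np.random.default_rng(1).normal(size=(50, 2))
pis = [0.4, 0.6]
mus = [np.zeros(2), np.ones(2)]
covs = [np.eye(2), 0.5 * np.eye(2)]
print(gmm_log_likelihood(X, pis, mus, covs))
```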
K-means can be viewed as the limiting case of the EM algorithm for a Gaussian mixture model with spherical covariances $\epsilon I$, in the limit where the shared variance $\epsilon$ (and hence the trace of each covariance matrix) goes to zero.
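To make this limit concrete, consider the standard sketch for a mixture with spherical covariances $\epsilon I$ (writing $\gamma(z_{nk})$ for the responsibilities and $r_{nk}$ for the hard assignments). The posterior responsibility of component $k$ for point $x_n$ is
$$ \gamma(z_{nk}) = \frac{\pi_k \exp\{-\|x_n - \mu_k\|^2 / 2\epsilon\}}{\sum_{j=1}^{K} \pi_j \exp\{-\|x_n - \mu_j\|^2 / 2\epsilon\}} $$
As $\epsilon \to 0$, the term with the smallest $\|x_n - \mu_j\|^2$ dominates the denominator, so $\gamma(z_{nk}) \to r_{nk}$, where $r_{nk} = 1$ if $k = \arg\min_j \|x_n - \mu_j\|^2$ and $0$ otherwise. These are exactly the hard assignments of K-means, and in this limit the E and M steps of EM reduce to the assignment and centroid-update steps of K-means.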
GMMs are probabilistic models that assume all the data points are generated from a mixture of several Gaussian distributions with unknown parameters. They differ from K-means clustering in that GMMs model both the center (mean) and the variability (variance) of each cluster, and they provide posterior probabilities of cluster membership rather than hard assignments.
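To see this difference in practice, here is a short sketch using scikit-learn (assuming it is installed; the synthetic blobs and hyperparameters are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two arbitrary blobs, purely for illustration.
X = np.vstack([rng.normal(loc=-2.0, size=(100, 2)),
               rng.normal(loc=2.0, size=(100, 2))])

# K-means: returns only a hard label per point.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# GMM: additionally estimates per-cluster means and covariances and gives
# posterior probabilities (soft assignments) via predict_proba.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
posteriors = gmm.predict_proba(X)   # shape (200, 2), rows sum to 1
print(gmm.means_, gmm.covariances_.shape, posteriors[:3])
```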
Such superpositions, formed by taking linear combinations of more basic distributions such as Gaussians, can be formulated as probabilistic models known as mixture distributions (McLachlan and Basford, 1988; McLachlan and Peel, 2000). In particular, the Gaussian mixture distribution can be written as a linear superposition of K Gaussian densities of the form
$$ p(x) = \sum_{k=1}^{K} \pi_k \mathcal{N}(x|\mu_k, \Sigma_k) $$
where the mixing coefficients $\pi_k$ satisfy $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K} \pi_k = 1$.
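The mixing coefficients also have a generative reading: to draw a point, first pick a component $k$ with probability $\pi_k$, then sample from $\mathcal{N}(\mu_k, \Sigma_k)$. A minimal ancestral-sampling sketch (parameter values are placeholders):

```python
import numpy as np

def sample_gmm(n, pis, mus, covs, rng=None):
    """Ancestral sampling: pick component k with probability pi_k,
    then draw x from N(mu_k, Sigma_k)."""
    rng = np.random.default_rng() if rng is None else rng
    ks = rng.choice(len(pis), size=n, p=pis)          # latent component labels
    samples = np.array([rng.multivariate_normal(mus[k], covs[k]) for k in ks])
    return samples, ks

# Placeholder two-component mixture in 2-D.
pis = [0.3, 0.7]
mus = [np.array([-2.0, 0.0]), np.array([2.0, 0.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]
samples, labels = sample_gmm(500, pis, mus, covs, rng=np.random.default_rng(0))
```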