Gaussian mixture model (GMM)

Gaussian mixture model (GMM) is a clustering algorithm based on probability density estimation. It assumes that the data are generated by a mixture of several Gaussian distributions. It is similar to the K-Means algorithm, with the main difference being that it accounts for the variance of each cluster.


Advantages:

  • It accounts for variance, unlike K-Means.
  • It can provide the probability of each data point’s membership in each cluster.
  • It can identify overlapping clusters.
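
The soft membership described above can be illustrated with a minimal sketch for a one-dimensional mixture of two components. The parameter values (weights, means, stds) and the helper names normal_pdf and responsibilities are hypothetical and chosen by hand for illustration; in practice the parameters would be estimated from data, typically with the EM algorithm.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x, weights, means, stds):
    """Membership probability of x in each component (sums to 1)."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical, hand-picked parameters (assumptions, not fitted values).
weights = [0.5, 0.5]   # mixing proportions
means = [0.0, 4.0]     # component means
stds = [1.0, 3.0]      # component standard deviations (narrow vs. wide)

for x in (0.0, 2.0, 8.0):
    probs = responsibilities(x, weights, means, stds)
    print(x, [round(p, 3) for p in probs])
```

The point x = 2.0 is equidistant from both means, so a purely distance-based assignment would treat it as a tie. Here the wider component receives the larger membership probability, which is one way to see how variance enters the clustering.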

Disadvantages:

  • It requires the number of clusters or mixture components to be defined.
  • The covariance type must be defined.
  • It’s only useful when the distribution type is known and compatible (a mixture of Gaussian distributions with different means and variances).
  • It’s more computationally expensive than K-Means.
  • It requires a large amount of data to estimate the number of clusters.
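
One common way to handle the first two points is to fit several candidate models and compare them with an information criterion. The sketch below assumes scikit-learn's GaussianMixture and uses the Bayesian information criterion (BIC, lower is better). The synthetic data, the candidate ranges, and the choice of BIC are assumptions made for illustration, not a prescribed procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 2-D data drawn from three Gaussian blobs (illustrative assumption).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[5.0, 5.0], scale=1.0, size=(200, 2)),
    rng.normal(loc=[0.0, 6.0], scale=0.7, size=(200, 2)),
])

# Fit one model per (number of components, covariance type) pair
# and record its BIC on the training data.
candidates = []
for n_components in range(1, 6):
    for covariance_type in ("spherical", "diag", "tied", "full"):
        gmm = GaussianMixture(
            n_components=n_components,
            covariance_type=covariance_type,
            random_state=0,
        )
        gmm.fit(X)
        candidates.append((gmm.bic(X), n_components, covariance_type))

# The candidate with the lowest BIC is the preferred model.
best_bic, best_n, best_cov = min(candidates)
print(best_n, best_cov, round(best_bic, 1))
```

Fitting every combination multiplies the training cost, and the BIC estimate becomes less reliable on small samples, which ties back to the last two points in the list.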