DBSCAN

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a density-based clustering algorithm: it assigns items to clusters according to how densely they are packed together, rather than by their distance to a centroid or other single point.

It requires two parameters:

  • minPts: the minimum number of data points that must lie close together for an area to be considered high-density.
  • eps: the maximum distance between two data points for them to be considered part of the same area (neighborhood).
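To make the roles of the two parameters concrete, here is a minimal pure-Python sketch of the clustering procedure. It is an illustration rather than a reference implementation: the names dbscan, region_query, and NOISE are hypothetical, the sample points are made up, Euclidean distance is assumed, and a point is assumed to count toward its own minPts total.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels


# Made-up data: two tight groups of four points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point ends up with the NOISE label instead of being forced into a cluster. This sketch compares every pair of points, so it is only suitable for small inputs.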

Pros and cons

  • DBSCAN is good at handling outliers, which it labels as noise instead of forcing them into a cluster
  • It can create arbitrarily shaped clusters
  • It’s good at handling oddly shaped data
  • It’s poor at handling lower-density data; OPTICS is suggested as an alternative
  • It requires fine-tuning of its initial parameters (minPts and eps) to work well
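The last two points can be illustrated with a small sketch that counts how many points qualify as high-density for different values of eps. The helper count_core_points and the sample data are hypothetical, Euclidean distance is assumed, and a point is assumed to count toward its own minPts total.

```python
from math import dist


def count_core_points(points, eps, min_pts):
    """Count points with at least min_pts points (itself included) within eps."""
    return sum(
        1
        for p in points
        if sum(1 for q in points if dist(p, q) <= eps) >= min_pts
    )


# Made-up data: a tight group, a looser group, and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 13), (13, 10), (13, 13),
          (50, 50)]

for eps in (0.5, 1.5, 4.5):
    print(eps, count_core_points(points, eps, min_pts=3))
# 0.5 0
# 1.5 4
# 4.5 8
```

With eps = 0.5 nothing is dense enough to form a cluster, with eps = 1.5 only the tight group is, and only at eps = 4.5 does the looser group qualify as well. A single eps value has to suit every region of the data, which is why lower-density areas are easy to miss.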