Clustering is the task of grouping unlabeled data items into a number of clusters based on how similar the items are to one another.

# Clustering Methods

• Centroid-based Clustering: A non-hierarchical method that defines one centroid for each of a fixed number of clusters and assigns each data item to the cluster whose centroid is nearest.
• Algorithms
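
As an illustration, k-means (Lloyd's algorithm) is a widely used centroid-based algorithm. The sketch below assumes 2-D points given as tuples and a fixed `k`; the function name `kmeans` and its parameters are hypothetical, chosen for this example. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Minimal k-means (Lloyd's algorithm) on points given as tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments can no longer change: converged
        centroids = new_centroids
    return labels, centroids
```

The result depends on the initial centroids, which is why practical implementations usually run several restarts or use a smarter seeding such as k-means++.
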
• Distribution-based Clustering: Assumes the data is generated by a mixture of probability distributions. An item's distance from a distribution's center reflects the probability that it belongs to that distribution: the farther away it lies, the less likely it is a member.
• Algorithms
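
A Gaussian mixture model fitted with expectation-maximization (EM) is one common way to realize this idea. The sketch below is limited to 1-D data and assumes the number of components and their starting means are known; `gmm_1d` and `normal_pdf` are hypothetical helper names. Each point's responsibility is the posterior probability that it came from each Gaussian, and it shrinks as the point moves away from that Gaussian's mean.

```python
import math


def normal_pdf(x, mean, var):
    # Density of a 1-D Gaussian with the given mean and variance.
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_1d(xs, means, iters=50):
    """Fit a 1-D Gaussian mixture with EM; `means` gives the initial component means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility = posterior probability that x came from component j.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from the responsibility-weighted points.
        for j in range(k):
            r = [row[j] for row in resp]
            nj = sum(r)
            weights[j] = nj / len(xs)
            means[j] = sum(ri * x for ri, x in zip(r, xs)) / nj
            var = sum(ri * (x - means[j]) ** 2 for ri, x in zip(r, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing to zero width
    return resp, means, variances, weights
```

A point's cluster is the component with the largest responsibility, but unlike k-means the responsibilities also give a soft, probabilistic membership.
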
• Density Method: Groups together data points that lie in regions of high density, on the assumption that points in a dense region are more similar to each other than to points in sparser regions.
Density methods can use Kernel Density Estimation (KDE), a nonparametric estimate of the data's underlying probability density function (PDF).
• ✔️ Can achieve good accuracy
• ✔️ Can merge clusters
• ✔️ Finds clusters of arbitrary shape in dense areas
• ✔️ Can identify outliers (points in sparse regions)
• ❌ Performs poorly on high-dimensional data
• Algorithms
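
DBSCAN is a well-known density-based algorithm. Below is a minimal sketch, assuming 2-D points, a neighborhood radius `eps`, and a minimum neighbor count `min_pts` (names chosen for this example). Points with at least `min_pts` neighbors are core points; clusters grow outward from them, so they can take arbitrary shapes, and points reachable from no core point are labeled `-1` as outliers.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster id per point, or -1 for noise (outliers)."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not dense itself
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels
```

For example, with two tight groups and one far-away point, the far point is reported as noise rather than being forced into a cluster.
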
• Hierarchical Method: Builds a tree-like hierarchy of clusters (a dendrogram), in which each new cluster is formed by merging or splitting previously formed clusters.
• Algorithms
• BIRCH
• CURE
• Agglomerative hierarchical clustering
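
The agglomerative (bottom-up) approach can be sketched as follows, assuming single linkage, where the distance between two clusters is the distance between their closest members. `agglomerative` is a hypothetical name; the `merges` list records the tree as it is built from the leaves upward. This brute-force version is roughly cubic in the number of points and only meant for small inputs.

```python
import math


def agglomerative(points, n_clusters):
    """Single-linkage agglomerative clustering: start with one cluster per point
    and repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    merges = []  # (cluster_a, cluster_b, distance) for each merge, bottom-up
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance = closest pair of members.
                d = min(math.dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, merges
```

Running it with `n_clusters=1` yields the full merge history, which is what a dendrogram plots.
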
• Partitioning Method: Divides the objects into k non-overlapping partitions, each of which forms one cluster.
• Algorithms
• CLARANS
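
CLARANS looks for k representative points (medoids) by randomized local search. The sketch below is a simplified version written for illustration, not a reference implementation: `num_local` (number of restarts) and `max_neighbor` (how many random single-medoid swaps to try before accepting a local minimum) play the roles of the parameters described for CLARANS, and the cost is assumed to be the total distance from each point to its nearest medoid.

```python
import math
import random


def total_cost(points, medoids):
    # Sum of distances from each point to its nearest medoid.
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def clarans(points, k, num_local=2, max_neighbor=20, seed=0):
    """Simplified CLARANS-style k-medoids: randomized local search over medoid swaps."""
    rng = random.Random(seed)
    n = len(points)
    best, best_cost = None, math.inf
    for _ in range(num_local):
        current = rng.sample(range(n), k)  # random starting set of medoid indices
        cost = total_cost(points, current)
        tries = 0
        while tries < max_neighbor:
            # A "neighbor" solution differs from the current one in exactly one medoid.
            i = rng.randrange(k)
            candidate = rng.choice([x for x in range(n) if x not in current])
            neighbor = current[:i] + [candidate] + current[i + 1:]
            neighbor_cost = total_cost(points, neighbor)
            if neighbor_cost < cost:
                current, cost, tries = neighbor, neighbor_cost, 0  # move, reset counter
            else:
                tries += 1
        if cost < best_cost:
            best, best_cost = current, cost
    # Each point belongs to the partition of its nearest medoid.
    labels = [min(range(k), key=lambda c: math.dist(p, points[best[c]])) for p in points]
    return labels, best
```

Because medoids are actual data points rather than means, this family of methods is less sensitive to outliers than k-means.
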
• Grid-based Method: Quantizes the data space into a finite number of cells that form a grid-like structure, then performs clustering on the cells rather than on individual points.
• Algorithms
• CLIQUE
• STING
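
The basic grid idea can be shown with a toy 2-D example. This is neither CLIQUE nor STING; it assumes square cells of side `cell_size` and a hypothetical `density_threshold` for calling a cell dense. Points are binned into cells, dense cells that touch (including diagonally) are joined into one cluster, and points in sparse cells are labeled `-1`. The connectivity search runs over cells, not over pairs of points.

```python
from collections import defaultdict


def grid_cluster(points, cell_size, density_threshold):
    """Toy grid-based clustering in 2-D: bin points into square cells, keep cells
    with at least `density_threshold` points, and join dense cells that touch."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)
    dense = {c for c, members in cells.items() if len(members) >= density_threshold}

    labels = [-1] * len(points)  # -1 = point in a sparse cell (treated as noise)
    cluster = 0
    seen = set()
    for start in dense:
        if start in seen:
            continue
        # Flood-fill over dense cells that share an edge or a corner.
        stack = [start]
        seen.add(start)
        while stack:
            cx, cy = stack.pop()
            for idx in cells[(cx, cy)]:
                labels[idx] = cluster
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        cluster += 1
    return labels
```
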
• Graph-based Methods: Use graph theory, treating data items as nodes and weighting the edges between them by how similar the connected items are.
• Algorithms
• Spectral Clustering
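
Spectral clustering builds a similarity graph and uses eigenvectors of its Laplacian. The sketch below covers only the simplest two-cluster case, assuming Gaussian (RBF) edge weights with a hypothetical bandwidth `sigma`: it approximates the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the Laplacian L = D - W) by power iteration and splits the points by its sign. General k-way spectral clustering instead takes several eigenvectors and runs k-means on them, and in practice a linear-algebra library would compute the eigenvectors.

```python
import math


def spectral_bipartition(points, sigma=1.0, iters=500):
    """Two-way spectral clustering sketch: split points by the sign of the
    Fiedler vector of the unnormalized graph Laplacian."""
    n = len(points)
    # Similarity graph: Gaussian (RBF) edge weights between every pair of points.
    w = [[0.0 if i == j else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    # Unnormalized Laplacian L = D - W.
    lap = [[(deg[i] if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]
    # Power iteration on M = c*I - L turns L's smallest eigenvalues into M's largest.
    # c = 2 * max degree bounds L's eigenvalues from above, so M is positive semidefinite.
    c = 2 * max(deg)
    v = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant vector (eigenvalue 0 of L)
        v = [c * v[i] - sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return [0 if x >= 0 else 1 for x in v]
```
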

Resources: