k-Nearest Neighbors (KNN)

KNN is a supervised classification (and regression) algorithm. By assuming that data points existing in close proximity to one another are highly similar, KNN assigns a label to a new data point based on the labels of the nearest already-classified points.


Algorithm:

  1. Choose an appropriate distance metric and compute the distance from the new data point to every training point.
  2. Store the distances in an array and sort it in ascending order of distance.
  3. Select the first k elements of the sorted list.
  4. Perform majority voting: the class with the maximum number of occurrences among the k neighbors is assigned to the data point being classified.
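
The four steps above can be sketched in a few lines of Python. This is a minimal from-scratch illustration (the function name `knn_classify` and the toy data are made up for this example), not a production implementation:

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Steps 1-2: compute the Euclidean distance to every training point and sort
    distances = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    # Step 3: keep only the k closest points
    nearest = [label for _, label in distances[:k]]
    # Step 4: majority vote decides the class
    return Counter(nearest).most_common(1)[0][0]

# Toy data: two well-separated groups of points
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(points, labels, (2, 2), k=3))  # "A"
```

The query point (2, 2) lies close to the three "A" points, so all three of its nearest neighbors vote "A".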
Info

KNN typically uses Euclidean distance to find the closest classified points. Other distance metrics include:

  • Manhattan distance
  • Minkowski distance
  • Mahalanobis distance
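
As a quick sketch, the first two alternative metrics can be written in plain Python (Mahalanobis distance is omitted here because it also requires estimating a covariance matrix). The helper names `manhattan` and `minkowski` are chosen for this example:

```python
import math

def manhattan(a, b):
    # L1 norm: sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p=3):
    # Generalizes Manhattan (p=1) and Euclidean (p=2)
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0, 0), (3, 4)
print(math.dist(a, b))     # Euclidean: 5.0
print(manhattan(a, b))     # 7
print(minkowski(a, b, 2))  # reduces to Euclidean: 5.0
```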

Advantages:

  • Simple and easy to implement
  • Works well with small datasets
  • KNN can be used to solve both classification and regression problems.
  • KNN is suitable for multiclass classification.
  • KNN is a lazy learner, meaning it has no explicit training phase; it simply stores the training data and defers all computation until prediction time.
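
To illustrate the regression case mentioned above: instead of a majority vote, the prediction is the average of the targets of the k nearest neighbors. A minimal sketch (the function name `knn_regress` and the toy data are assumptions for this example):

```python
import math

def knn_regress(train_points, train_targets, query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    # Compute and sort distances, exactly as in classification
    distances = sorted(
        (math.dist(p, query), t)
        for p, t in zip(train_points, train_targets)
    )
    # Average the targets of the k closest points instead of voting
    nearest = [t for _, t in distances[:k]]
    return sum(nearest) / len(nearest)

points = [(1,), (2,), (3,), (10,)]
targets = [1.0, 2.0, 3.0, 10.0]
print(knn_regress(points, targets, (2.5,), k=3))  # 2.0
```

The three nearest neighbors of 2.5 have targets 1.0, 2.0, and 3.0, so the prediction is their mean, 2.0; the distant outlier at 10 is ignored.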

Disadvantages:

  • Computationally expensive at prediction time, since distances to all training points must be computed
  • Sensitive to the choice of the number of neighbors (k)
  • Not suitable for high-dimensional data

Applications: