Outliers are numerical data points that differ significantly from the majority of points in the distribution. Such points can distort the central measures of the distribution, such as the mean and, to a lesser extent, the median. So they need to be detected and, where appropriate, removed.
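To make the effect on central measures concrete, here is a minimal sketch using Python's standard `statistics` module. The sample values are hypothetical and chosen only for illustration. In this particular sample, one extreme value more than doubles the mean, while the median does not move.

```python
from statistics import mean, median

# Hypothetical sample, made up for illustration only.
values = [10, 12, 11, 13, 12]

# The same sample with one extreme value appended.
with_outlier = values + [100]

mean_clean = mean(values)
mean_outlier = mean(with_outlier)
median_clean = median(values)
median_outlier = median(with_outlier)

print("mean without / with outlier:  ", mean_clean, mean_outlier)
print("median without / with outlier:", median_clean, median_outlier)
```

How strongly each measure reacts depends on the sample size and on how extreme the outlier is, so the numbers here should not be read as a general rule.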


Outlier detection is an important task in machine learning and statistics, and outlier handling is part of the data preparation process.

There are two types of outliers:

  • True outliers: Outliers representing natural variations in the sample.
  • Other outliers: Outliers not representing natural values. They may be caused by:
    • Measurement errors
    • Data entry or processing errors
    • Sampling bias

True Outliers may be Rare Values containing valuable information and should