Overfitting

Overfitting (high variance) happens when a model does well on the training data (it fits the training data almost exactly) but performs poorly on new data. It is a modeling error in which a function fits a limited set of data points too closely.

Overfitting is typically caused by using a model that is too complex for the dataset, especially when it is trained on too few samples.

TL;DR

Overfitting means that your model is too complex and captures the noise and outliers in the training data, but fails to generalize to new data.
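
The behavior described above can be reproduced on a toy problem. The following is a minimal sketch, not a definitive recipe: it assumes a sine curve as the ground truth, Gaussian noise, and only 10 training points, and it fits polynomials of increasing degree with NumPy. The names `make_data`, `mse`, and `results`, and all constants, are illustrative choices that do not come from the text.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Sample n noisy points from an assumed ground truth y = sin(2*pi*x)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
    return x, y

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on the points (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

x_train, y_train = make_data(10)   # deliberately few training samples
x_test, y_test = make_data(200)    # fresh data from the same process

# Map each polynomial degree to its (train MSE, test MSE).
results = {}
for degree in (1, 3, 9):
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree}: train MSE = {results[degree][0]:.4f}, "
          f"test MSE = {results[degree][1]:.4f}")
```

With 10 points, a degree-9 polynomial has enough freedom to pass through every training point, so its training error is essentially zero. The training error can only go down as the degree grows; the number to watch is the test error, which is what reveals overfitting.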


Solutions to overfitting:

  • Use simpler models or simplify the current model
  • Find more training data
  • Stop the training early (also known as early stopping)
  • Reduce the number of features to counter complexity
  • Use regularization techniques, such as weight penalties or dropout (in neural networks)
  • In decision trees, overly specific splitting rules are a sign of overfitting; pruning is often used to remove them
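
As one illustration of the regularization item in the list above, here is a hedged sketch of L2 (ridge) regularization applied to a deliberately over-complex polynomial model. The data-generating process, the degree, and the penalty strength `lam` are assumptions made for the example, and the helper names `poly_features` and `fit_ridge` are hypothetical. For simplicity the intercept is penalized too, which practical implementations often avoid.

```python
import numpy as np

rng = np.random.default_rng(0)

# Few noisy samples from an assumed ground truth y = sin(pi*x).
x = rng.uniform(-1.0, 1.0, size=12)
y = np.sin(np.pi * x) + rng.normal(scale=0.3, size=12)

def poly_features(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    """Minimize ||X w - y||^2 + lam * ||w||^2 via the normal equations."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

X = poly_features(x, degree=9)                   # complex model, few samples
w_plain = np.linalg.lstsq(X, y, rcond=None)[0]   # no regularization
w_ridge = fit_ridge(X, y, lam=1e-2)              # L2-penalized weights

train_mse_plain = float(np.mean((X @ w_plain - y) ** 2))
train_mse_ridge = float(np.mean((X @ w_ridge - y) ** 2))

print("weight norm, plain:", np.linalg.norm(w_plain))
print("weight norm, ridge:", np.linalg.norm(w_ridge))
print("train MSE, plain:", train_mse_plain)
print("train MSE, ridge:", train_mse_ridge)
```

The penalty shrinks the weights and accepts a slightly worse fit on the training data in exchange for a smoother function. Whether that improves performance on new data depends on `lam`, which is normally chosen using a held-out validation set.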