Overfitting

Overfitting (High Variance) is a Fitting Problem that happens when model does well on the training data(statistical model fits exactly against its training data) but doesn’t work well when supplied with new data(evaluation data). It is a modeling error when a function is too closely fits a limited set of data points.

Overfitting is caused by using a model which is too complex for the dataset and is trained with few training samples.

Tldr

Overfitting means that your model is too complex and captures the noise and outliers in the training data, but fails to generalize to new data.


Solutions to overfitting:

  • Use simpler models or simplify the current model
  • Find more training data
  • Stop the training early (also know as early stopping)
  • Reduce the number of Features to counter complexity
  • In Decision Trees, strict rules show overfitting and often Pruning is used.
  • Regularization techniques like Dropout(in neural networks)