Gradient Boosting (GB)

Gradient Boosting is an ensemble technique that builds models sequentially: each new model is trained to correct the errors of the models before it, gradually reducing the overall prediction error.

Gradient Boosting builds multiple decision trees, where each new tree learns from the mistakes of the previous trees by being fit to their residual errors. The whole aim of Gradient Boosting is to reduce this residual error as much as possible.
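
To make the residual-fitting idea concrete, here is a minimal sketch for squared-error loss, using scikit-learn's DecisionTreeRegressor on a toy dataset; the data and hyper-parameter values are illustrative, not a definitive implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (illustrative only).
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_estimators = 50

# Start from a constant prediction (the mean), then repeatedly fit a
# small tree to the current residuals and add a scaled version of its
# output to the ensemble prediction.
prediction = np.full_like(y, y.mean())
trees = []
for _ in range(n_estimators):
    residuals = y - prediction              # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residuals)                  # each tree learns from previous mistakes
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("Training MSE after boosting:", np.mean((y - prediction) ** 2))
```

For squared-error loss the residuals are exactly the negative gradient of the loss, which is where the "gradient" in the name comes from.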

Gradient Boosting is similar to AdaBoost (Adaptive Boosting); the main difference is that AdaBoost builds decision stumps (single-split trees) and reweights the training samples, whereas Gradient Boosting builds deeper trees with multiple leaves and fits them to the residual errors.

Notes:

  • Early stopping: to prevent overfitting and wasted computation, training can be stopped once the error on a validation set stops improving (see the sketch after this list).
  • Regularization: shrinkage (a small learning rate) and row sub-sampling can be used to prevent overfitting and improve generalization.
  • It's difficult to interpret the model.
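
As an illustration of early stopping and these regularization options, here is a sketch using scikit-learn's GradientBoostingRegressor; the dataset and parameter values are placeholders chosen for the example, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# validation_fraction + n_iter_no_change enable early stopping: training
# halts when the held-out score stops improving. A small learning_rate and
# subsample < 1.0 act as regularization.
model = GradientBoostingRegressor(
    n_estimators=500,        # upper bound; early stopping may build fewer trees
    learning_rate=0.05,
    subsample=0.8,
    validation_fraction=0.1,
    n_iter_no_change=10,
    random_state=0,
)
model.fit(X_train, y_train)

print("Trees actually built:", model.n_estimators_)
print("Test R^2:", model.score(X_test, y_test))
```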

Main parameters:

  • Learning Rate: shrinks the contribution of each tree; smaller values usually require more trees.
  • N-Estimators: the number of trees in the ensemble.
  • Max depth: the maximum depth of each individual tree.
  • Min samples split: the minimum number of samples required to split an internal node.
  • Min samples leaf: the minimum number of samples required at a leaf node.
  • Sub-sample: the fraction of training samples used to fit each tree.
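
These parameters correspond directly to scikit-learn's estimator arguments. A brief sketch with illustrative, untuned values:

```python
from sklearn.ensemble import GradientBoostingClassifier

clf = GradientBoostingClassifier(
    learning_rate=0.1,     # Learning Rate: shrinks each tree's contribution
    n_estimators=200,      # N-Estimators: number of trees
    max_depth=3,           # Max depth of each individual tree
    min_samples_split=2,   # Min samples split: samples needed to split a node
    min_samples_leaf=1,    # Min samples leaf: samples required at each leaf
    subsample=1.0,         # Sub-sample: fraction of rows used to fit each tree
)
```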

Advantages:

  • Typically delivers high predictive accuracy on tabular data.
  • Works for both regression and classification, with a choice of loss functions.
  • Handles numeric features without scaling and captures non-linear relationships and feature interactions.

Disadvantages:

  • It exposes many hyper-parameters, so thorough hyper-parameter tuning can be computationally heavy.
  • Predictions are not easy to understand or interpret, although there are tools that help with that (see the sketch after this list).
  • Training is computationally expensive, since the trees are built sequentially.
  • Sensitive to outliers.
  • A small change in the training set or in the features can produce a radically different model.
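
Regarding the interpretation tools mentioned above, a common starting point is to look at feature importances. The sketch below uses scikit-learn's impurity-based importances and permutation importance on an illustrative built-in dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Impurity-based importances come for free with the fitted model...
print(model.feature_importances_[:5])

# ...while permutation importance measures how much the test score drops
# when each feature is shuffled, which is often more reliable.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean[:5])
```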