Random Forest

Random Forest builds a ‘forest’ of many Decision Trees, each trained on a different subset of the data. It uses the Bagging technique together with feature randomness when building each individual tree, producing an uncorrelated forest of decision trees.

To generate a result, each tree in the forest produces a prediction for the given set of features, and the most commonly occurring prediction is chosen as the final output by majority vote.
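A minimal sketch of this behaviour using scikit-learn's RandomForestClassifier; the Iris dataset and the hyperparameter values here are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative data: the classic Iris dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 trees, each trained on a bootstrap sample, with feature randomness at every split.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Each tree votes; the majority class becomes the final prediction.
print(clf.predict(X_test[:5]))
print("accuracy:", clf.score(X_test, y_test))
```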


Algorithm:

  1. Create "Bootstrap Samples": random samples of the training data drawn with replacement.
  2. Train a Decision Tree on each Bootstrap Sample.
    • Note that trees are built independently of one another, and only a random subset of features is considered at each split, which makes each tree different.
  3. Combine the trees' predictions by Majority Voting (for classification) or Averaging (for regression) to produce the final output (see the sketch after this list).
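One way these three steps could be sketched from scratch, assuming scikit-learn's DecisionTreeClassifier as the base learner and integer class labels; the helper names are hypothetical:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=25, rng=None):
    """Steps 1 and 2: draw a bootstrap sample per tree and fit each tree independently."""
    rng = np.random.default_rng(rng)
    forest = []
    n = len(X)
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)                     # bootstrap sample (with replacement)
        tree = DecisionTreeClassifier(max_features="sqrt")   # feature randomness at each split
        tree.fit(X[idx], y[idx])
        forest.append(tree)
    return forest

def predict_forest(forest, X):
    """Step 3: majority vote over the individual trees' predictions (integer labels assumed)."""
    votes = np.array([tree.predict(X) for tree in forest])   # shape: (n_trees, n_samples)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```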

Notes:

  • Random Forests can be considered an Ensemble Model: they use Bagging, in which deep trees fitted on bootstrap replicas are combined to produce an output with lower variance.
  • Random Forests can be used for both Classification and Regression tasks:
    • For classification tasks, the output of the random forest is the class selected by most trees.
    • For regression tasks, the mean (average) prediction of the individual trees is returned (see the example after this list).
  • Random Forests are highly stable because their result is based on majority voting or averaging.
  • Each tree in the forest is trained on a bootstrap sample drawn at random (with replacement) from the training dataset. The number of trees is set via a hyperparameter, and at each node the best feature from a random subset is used for splitting.
  • Random forest works well on large datasets with high dimensionality as the algorithm inherently performs feature selection.
  • It is relatively insensitive to outliers.
  • Random Forest can be prone to Overfitting, although less so than individual Decision Trees, and this can be mitigated to some degree with pruning or by limiting tree depth.
  • It provides a low level of interpretability; however, this can be improved by extracting “feature importance” scores.
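A small sketch of the regression (averaging) and feature-importance points above, using scikit-learn's RandomForestRegressor; the diabetes dataset and the number of trees are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Regression: the forest's output is the average of the individual trees' predictions.
X, y = load_diabetes(return_X_y=True)
reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

manual_mean = np.mean([tree.predict(X[:3]) for tree in reg.estimators_], axis=0)
print(np.allclose(manual_mean, reg.predict(X[:3])))   # True: prediction is the mean over trees

# Impurity-based feature importances (normalised to sum to 1) improve interpretability.
for name, score in sorted(zip(load_diabetes().feature_names, reg.feature_importances_),
                          key=lambda p: p[1], reverse=True)[:3]:
    print(f"{name}: {score:.3f}")
```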

Main parameters:

  • Max depth: The longest path between the root node and a leaf. As this parameter increases, each tree can capture more information, but setting it too large causes Overfitting.
  • Min sample split: The minimum number of observations needed to split a given node.
  • Max leaf nodes: Caps the number of leaf nodes and hence limits the growth of the trees.
  • Min samples leaf: The minimum number of samples required in a leaf node.
  • N estimators: Number of trees. Increasing this parameter generally lowers variance (and hence Overfitting) at the cost of training and prediction time; it is typically started from a library default (e.g., 100 in scikit-learn) and then tuned.
  • Max samples: The fraction of the original dataset drawn as the bootstrap sample for each individual tree.
  • Max features: The maximum number of features considered when searching for the best split at each node.
  • Criterion: The measure used to evaluate node splits in each tree, such as Gini impurity, Entropy, or Log Loss (see the example after this list).
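These parameters map onto scikit-learn's implementation roughly as in the sketch below; the specific values are illustrative assumptions, not recommendations:

```python
from sklearn.ensemble import RandomForestClassifier

# Illustrative values only; suitable settings depend on the dataset.
model = RandomForestClassifier(
    n_estimators=300,        # number of trees
    max_depth=10,            # longest root-to-leaf path allowed
    min_samples_split=4,     # minimum observations needed to split a node
    min_samples_leaf=2,      # minimum samples required in a leaf
    max_leaf_nodes=None,     # optionally cap the number of leaves
    max_samples=0.8,         # fraction of the dataset drawn for each tree's bootstrap sample
    max_features="sqrt",     # features considered at each split
    criterion="gini",        # split-quality measure ("gini", "entropy", or "log_loss")
)
```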

Advantages:

  • Fast to train.
  • Capable of parallel processing, since the trees are built independently (see the example after this list).
  • Very stable and often can provide high quality results.
  • It is relatively robust to the Curse of Dimensionality: since each tree does not consider all the features, the effective feature space each tree works with is reduced.
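Because the trees are independent, training and prediction parallelise naturally; in scikit-learn this is exposed through the n_jobs parameter (a small sketch with an assumed synthetic dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5000, n_features=50, random_state=0)

# n_jobs=-1 builds (and later queries) the trees on all available CPU cores.
model = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
model.fit(X, y)
```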

Disadvantages:

  • Prediction is not very fast, especially with a high number of trees in the forest.
  • Difficulty in tuning: with a high number of trees in the forest, tuning becomes complex and computationally expensive (see the sketch after this list).
  • It is highly complex and not as easy to interpret as the Decision Tree algorithm.
  • High memory consumption.
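A sketch of why tuning gets expensive: even a modest grid over three parameters multiplies into many forest fits (the grid values and dataset are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [5, 10, None],
    "max_features": ["sqrt", 0.5],
}
# 3 * 3 * 2 = 18 candidates x 5 CV folds = 90 forests to train.
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_)
```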