Decision Trees

A decision tree is a tree-structured classifier built from a series of conditional statements. The conditions at its nodes determine the path a sample takes, and the leaf node where that path ends selects the output.
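
To make the "series of conditional statements" concrete, here is a minimal hand-written sketch. The function name `approve_loan`, the feature names and the thresholds are all hypothetical, made up for illustration; a real tree would learn its conditions from data.

```python
def approve_loan(applicant: dict) -> str:
    """A hand-written tree: each `if` is a node, each `return` is a leaf."""
    if applicant["income"] < 30_000:        # root node (hypothetical threshold)
        return "reject"                      # leaf
    if applicant["years_employed"] < 2:      # internal node (hypothetical threshold)
        return "review"                      # leaf
    return "approve"                         # leaf


# The sample follows one path from the root down to exactly one leaf.
print(approve_loan({"income": 45_000, "years_employed": 5}))  # prints "approve"
```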


Types:

  • Categorical Variable Decision Tree: A decision tree that has a categorical target variable is called a categorical variable decision tree (a classification tree).
  • Continuous Variable Decision Tree: A decision tree that has a continuous target variable is called a continuous variable decision tree (a regression tree).
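
The practical difference between the two types shows up in what a leaf returns. The sketch below assumes one common convention that these notes do not spell out: a categorical leaf returns the majority class of its training samples, and a continuous leaf returns their mean. The function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def leaf_output_categorical(targets):
    """Categorical target: return the most common class in the leaf."""
    return Counter(targets).most_common(1)[0][0]


def leaf_output_continuous(targets):
    """Continuous target: return the mean of the values in the leaf."""
    return mean(targets)


print(leaf_output_categorical(["yes", "no", "yes"]))  # prints "yes"
print(leaf_output_continuous([1.0, 2.0, 3.0]))        # prints 2.0
```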

Main parameters:

  • Maximum tree depth
  • Minimum samples per leaf node
  • Impurity criterion
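
A rough sketch of where these three parameters act during training. It assumes a greedy search over "feature <= threshold" splits on numeric features, which is one common approach and not something these notes specify. The names `build`, `gini`, `predict_one`, `max_depth`, `min_samples_leaf` and `criterion` are chosen for illustration only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build(rows, labels, depth=0, max_depth=2, min_samples_leaf=1, criterion=gini):
    """Greedy binary tree on numeric features. Assumes min_samples_leaf >= 1."""
    majority = Counter(labels).most_common(1)[0][0]
    # Maximum tree depth: stop splitting once the limit is hit (or the node is pure).
    if depth >= max_depth or criterion(labels) == 0.0:
        return majority
    best = None  # (weighted impurity, feature index, threshold)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            # Minimum samples per leaf: discard splits that leave a side too small.
            if min(len(left), len(right)) < min_samples_leaf:
                continue
            # Impurity criterion: score the split by the weighted child impurity.
            score = (len(left) * criterion(left) + len(right) * criterion(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    if best is None:  # no split satisfies the constraints, so this node is a leaf
        return majority
    _, feature, threshold = best
    go_left = [row[feature] <= threshold for row in rows]
    left_rows = [r for r, g in zip(rows, go_left) if g]
    left_labels = [y for y, g in zip(labels, go_left) if g]
    right_rows = [r for r, g in zip(rows, go_left) if not g]
    right_labels = [y for y, g in zip(labels, go_left) if not g]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build(left_rows, left_labels, depth + 1, max_depth, min_samples_leaf, criterion),
        "right": build(right_rows, right_labels, depth + 1, max_depth, min_samples_leaf, criterion),
    }


def predict_one(node, row):
    """Walk from the root to a leaf; leaves are stored as plain labels."""
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node


# Toy data, made up for illustration.
rows = [[1.0], [2.0], [3.0], [4.0]]
labels = ["a", "a", "b", "b"]
tree = build(rows, labels)
print(tree)
print(predict_one(tree, [1.5]))
```

Raising `min_samples_leaf` or lowering `max_depth` makes the tree smaller; with `min_samples_leaf=3` on this four-row toy set no split is allowed, so the result is a single leaf.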

Concepts:

  • Root Node: It represents the entire population or sample, which is then divided into two or more homogeneous sets.
  • Internal Nodes: They represent tests on features.
  • Branches (Decision Nodes): A sub-node that splits into further sub-nodes is called a decision node; it represents a decision rule.
  • Leaf / Terminal Node: A node that does not split is called a leaf or terminal node; it represents the outcome.
  • Splitting: The process of dividing a node into two or more sub-nodes.
  • Stump: A decision tree with only one decision node (the root) and two leaves.
  • Pruning: Removing the sub-nodes of a decision node. It is the opposite of splitting.
  • Branch / Sub-Tree: A subsection of the entire tree is called a branch or sub-tree.
  • Parent and Child Node: A node that is divided into sub-nodes is called the parent of those sub-nodes, and the sub-nodes are its children.
  • Impurity (Gini Index): Measures the likelihood that a randomly picked instance from a node would be misclassified if it were labeled at random according to the class distribution in that node.
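
The structural terms above (parent, child, leaf, stump, sub-tree, pruning) can be illustrated with a small data structure. This is only a sketch: the `Node` class, its fields and the example rules are hypothetical, and real pruning methods choose which sub-trees to remove based on data, which is not shown here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                                   # decision rule, or the outcome for a leaf
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        """A leaf / terminal node has no children."""
        return not self.children

    def prune(self, outcome: str) -> None:
        """Remove all sub-nodes, turning this decision node into a leaf."""
        self.children.clear()
        self.label = outcome


def count_leaves(node: Node) -> int:
    """Count the terminal nodes in the sub-tree rooted at `node`."""
    if node.is_leaf():
        return 1
    return sum(count_leaves(child) for child in node.children)


# A stump: one decision node (the root) and two leaves.
stump = Node("x <= 3", [Node("class A"), Node("class B")])

# A deeper tree: the root is the parent of a leaf and of a sub-tree.
tree = Node("x <= 3", [
    Node("class A"),
    Node("y <= 1", [Node("class B"), Node("class C")]),
])

print(count_leaves(tree))            # 3 leaves before pruning
tree.children[1].prune("class B")    # collapse the sub-tree into a single leaf
print(count_leaves(tree))            # 2 leaves after pruning
```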

Notes:


Advantages:

  • A decision tree is very intuitive and interpretable.
  • Easy to implement.
  • Fast to train.
  • Fast inference.
  • Doesn't require normalizing the dataset.
  • It's suitable for both binary and multiclass classification.
  • It's largely unaffected by outliers in the input features. A split only checks which side of a threshold a value falls on, so an outlier simply stays in one branch and its magnitude doesn't matter.
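
The points about normalization and outliers can be checked with a small sketch. It assumes splits of the form "value <= threshold"; the `split` helper and the numbers are made up for illustration.

```python
from statistics import mean


def split(values, threshold):
    """Return the indices of the samples sent to the left and right branch."""
    left = [i for i, v in enumerate(values) if v <= threshold]
    right = [i for i, v in enumerate(values) if v > threshold]
    return left, right


values = [1.0, 2.0, 3.0, 4.0, 100.0]            # the last value is an outlier
extreme = [1.0, 2.0, 3.0, 4.0, 1_000_000.0]     # the same outlier, far larger

# The partition is identical however extreme the outlier is...
print(split(values, 2.5) == split(extreme, 2.5))   # prints True

# ...whereas a magnitude-based statistic such as the mean shifts a lot.
print(mean(values), mean(extreme))

# Rescaling a feature only rescales the threshold; the partition is unchanged.
scaled = [v / 10 for v in values]
print(split(scaled, 0.25) == split(values, 2.5))   # prints True
```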

Disadvantages:

  • A single tree is often not suitable for complex data: it can overfit unless its depth is limited or it is pruned.

References: