Measure of Impurity

The measure of impurity (impurity criterion) evaluates how homogeneous the class labels are within a node of a decision tree or other tree-based classification model. It is used to choose the most suitable feature to split on at each node, which makes it central to building effective decision trees.

Important

Key components:

  1. Gini Impurity: A measure of impurity that quantifies the probability of incorrectly classifying a randomly chosen element if it were randomly labeled according to the distribution of class labels in the node.
  2. Entropy and Information Gain: Entropy measures the level of disorder or uncertainty in the class labels within a node. Information Gain is the reduction in entropy obtained by splitting on a feature, that is, the entropy of the parent node minus the size-weighted entropy of the resulting child nodes.
  3. Misclassification Error: A simple measure of impurity that computes the probability of misclassifying an observation within a node based on the majority class.
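
The three measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the toy labels are assumptions made for the example, and each function expects a non-empty list of class labels.

```python
from collections import Counter
from math import log2


def class_probabilities(labels):
    """Relative frequency of each class among the labels of a node."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels):
    """Shannon entropy in bits: sum of -p * log2(p) over the classes."""
    return sum(-p * log2(p) for p in class_probabilities(labels))


def misclassification_error(labels):
    """Error rate when every instance is assigned the majority class."""
    return 1.0 - max(class_probabilities(labels))


# Hypothetical node with three "spam" labels and one "ham" label.
node = ["spam", "spam", "spam", "ham"]
print(gini_impurity(node))
print(entropy(node))
print(misclassification_error(node))
```

All three functions return 0 for a pure node (a single class) and reach their maximum when the classes are evenly mixed.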

Info

Gini impurity (Gini index) is the probability that a randomly picked instance would be misclassified if it were labeled at random according to the class distribution of the node.
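
As a worked illustration of this definition, the standard formula can be written out as below. The symbols are assumptions introduced here: K is the number of classes and p_k is the fraction of instances in the node that belong to class k. The node with 6 instances of one class and 4 of another is a made-up example.

```latex
G = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2

% Example: a node with 6 instances of class A and 4 of class B
G = 1 - (0.6^2 + 0.4^2) = 1 - (0.36 + 0.16) = 0.48
```

A pure node gives G = 0, and a two-class node split evenly gives the two-class maximum of G = 0.5.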


Notes:

  • Gini impurity and entropy are the impurity measures most commonly used in decision tree algorithms. Classification and Regression Trees (CART) uses Gini impurity, while ID3 and C4.5 build on entropy.
  • Lower impurity values indicate higher homogeneity of the class labels within a node. A split is desirable when the child nodes it produces have a lower size-weighted impurity than the parent node.
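  • For regression models, the measure of impurity is typically the variance of the target values within the node (equivalently, the mean squared error around the node mean).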
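
The note on lower impurity being desirable for splitting can be made concrete by comparing candidate splits of the same parent node. The sketch below uses entropy-based information gain; the function names, the "yes"/"no" labels, and the three candidate splits are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the class labels in a node."""
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical parent node with an even class mix.
parent = ["yes"] * 4 + ["no"] * 4

# Three candidate splits of the same eight labels.
perfect = [["yes"] * 4, ["no"] * 4]
partial = [["yes", "yes", "yes", "no"], ["yes", "no", "no", "no"]]
useless = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

for name, split in [("perfect", perfect), ("partial", partial), ("useless", useless)]:
    print(name, round(information_gain(parent, split), 4))
```

The split with the highest gain leaves the purest child nodes, so a tree builder that follows this criterion would prefer `perfect`, then `partial`, and would gain nothing from `useless`.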