L1 Regularization

It calculates the absolute differences between the model's outputs and the expected outputs, averaged over the number of outputs. Its aim is to minimize these absolute differences.
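As a minimal sketch of the loss described above (plain Python; the function name is illustrative):

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute differences between targets and predictions."""
    assert len(y_true) == len(y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Actual values (10, 5) vs estimated values (8, 7):
print(mean_absolute_error([10, 5], [8, 7]))  # → 2.0
```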

  • The hyperparameter λ determines the amount of regularization.
  • Mathematically, L1 regularization adds a penalty term to the cost function equal to the sum of the absolute values of the model's coefficients, multiplied by the hyperparameter λ: Cost = Loss + λ · Σ|wᵢ|.
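The penalized cost above can be sketched in plain Python (the function name and the choice of MAE as the base loss are illustrative assumptions; in practice the base loss depends on the model):

```python
def l1_penalized_cost(y_true, y_pred, weights, lam):
    # Base loss: mean absolute error between targets and predictions.
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    # L1 penalty: lambda times the sum of absolute coefficient values.
    penalty = lam * sum(abs(w) for w in weights)
    return mae + penalty

# MAE = 2.0, penalty = 0.1 * (0.5 + 1.5) = 0.2, cost = 2.2
cost = l1_penalized_cost([10, 5], [8, 7], weights=[0.5, -1.5], lam=0.1)
```

Larger λ makes the penalty dominate, pushing the optimizer toward smaller (and eventually zero) coefficients.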

Notes:

  • Robust to outliers: MAE is not very sensitive to outliers, since it is based on the absolute value rather than the squared error.
  • A linear regression that uses the L1 regularization technique is called Lasso Regression.
  • L1 Regularization performs Feature Selection by shrinking the coefficients of some predictors to exactly zero.
  • Complexity: MAE is not differentiable at zero, so it can't be optimized by plain gradient descent; instead it is optimized with Sub-Gradients, which adds complexity.
  • We use MAE instead of the simple (signed) difference because the latter suffers from Mean Bias Error: positive and negative errors cancel out. E.g. the difference between the actual values (10, 5) and the estimated values (8, 7) is ((10 − 8) + (5 − 7)) / 2 = 0, which is misleading, while the MAE is (|10 − 8| + |5 − 7|) / 2 = 2.
  • L1 Regularization can be used to induce sparsity in the learned weights.
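To illustrate how L1 regularization induces sparsity, here is a sketch of the soft-thresholding operator used inside Lasso coordinate-descent and proximal-gradient solvers (a standalone illustration, not any particular library's API): coefficients whose magnitude is below λ are set to exactly zero.

```python
def soft_threshold(w, lam):
    """Proximal operator of the L1 penalty: shrinks w toward 0,
    and sets it to exactly 0 when |w| <= lam."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

weights = [3.0, -0.2, 0.05, -4.0, 0.1]
sparse = [soft_threshold(w, lam=0.5) for w in weights]
print(sparse)  # → [2.5, 0.0, 0.0, -3.5, 0.0]
```

The small coefficients collapse to exactly 0.0 (not just small values), which is what makes L1 perform feature selection.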