Adam

First-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
From paper: Adam: A Method for Stochastic Optimization

Adam, short for "Adaptive Moment Estimation", is an iterative Optimization algorithm used to minimize the Loss Function when training Artificial Neural Networks. Instead of applying the raw gradient directly, it maintains exponentially decaying moving averages of the gradient (first moment, i.e. momentum) and of the squared gradient (second moment), and uses both to scale each parameter's update.
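
The update can be written in a few lines. Below is a minimal NumPy sketch of a single Adam step, assuming the default hyperparameters from the paper; the function name `adam_step` and its signature are illustrative, not taken from any library.

```python
import numpy as np

def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update. m and v are the running moment estimates, t >= 1."""
    # Biased first moment: moving average of the gradient (momentum term).
    m = beta1 * m + (1 - beta1) * grads
    # Biased second moment: moving average of the squared gradient.
    v = beta2 * v + (1 - beta2) * grads**2
    # Bias correction: both moments start at zero, so early estimates are
    # biased toward zero; dividing by (1 - beta^t) compensates for this.
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # Per-parameter update: eps keeps the denominator away from zero.
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v
```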


Notes:

  • The Adam algorithm adjusts the Learning Rate for each parameter individually, i.e. it uses Adaptive Learning Rates, which improves training efficiency.
  • The Epsilon Value (ε) parameter is a small constant added to the denominator of the update to improve numerical stability.
  • Both the Learning Rate (usually set to 0.001) and the Epsilon Value (ε, 1e-8 by default in the paper) are important in the hyperparameter tuning process; see the usage sketch after this list.
  • Adam performs bias correction on its moment estimates, which start at zero and would otherwise be biased toward zero early in training; this improves convergence and stability.
  • Adam is memory-efficient, as it only requires two moving averages per parameter.
  • Adam does not prevent Overfitting by itself, but it combines readily with Regularization techniques such as Dropout or weight decay.
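
In practice the optimizer is usually taken from a framework rather than implemented by hand. A minimal usage sketch, assuming PyTorch; the tiny model and random data are placeholders for illustration only.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,           # Learning Rate (paper default)
    betas=(0.9, 0.999),
    eps=1e-8,          # Epsilon Value for numerical stability
    weight_decay=0.0,  # set > 0 to add L2-style weight decay
)
loss_fn = nn.MSELoss()

x = torch.randn(32, 10)
y = torch.randn(32, 1)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```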

References:
Kingma, D. P., & Ba, J. (2015). Adam: A Method for Stochastic Optimization. arXiv:1412.6980.