It's a gradient-based (calculus) optimization algorithm that helps minimize the cost function by incrementally updating the weights of the network. In each iteration it computes the update from just one randomly chosen example from the whole dataset, so it's considered fast and requires few resources; however, it doesn't guarantee convergence to the global minimum.
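The one-example-per-update loop described above can be sketched as follows; the function name, data, and hyperparameters here are illustrative assumptions, fitting a toy linear model y = 2x + 1:

```python
import random

# Minimal sketch of stochastic gradient descent for simple linear
# regression (pred = w*x + b) under squared loss. The defining step:
# each update uses ONE randomly chosen example, not the whole dataset.
def sgd_linear_fit(data, lr=0.01, steps=2000, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        x, y = rng.choice(data)   # sample a single random data point
        err = (w * x + b) - y     # prediction error on that one example
        w -= lr * err * x         # dL/dw for loss L = 0.5 * err^2
        b -= lr * err             # dL/db
    return w, b

# Toy dataset generated from y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(-5, 6)]
w, b = sgd_linear_fit(data)
```

After enough updates, `w` and `b` land close to the generating values 2 and 1, though each individual step moves in a noisy direction determined by the single sampled example.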

Advantages:

  • Simple and easy to implement
  • Memory efficient
  • Faster on some problems because of the frequent model updates
  • The noisy update process helps it escape local minima
  • Frequent updates give immediate feedback on the model's performance and rate of improvement


Disadvantages:

  • Frequent model updates are computationally expensive
  • Frequent updates produce a noisy gradient signal
  • Convergence can be slow because of that noise
  • It isn't always able to converge to the global minimum
  • Sensitive to the choice of learning rate
  • Less accurate: the noisy updates make the parameters bounce around the minimum rather than settle on it
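The learning-rate sensitivity above can be shown on the simplest possible case: gradient descent on a single-example squared loss 0.5·(w − 3)², the same per-step problem each SGD update solves. This is a hypothetical illustration; the function and values are not from the original text:

```python
# Gradient descent on loss L(w) = 0.5 * (w - 3)^2, whose gradient is (w - 3).
# The iteration w <- w - lr*(w - 3) converges only when 0 < lr < 2.
def final_w(lr, steps=100):
    w = 0.0
    for _ in range(steps):
        w -= lr * (w - 3.0)   # gradient step toward the minimum at w = 3
    return w

good = final_w(0.1)   # step size in the stable range: w approaches 3
bad = final_w(2.5)    # step size too large: the error grows every step
```

With `lr=0.1` the error shrinks by a factor of 0.9 per step, while with `lr=2.5` it is multiplied by −1.5 per step and blows up, which is why the learning rate must be tuned carefully.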