Mini-batch Gradient Descent

Mini-batch Gradient Descent is a gradient-based optimization algorithm that combines ideas from Batch Gradient Descent and Stochastic Gradient Descent (SGD). It uses a fixed-size batch of training examples (called a mini-batch) to calculate the model error and update the model coefficients.


  • Mini-batch size (also commonly called the batch size) becomes an additional hyperparameter to tune.
  • A smaller batch size leads to faster, more frequent updates, at the cost of noisier estimates of the error gradient.
  • A larger batch size gives more accurate estimates of the error gradient, but each update is more expensive and convergence can be slower.
  • As in Batch Gradient Descent, error information must be accumulated across the training examples within each mini-batch before the coefficients are updated.
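
The update rule above can be sketched in NumPy. This is a minimal, illustrative implementation for linear regression with mean squared error; the function name, hyperparameter defaults, and toy data are assumptions for the example, not part of the original notes:

```python
import numpy as np

def minibatch_gd(X, y, lr=0.2, batch_size=16, epochs=500, seed=0):
    """Mini-batch gradient descent for linear regression (MSE loss).

    X has one example per row; a bias column can be appended by the caller.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)  # reshuffle the examples each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Accumulate the error over the mini-batch, then take one step
            grad = (2.0 / len(batch)) * Xb.T @ (Xb @ w - yb)
            w -= lr * grad
    return w

# Toy (noiseless) data: y = 3*x + 1, with a bias column appended to X
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=200)
X = np.column_stack([x, np.ones_like(x)])
y = 3 * x + 1
w = minibatch_gd(X, y)  # w should approach [3.0, 1.0]
```

Note that shuffling once per epoch and iterating over contiguous slices of the shuffled index is a common way to form mini-batches without replacement.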