Mini-batch Gradient Descent

Mini-batch Gradient Descent is a gradient-based (calculus-based) optimization algorithm that combines the ideas of Batch Gradient Descent and Stochastic Gradient Descent (SGD). It uses a batch of a fixed number of training examples (called a mini-batch) to calculate the model error and update the model coefficients.

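To make the procedure concrete, below is a minimal NumPy sketch of mini-batch gradient descent applied to ordinary least-squares linear regression. The function name, the mean-squared-error loss, and the default hyperparameters (batch_size=32, lr=0.01, n_epochs=100) are illustrative assumptions, not something prescribed by these notes.

```python
import numpy as np

def minibatch_gradient_descent(X, y, batch_size=32, lr=0.01, n_epochs=100):
    """Fit linear-regression coefficients with mini-batch gradient descent (illustrative sketch)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)   # model coefficients
    b = 0.0                    # intercept
    rng = np.random.default_rng(0)

    for _ in range(n_epochs):
        # Shuffle once per epoch so each mini-batch is a random sample of the data.
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            X_b, y_b = X[idx], y[idx]

            # Model error on this mini-batch only.
            error = X_b @ w + b - y_b

            # Gradient of the mean squared error, averaged over the mini-batch.
            grad_w = 2 * X_b.T @ error / len(idx)
            grad_b = 2 * error.mean()

            # Update the coefficients using only this mini-batch.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b
```

With batch_size=1 the loop reduces to Stochastic Gradient Descent, and with batch_size equal to the number of training examples it becomes Batch Gradient Descent; mini-batch sizes in between trade off between the two.
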
Notes:

  • Mini-batch size (also commonly called the batch size) is added to the list of hyperparameters.
  • A smaller batch size gives more frequent updates and faster convergence, but at the cost of a noisier estimate of the error gradient.
  • A larger batch size gives a more accurate estimate of the error gradient, but fewer updates per epoch and therefore slower convergence (see the sketch after this list).

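The trade-off in the two notes above is easiest to see by counting parameter updates per epoch. The snippet below assumes a dataset of 10,000 training examples; the numbers are purely illustrative.

```python
import math

n_examples = 10_000  # illustrative dataset size

# Each epoch visits every example once, split into mini-batches,
# so the number of coefficient updates per epoch is ceil(n / batch_size).
for batch_size in (1, 32, 256, 10_000):
    updates_per_epoch = math.ceil(n_examples / batch_size)
    print(f"batch_size={batch_size:>6} -> {updates_per_epoch:>5} updates per epoch")
```

A batch size of 1 (pure SGD) gives the most frequent but noisiest updates, while a batch size equal to the dataset size (Batch Gradient Descent) gives a single, accurate but expensive update per epoch.
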
Advantages:

  • Updates are more frequent than in Batch Gradient Descent, and the full training set does not have to be processed (or held in memory) for a single update.
  • The gradient averaged over a mini-batch is less noisy than the single-example gradient used by SGD, giving more stable convergence.
  • Mini-batch computations can be vectorized, making efficient use of optimized linear-algebra libraries and hardware.

Disadvantages:

  • Similar to Batch Gradient Descent, error information must be accumulated across each mini-batch of training examples before the coefficients are updated.

References: