It's a gradient-based (calculus) optimization algorithm that combines Batch Gradient Descent and Stochastic Gradient Descent (SGD). It splits the training data into batches of a fixed number of examples (called mini-batches) and uses each batch to calculate the model error and update the model coefficients.
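
To make the update loop concrete, here is a minimal sketch of mini-batch gradient descent for linear regression in NumPy. The function name, learning rate, batch size of 32, and epoch count are illustrative assumptions, not taken from the text above:

```python
import numpy as np

def minibatch_gradient_descent(X, y, lr=0.05, batch_size=32, epochs=200, seed=0):
    """Fit linear-regression coefficients with mini-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        # Shuffle once per epoch so each mini-batch is a fresh random sample.
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            batch = order[start:start + batch_size]
            X_b, y_b = X[batch], y[batch]
            # Error is accumulated over the mini-batch (mean-squared-error gradient).
            error = X_b @ w + b - y_b
            grad_w = X_b.T @ error / len(batch)
            grad_b = error.mean()
            # One coefficient update per mini-batch, not per example.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Usage: recover a known linear relationship from noisy data.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = X @ np.array([3.0, -2.0]) + 1.0 + rng.normal(scale=0.1, size=500)
w, b = minibatch_gradient_descent(X, y)
print(w, b)  # approximately [3.0, -2.0] and 1.0
```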

Notes:

- `Mini-batch size` (also generally called `batch size`) is added to the list of hyperparameters.
- Smaller batch sizes lead to faster convergence, but at the cost of a noisier error-gradient estimate.
- Larger batch sizes lead to slower convergence, but increase the accuracy of the error-gradient estimate.

- Higher model-update frequency than Batch Gradient Descent, which leads to more robust convergence and helps avoid local minima.
- Faster than Stochastic Gradient Descent (SGD) due to the decreased update frequency.
- Unlike Batch Gradient Descent, it doesn't require the whole dataset to be in memory.

- As in Batch Gradient Descent, error information must be accumulated across each mini-batch of training examples.
