Quantization

Quantization is the compression of the weights of an Artificial Neural Network (ANN) from floating-point numbers into a lower-bit representation, which reduces the size and computational cost of the model with only a small loss in accuracy.

For example, if the 32-bit floating-point weights of a Deep Learning model are quantized to 8-bit integers, the model's size and the computing resources required to store and run it shrink roughly fourfold, while accuracy typically degrades only slightly.
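
The float-to-int8 mapping can be sketched as a simple affine transform. The helper names below (`quantize`, `dequantize`) are illustrative, not from any particular library; this assumes the common asymmetric scheme with a per-tensor scale and zero point:

```python
import numpy as np

def quantize(w, num_bits=8):
    # Affine (asymmetric) quantization: map the observed float range
    # [w.min(), w.max()] onto the integer range [0, 2^num_bits - 1].
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = round(qmin - w.min() / scale)
    q = np.clip(np.round(w / scale + zero_point), qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover approximate float values from the stored integers.
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, s, z = quantize(weights)
restored = dequantize(q, s, z)
# q uses 1 byte per weight instead of 4; the reconstruction error
# per weight is bounded by roughly one quantization step (the scale).
```

Storing `q` plus the single `(scale, zero_point)` pair in place of the float32 tensor is what yields the ~4x size reduction.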


Types:

  • Post-Training Quantization (PTQ): quantization applied to a model after it has been trained
  • Quantization-Aware Training (QAT): quantization error simulated during training so the model learns to compensate for it
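
The mechanism behind QAT can be sketched with a "fake quantization" step: weights are quantized and immediately dequantized in the forward pass, so training sees the rounding error while the stored weights stay in floating point. The function below is an illustrative sketch, not a specific framework's API:

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    # Quantize then dequantize in one step. The forward pass sees values
    # restricted to 2^num_bits levels, but the result stays float32 so
    # ordinary gradient updates can still be applied to the weights.
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = round(qmin - w.min() / scale)
    q = np.clip(np.round(w / scale + zero_point), qmin, qmax)
    return ((q - zero_point) * scale).astype(np.float32)

w = np.random.randn(3, 3).astype(np.float32)
w_q = fake_quantize(w)  # still float32, but snapped to 256 levels
```

In PTQ the same rounding is applied once to the frozen weights; in QAT it runs inside every training step (usually with a straight-through gradient estimator), which is why QAT generally preserves accuracy better at very low bit widths.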

Advantages:

  • Smaller model size
  • Faster inference (low-bit integer arithmetic is cheaper than floating point)
  • Lower memory and bandwidth requirements
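
The size advantage is simple arithmetic; the parameter count below is illustrative:

```python
params = 1_000_000_000       # e.g. a 1-billion-parameter model (illustrative)
fp32_size = params * 4       # 4 bytes per float32 weight -> ~4 GB
int8_size = params * 1       # 1 byte per int8 weight     -> ~1 GB
ratio = fp32_size // int8_size  # 4x smaller weight storage
```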

Disadvantages: