Quantization is the compression of the weights of an Artificial Neural Network (ANN) from floating-point numbers into a lower-bit representation, which reduces the size and computational complexity of the model at the cost of a small loss in accuracy.

For example, if the 32-bit floating-point numbers in a deep learning model are quantized into 8-bit integers, the model's size and the computing resources required for training and running it are greatly reduced, while accuracy does not change noticeably.
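As a sketch of how such a float-to-integer mapping can work, the following illustrative affine quantization (not any particular library's API; the function names here are assumptions) maps float32 values to 8-bit unsigned integers using a scale and a zero point, and recovers an approximation of the original values:

```python
import numpy as np

def quantize(x, num_bits=8):
    # Affine (asymmetric) quantization: map the observed float range
    # [x.min(), x.max()] onto the integer range [0, 2^num_bits - 1].
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = round(qmin - x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover approximate float values; error is at most about one scale step.
    return scale * (q.astype(np.float32) - zero_point)

weights = np.array([-1.2, 0.0, 0.5, 2.3], dtype=np.float32)
q, scale, zero_point = quantize(weights)
recovered = dequantize(q, scale, zero_point)
```

Each recovered value differs from the original by no more than roughly one quantization step (the `scale`), which is why accuracy typically degrades only slightly.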


There are two main approaches to quantization:

  • Post-Training Quantization (PTQ): quantization applied to a model after it has been trained
  • Quantization-Aware Training (QAT): quantization simulated during training, so the model learns to compensate for the reduced precision


The main benefits of quantization are:

  • Smaller model size
  • Faster inference
  • Lower memory and compute requirements
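The size benefit is easy to verify directly: storing the same number of weights as int8 instead of float32 takes a quarter of the bytes. A minimal NumPy illustration:

```python
import numpy as np

# Store the same 1000 weights as float32 and as symmetric int8.
weights_fp32 = np.random.randn(1000).astype(np.float32)
scale = np.abs(weights_fp32).max() / 127
weights_int8 = np.clip(np.round(weights_fp32 / scale), -128, 127).astype(np.int8)

print(weights_fp32.nbytes)  # 4000 bytes (4 bytes per weight)
print(weights_int8.nbytes)  # 1000 bytes (1 byte per weight): a 4x reduction
```

The speedup at inference time comes from the same source: integer arithmetic and smaller memory traffic, provided the hardware and runtime have optimized int8 kernels.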