Quantization is the compression of weights in Artificial Neural Networks (ANNs) from floating-point numbers into lower-bit representations, which reduces the complexity and size of the model with only a small loss in accuracy.

For example, if the 32-bit floating-point weights of a Deep Learning model are quantized into 8-bit integers, the model's size and the computing resources required to run it are greatly reduced, while accuracy typically does not change noticeably.
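The float-to-integer mapping above can be sketched as affine quantization: a scale and zero-point map the float range onto the integer range. This is a minimal illustrative sketch (the function names and the toy weight list are made up, not from any library):

```python
def quantize(weights, num_bits=8):
    """Map floats to unsigned integers in [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against a constant tensor
    zero_point = round(qmin - lo / scale)
    # Round each weight to its nearest integer code and clamp to the valid range
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the integer codes."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
q, scale, zp = quantize(weights)
approx = dequantize(q, scale, zp)  # close to the originals, within one scale step
```

Each value is stored in 8 bits instead of 32, and the round trip changes each weight by at most one quantization step (the scale).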

Types:

- Post-Training Quantization (PTQ): quantization applied to an already trained model
- Quantization-Aware Training (QAT): quantization effects simulated during training, so the model learns to compensate for them
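The contrast between the two types can be sketched as follows. This is toy code, not a framework API: PTQ rounds the frozen weights once after training, while QAT applies "fake quantization" in the forward pass so the loss already sees the rounding error:

```python
def fake_quantize(w, scale=0.05):
    """Round to the nearest representable level, but keep the result in float."""
    return round(w / scale) * scale

# PTQ-style: convert a trained weight after the fact
trained_weight = 0.123
ptq_weight = fake_quantize(trained_weight)  # stored as an integer code in practice

# QAT-style forward pass: the loss is computed on the quantized value,
# so training can adjust the weights to tolerate the rounding
def forward(x, w):
    return x * fake_quantize(w)
```

In real frameworks the scale is estimated from the weight/activation statistics; the fixed `scale=0.05` here is just for illustration.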

Advantages:

- Faster inference
- Smaller model size
- Lower memory use

Disadvantages:

- Harder convergence during training
- The rounding step is non-differentiable, which complicates backpropagation
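The backpropagation problem comes from rounding having zero gradient almost everywhere. QAT commonly works around this with the straight-through estimator (STE): use the rounded value in the forward pass, but treat the gradient of `round()` as identity in the backward pass. A toy scalar sketch, with no autograd framework assumed:

```python
def ste_forward(w, scale=0.05):
    """Forward pass: the quantized value is what the loss actually sees."""
    return round(w / scale) * scale

def ste_backward(grad_output):
    """Backward pass: d(round)/dw is pretended to be 1,
    so the incoming gradient passes through unchanged."""
    return grad_output
```

This approximation is what lets gradient descent keep updating the underlying float weights even though the quantization step itself provides no gradient signal.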
