Knowledge distillation

Knowledge distillation first trains a large, complex model (the teacher) on a large dataset. The teacher is then used to train a smaller model by transferring its knowledge to it.

The idea behind knowledge distillation is to create a simple 'student' model that learns from a more complex 'teacher' model. The goal is to reproduce the teacher's performance in a simpler, more efficient model.
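
As a rough illustration of how a student can learn from a teacher, the sketch below computes one common form of distillation loss for a single example: the student is pulled toward the teacher's softened output distribution and, at the same time, toward the true label. The text above does not specify a loss, so this particular formulation is an assumption, and the names `distillation_loss`, `temperature`, and `alpha`, along with the example logits, are hypothetical choices made for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax (higher temperature = softer output)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum for s in scaled]


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax probabilities."""
    return [math.exp(lp) for lp in log_softmax(logits, temperature)]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    temperature and alpha are illustrative hyperparameters (assumptions).
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # Soft-target term: KL(teacher || student) on the softened distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # Hard-label term: ordinary cross-entropy at temperature 1.
    ce = -log_softmax(student_logits)[true_label]

    # temperature**2 keeps the two terms on a comparable scale.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]   # made-up teacher logits
    student = [2.5, 1.2, 0.3]   # made-up student logits
    print(distillation_loss(student, teacher, true_label=0))
```

In an actual training loop this value would be averaged over a batch and minimized with respect to the student's parameters only, with the teacher's outputs treated as fixed targets.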


While knowledge distillation is complex and time-consuming, it can greatly reduce the size of a model with little or no loss of accuracy.

Knowledge distillation strategies:

  • Offline distillation
  • Online distillation
  • Self-distillation