Swish is a self-gated activation function proposed by Google researchers, defined as f(x) = x · sigmoid(βx). Unlike ReLU, it allows a small range of negative values to propagate through the network.


  • Swish is a smooth activation function: it does not change direction abruptly the way ReLU does near x = 0. Instead, it bends smoothly from 0 down to slightly negative values and then curves upward again.
  • ReLU zeroes out all non-positive inputs, yet small negative values can carry useful pattern information. Swish preserves those small negative values while still squashing large negative inputs toward zero, so it keeps ReLU-like sparsity without discarding everything below zero.
  • Because Swish is non-monotonic, it can represent richer functions of the input-weight products than a monotonic activation, improving the expressiveness of what the network learns.
  • On the downside, Swish is slightly more computationally expensive than ReLU, since each activation requires a sigmoid evaluation, and that cost can add up in very deep networks.
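
The properties above can be seen directly in a minimal NumPy sketch of the formula f(x) = x · sigmoid(βx); the function name and β default are illustrative choices, not part of any particular library:

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x).

    With beta=1 this is the SiLU variant. Small negative inputs
    produce small negative outputs (the smooth dip below zero),
    while large negative inputs are squashed toward zero,
    retaining ReLU-like sparsity.
    """
    return x / (1.0 + np.exp(-beta * np.asarray(x, dtype=float)))

# Smooth dip: a small negative input passes through as a small negative value.
print(swish(-1.0))   # about -0.27, not zeroed out as ReLU would do
# Sparsity: a large negative input is squashed to nearly zero.
print(swish(-10.0))  # about -0.00045
# Non-monotonicity: moving left from -1 toward -3, the output rises again.
print(swish(-3.0))   # about -0.14, greater than swish(-1.0)
```

Note that `swish(-3.0) > swish(-1.0)` even though -3 < -1; that reversal is exactly the non-monotonic dip that distinguishes Swish from ReLU.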