Tanh (hyperbolic tangent) is a non-linear function, the hyperbolic analogue of the regular tangent function, with output between -1 and 1.



The function is defined as:

  tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))

  • e: Represents Euler's number.
  • x: is the input value.
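As a quick sanity check, the definition can be evaluated directly from the exponentials and compared against the standard-library implementation (a minimal Python sketch):

```python
import math

def tanh_manual(x):
    # tanh(x) = (e^x - e^-x) / (e^x + e^-x)
    ex, enx = math.exp(x), math.exp(-x)
    return (ex - enx) / (ex + enx)

for x in [-2.0, 0.0, 0.5, 2.0]:
    # matches math.tanh to floating-point precision
    assert abs(tanh_manual(x) - math.tanh(x)) < 1e-12
    # outputs stay strictly between -1 and 1
    assert -1.0 < tanh_manual(x) < 1.0
```

In practice you would call `math.tanh` (or a framework's built-in) rather than the manual form, which overflows for very large inputs.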


The curves of the tanh and sigmoid functions are similar in shape. When the input is very large or very small, both saturate: the curve becomes nearly flat and the gradient approaches zero, which hinders weight updates. The key difference is the output interval: tanh outputs values in (-1, 1), while sigmoid outputs values in (0, 1).
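The saturation effect described above can be seen numerically: the derivative of tanh is 1 − tanh(x)², which is largest at the origin and vanishes rapidly as |x| grows (a small Python illustration):

```python
import math

def tanh_grad(x):
    # derivative of tanh: 1 - tanh(x)^2
    t = math.tanh(x)
    return 1.0 - t * t

# gradient peaks at x = 0 and decays to ~0 as |x| grows,
# which is the vanishing-gradient behaviour in saturation
print(tanh_grad(0.0))   # 1.0
print(tanh_grad(3.0))   # ~0.0099
print(tanh_grad(10.0))  # ~8.2e-09
```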


  • Tanh is inexpensive to evaluate, so using it adds little computational cost compared with training a linear model.
  • Because it introduces non-linearity to the model, it allows the network to model more complex relationships between inputs and outputs.
  • TanH is an odd function, meaning tanh(−x) = −tanh(x), which results in symmetry around the origin.
  • TanH suffers from the vanishing gradient problem, but its gradient is stronger than sigmoid's (its derivative is steeper).
  • TanH is zero-centered, so its outputs are not biased toward one sign and gradients are not forced to move in a single direction. This zero-centering typically makes training converge faster.
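The properties listed above can be verified directly: tanh is odd, its maximum gradient (1.0 at the origin) is four times steeper than sigmoid's (0.25), and it is zero-centered where sigmoid is centered at 0.5 (a short Python comparison):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # derivative of sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    # derivative of tanh: 1 - tanh(x)^2
    t = math.tanh(x)
    return 1.0 - t * t

# odd symmetry: tanh(-x) == -tanh(x)
assert abs(math.tanh(-1.5) + math.tanh(1.5)) < 1e-15

# steeper gradient: tanh' peaks at 1.0, sigmoid' peaks at 0.25
print(tanh_grad(0.0), sigmoid_grad(0.0))  # 1.0 0.25

# zero-centered: tanh(0) = 0, while sigmoid(0) = 0.5
print(math.tanh(0.0), sigmoid(0.0))  # 0.0 0.5
```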