Non-linear function with output between 0 and 1, providing a smooth and continuous S-shaped curve.

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

  • e: the base of the natural logarithm.
  • x: the input of the function.
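As a minimal sketch, the standard logistic form 1 / (1 + e^(-x)) can be written in plain Python as follows. The function name `sigmoid` and the two-branch layout are illustrative choices, not something the text prescribes; the branch for negative inputs is one common way to avoid overflow in `math.exp`.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x),
    # so exp never receives a large positive argument.
    z = math.exp(x)
    return z / (1.0 + z)

if __name__ == "__main__":
    for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
        print(f"sigmoid({x:+.1f}) = {sigmoid(x):.6f}")
```

Running the loop shows the S-shape numerically: values near 0 for large negative inputs, exactly 0.5 at zero, and values near 1 for large positive inputs.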


  • It has a smooth gradient, and it is well suited to classification problems.
  • The output of the activation function always lies in the range (0, 1), compared to (-∞, ∞) for a linear activation function.
  • For an input of zero the output is exactly 0.5; large positive inputs push the output toward 1 and large negative inputs push it toward 0.
  • Its output is not zero-centered. Every activation is positive, so the gradients with respect to a downstream neuron's weights all share the same sign, which tends to produce zig-zagging updates and makes optimization harder.
  • The sigmoid function was the most frequently used activation function in the early days of deep learning. It is a smooth function that is easy to differentiate.
  • The derivative of the sigmoid function is almost zero for large positive or negative inputs, which leads to the vanishing gradient problem and often makes ReLU (Rectified Linear Unit) a better alternative. In other words, sigmoids saturate and kill gradients, so the network either stops learning or learns extremely slowly.
  • Sigmoid can be thought of as the firing rate of a neuron.
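The saturation behaviour described above can be checked numerically. The sketch below relies on the identity that the sigmoid's derivative equals sigmoid(x) · (1 − sigmoid(x)), which peaks at 0.25 when x = 0. The helper names `sigmoid_grad` and `max_gradient_factor` are hypothetical, and the layer-stacking bound deliberately ignores the weights, so treat it as an illustration rather than a model of a real network.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s * (1 - s), where s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1.0 - s)

def max_gradient_factor(num_layers: int) -> float:
    """Upper bound on the product of sigmoid derivatives across layers.

    Assumption for illustration: weights are ignored, so each layer
    contributes at most the peak derivative value of 0.25.
    """
    return 0.25 ** num_layers

if __name__ == "__main__":
    # The derivative shrinks quickly as |x| grows (saturation).
    for x in (0.0, 2.0, 5.0, 10.0):
        print(f"x = {x:5.1f}  sigmoid'(x) = {sigmoid_grad(x):.2e}")

    # Even in the best case, the factor decays geometrically with depth.
    for n in (1, 5, 10):
        print(f"{n:2d} layers: factor <= {max_gradient_factor(n):.2e}")
```

ReLU avoids this particular decay for positive inputs because its derivative there is 1 rather than a value capped at 0.25.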