`tags: [AI/ML/ActivationFunction]`

A non-linear function with output between 0 and 1, providing a smooth and continuous gradient.

Equation:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Where:

- $e$: the base of the natural logarithm
- $x$: the input of the function
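A minimal NumPy sketch of the formula above (the function name `sigmoid` and the sample inputs are illustrative, not from the source):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)); output is always in the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-6.0, 0.0, 6.0])))
# -> [0.00247, 0.5, 0.99753]  (approaching 0, exactly 0.5, approaching 1)
```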

Notes:

- It has a smooth gradient, and it is well suited to classifier-type problems.
- The output of the activation function is always in the range (0, 1), compared to (-∞, ∞) for a linear activation function.
- Given input zero, the output is 0.5; large positive inputs approach 1 and large negative inputs approach 0.
- Its output isn’t zero-centered, which makes gradient updates swing too far in different directions; because the output is confined between zero and one, optimization becomes harder.
- The Sigmoid function was one of the most frequently used activation functions in the early days of deep learning. It is a smooth function whose derivative is easy to compute.
- The derivative of the sigmoid function is almost zero for large positive or negative inputs, leading to the Vanishing Gradient problem and often making ReLU (Rectified Linear Unit) a better alternative. In other words, sigmoids saturate and kill gradients, so the network either stops learning or learns extremely slowly (see the sketch after this list).
- Sigmoid can be thought of as the firing rate of a neuron.
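To illustrate the saturation point above, here is a small sketch of the sigmoid derivative, $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$, which peaks at 0.25 for $x = 0$ and shrinks toward zero as $|x|$ grows (the helper names `sigmoid` and `sigmoid_grad` are illustrative, not from the source):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); maximum 0.25 at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  grad={sigmoid_grad(x):.6f}")
# x=  0.0  grad=0.250000
# x=  2.0  grad=0.104994
# x=  5.0  grad=0.006648
# x= 10.0  grad=0.000045
```

The rapidly shrinking gradient values are what cause saturated sigmoid units to pass almost no gradient back during backpropagation.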
