Activation Functions

Activation functions are mathematical functions that determine a neuron’s output from the weighted sum of the input values fed to it. By introducing non-linearity, they let a neural network learn relationships that a purely linear model cannot represent.

These functions are attached to each neuron in the network and determine whether, and how strongly, it is activated, based on whether the neuron’s input is relevant for the model’s prediction.
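To see why non-linearity matters, here is a minimal Python sketch using hypothetical one-unit "layers" (the names and weight values are illustrative assumptions, not from any library). Two stacked layers with no activation between them collapse into a single linear layer, while inserting tanh between the same two layers breaks that equivalence.

```python
import math

def linear_layer(x, w, b):
    # Hypothetical one-unit layer: weighted input plus bias, no activation
    return w * x + b

def two_linear_layers(x):
    # Two stacked layers with nothing non-linear in between
    return linear_layer(linear_layer(x, 2.0, 1.0), 3.0, -4.0)

def collapsed(x):
    # A single layer with w = 3 * 2 and b = 3 * 1 - 4 computes exactly the same thing
    return linear_layer(x, 6.0, -1.0)

def two_layers_with_tanh(x):
    # Same weights, but a tanh activation between the layers
    return linear_layer(math.tanh(linear_layer(x, 2.0, 1.0)), 3.0, -4.0)

for x in [-2.0, 0.0, 0.5, 3.0]:
    # Without an activation, the two-layer "network" is just one linear layer
    assert math.isclose(two_linear_layers(x), collapsed(x))

# A linear (affine) function gives equal output steps for equal input steps; the tanh version does not
print(two_layers_with_tanh(1.0) - two_layers_with_tanh(0.0))
print(two_layers_with_tanh(2.0) - two_layers_with_tanh(1.0))
```

The first printed difference is much larger than the second, which could not happen if the tanh network were still linear.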


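In a neural network, inputs are fed into the neurons of the input layer. Each connection between neurons carries a weight: a neuron multiplies every incoming value by the corresponding weight, sums the results (usually adding a bias term), and passes this weighted sum to its activation function, whose output is transferred to the next layer.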

The activation function acts as a mathematical “gate” between the input feeding the current neuron and its output going to the next layer. It can be as simple as a step function that switches the neuron output on or off depending on a threshold.
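A single neuron with a step-function "gate" can be sketched in a few lines of Python. This assumes the common formulation of one weight per input plus a bias; the function names and the threshold of 0 are illustrative choices, not part of any particular library.

```python
def step(z, threshold=0.0):
    # Binary step: the neuron "fires" (outputs 1) only if z reaches the threshold
    return 1.0 if z >= threshold else 0.0

def neuron(inputs, weights, bias, activation=step):
    # Weighted sum of the inputs plus bias, then passed through the activation gate
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

print(neuron([1.0, 0.5], [0.4, -0.6], bias=0.1))  # z = 0.2, so the neuron fires: 1.0
print(neuron([1.0, 1.0], [0.4, -0.6], bias=0.1))  # z = -0.1, so it stays off: 0.0
```

Passing a different `activation` (for example a sigmoid) turns the same neuron from an on/off gate into one with a graded output.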

Choosing an Activation Function:

Choosing the right activation function improves an ANN's ability to learn and generalize, as well as its training speed and convergence. Activation functions are often chosen through examination and experimentation; however, they can generally be chosen based on the layer and model type:

Output Layer:

  • Linear: does not change the weighted sum of the input in any way and instead returns the value directly
  • Logistic (Sigmoid): outputs a value between 0 and 1; used for binary classification
  • Softmax: the most popular output-layer activation for multi-class classification
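The three output-layer functions above can be sketched in plain Python as follows. The function names are illustrative; the sigmoid uses a numerically stable split so very negative inputs do not overflow `math.exp`, and softmax subtracts the largest logit first, which leaves the result unchanged but avoids overflow.

```python
import math

def linear(z):
    # Identity: returns the weighted sum unchanged
    return z

def sigmoid(z):
    # Logistic function, squashing any real number into (0, 1)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe: z < 0, so e is in (0, 1)
    return e / (1.0 + e)

def softmax(logits):
    # Shift by the max logit for numerical stability; the probabilities are unaffected
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(sum(probs))                    # 1.0 (up to floating-point rounding)
```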


General guidelines:

  • If the network suffers from dead neurons (ReLU units that output zero for every input and stop learning), Leaky ReLU is a common remedy.
  • Because of the vanishing gradient problem, Sigmoid and Tanh are no longer generally used in the hidden layers of deep networks. ReLU (Rectified Linear Unit) and its variants are now the default activation function in the hidden layers of ANNs.
  • Sigmoid can be used as an alternative to ReLU (Rectified Linear Unit) in some classification problems.
  • ReLU (Rectified Linear Unit) function should only be used in the hidden layers.
  • Softmax is generally used in the output layer to turn the raw output scores (logits) into a probability distribution over the classes. It outputs a vector of values between 0 and 1 that sum to 1.0, which can be interpreted as probabilities of class membership.
  • ReLU is the best default for hidden layers; if problems arise, or to optimize the model, it can be replaced with one of its variants.
  • Weights are initially assigned at random. If all initial weights are set to zero, every neuron in a layer computes the same output and receives the same gradient update, so the neurons never learn different features and the model fails to train properly.
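The hidden-layer guidelines above (vanishing gradients, dead neurons, Leaky ReLU) come down to the derivatives of each function. The sketch below uses a deliberately simplified view of backpropagation that multiplies one activation derivative per layer and ignores the weights; the Leaky ReLU slope `alpha=0.01` is a common default assumed here, and all names are illustrative.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # at most 0.25, reached at z = 0

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2  # at most 1.0 at z = 0, near 0 for large |z|

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # Zero for negative inputs: a unit stuck there gets no updates ("dead" neuron)
    return 1.0 if z > 0 else 0.0

def leaky_relu(z, alpha=0.01):
    return z if z > 0 else alpha * z

def leaky_relu_grad(z, alpha=0.01):
    # A small slope for negative inputs keeps some gradient flowing
    return 1.0 if z > 0 else alpha

# Simplified chain: one activation derivative per layer, weights ignored
depth = 10
print("sigmoid, best case:", sigmoid_grad(0.0) ** depth)  # 0.25 ** 10, about 9.5e-07
print("ReLU, active path: ", relu_grad(1.0) ** depth)     # stays 1.0
print("tanh at z = 3:     ", tanh_grad(3.0))               # already below 0.01
print("dead ReLU vs leaky:", relu_grad(-3.0), leaky_relu_grad(-3.0))
```

Even in its best case, each sigmoid layer shrinks the gradient by at least a factor of four, whereas an active ReLU passes it through unchanged; Leaky ReLU trades a tiny negative-side slope for never having a gradient of exactly zero.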

Considerations regarding activation functions:

  • The range and curve shape of activation functions are often used to categorize them.
  • Range of output: bounded functions squash each neuron's output into a fixed range, such as (0, 1) for Sigmoid or (-1, 1) for Tanh, while Linear and ReLU outputs are unbounded.
  • Equations: each function's equation, together with its derivative, determines the gradients used during training.
  • Model type and application: these guide which activation function to select.
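For reference, the standard equations, output ranges, and derivatives of the functions discussed here can be summarized as below. Here α is a small positive constant (0.01 is a common choice, assumed here), K is the number of classes, s_i denotes softmax(z)_i, and δ_ij is the Kronecker delta. The ReLU derivative is undefined at exactly z = 0; implementations simply pick a value there.

```latex
\begin{array}{llll}
\text{Function} & \text{Equation} & \text{Range} & \text{Derivative} \\
\text{Linear} & f(z) = z & (-\infty, \infty) & f'(z) = 1 \\
\text{Sigmoid} & \sigma(z) = \dfrac{1}{1 + e^{-z}} & (0, 1) & \sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr) \\
\text{Tanh} & \tanh(z) = \dfrac{e^{z} - e^{-z}}{e^{z} + e^{-z}} & (-1, 1) & \tanh'(z) = 1 - \tanh^{2}(z) \\
\text{ReLU} & f(z) = \max(0, z) & [0, \infty) & f'(z) = 1 \text{ if } z > 0,\ 0 \text{ if } z < 0 \\
\text{Leaky ReLU} & f(z) = z \text{ if } z > 0,\ \alpha z \text{ otherwise} & (-\infty, \infty) & f'(z) = 1 \text{ if } z > 0,\ \alpha \text{ if } z < 0 \\
\text{Softmax} & s_i = \dfrac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} & (0, 1),\ \textstyle\sum_{i} s_i = 1 & \dfrac{\partial s_i}{\partial z_j} = s_i\,(\delta_{ij} - s_j)
\end{array}
```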

Problems with activation functions: