ReLU (Rectified Linear Unit)

This function introduces non-linearity by outputting the input value if it is positive and zero otherwise, so it combines linear behavior (for positive inputs) with a non-linear cutoff at zero. Its range is [0, ∞).


$$
f(x) = \max(0, x)
$$

where x is the input.


The ReLU function simply takes the maximum of zero and its input. Note that it is not differentiable everywhere (the derivative is undefined at x = 0), but we can take a subgradient there, as shown in the figure above. Although ReLU is simple, it has been an important advance in recent years.
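As a minimal sketch of the definition above, assuming scalar inputs and plain Python, ReLU and one valid choice of subgradient can be written as follows. The names `relu` and `relu_subgradient` are illustrative and do not come from any library, and returning 0 at x = 0 is only a convention: any value in [0, 1] is a valid subgradient at that point.

```python
def relu(x: float) -> float:
    """Return max(0, x)."""
    return max(0.0, x)


def relu_subgradient(x: float) -> float:
    """Return a subgradient of ReLU at x.

    The derivative is 1 for x > 0 and 0 for x < 0. At x = 0 the function
    is not differentiable; 0 is returned there by convention (an assumption).
    """
    return 1.0 if x > 0 else 0.0


# Print the input, the activation, and the subgradient for a few sample inputs.
for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), relu_subgradient(x))
```

Positive inputs pass through unchanged with slope 1, while negative inputs are clamped to zero with slope 0.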

  • It is faster to compute than other activation functions such as sigmoid and TanH, because it only compares the input with zero and needs no exponentials.
  • Since only some neurons are active (non-zero) for a given input, the activations are sparse, which makes ReLU more computationally efficient than the sigmoid and TanH functions.
  • Because it outputs zero for every negative input, it can lead to dead neurons when their input stays negative. Other variations of ReLU with a non-zero slope for negative inputs have been introduced to alleviate this problem.
  • It mitigates the vanishing gradient problem: the gradient of ReLU is exactly one for every positive input, so it is not shrunk as it propagates back through active neurons.
  • It avoids saturation for positive inputs, since the slope there is always one. (For negative inputs the slope is zero.)
  • ReLU accelerates the convergence of gradient descent towards a minimum of the loss function due to its linear, non-saturating behavior for positive inputs.
  • One of its limitations is that it should only be used within hidden layers of an artificial neural network model.
  • Some gradients can be fragile during training. For activations in the negative region (x < 0) of ReLU, the gradient is 0, so the weights are not adjusted during gradient descent. Neurons that enter this state stop responding to variations in the input, simply because the gradient is 0 and nothing changes. This is called the dying ReLU problem.
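The dying ReLU problem from the last point can be illustrated with a small sketch. It assumes a single neuron y = relu(w * x + b) trained with squared error and plain gradient descent. The helper `sgd_step`, the toy data, and the starting values are all hypothetical and chosen only to make the effect visible.

```python
def relu(z: float) -> float:
    return max(0.0, z)


def relu_grad(z: float) -> float:
    # Subgradient convention (an assumption): 0 at z = 0.
    return 1.0 if z > 0 else 0.0


def sgd_step(w, b, inputs, targets, lr=0.1):
    """One gradient-descent step for y = relu(w * x + b) with squared error."""
    grad_w = 0.0
    grad_b = 0.0
    for x, t in zip(inputs, targets):
        z = w * x + b
        y = relu(z)
        # Derivative of 0.5 * (y - t)^2 with respect to z.
        delta = (y - t) * relu_grad(z)
        grad_w += delta * x
        grad_b += delta
    n = len(inputs)
    return w - lr * grad_w / n, b - lr * grad_b / n


inputs = [0.5, 1.0, 1.5]
targets = [1.0, 2.0, 3.0]

# Hypothetical starting point: the bias is so negative that w * x + b < 0
# for every input, so the neuron outputs 0 everywhere.
w, b = 1.0, -10.0
for _ in range(100):
    w, b = sgd_step(w, b, inputs, targets)
print(w, b)  # w and b are still at their starting values: the neuron is dead
```

Because every pre-activation is negative, every gradient term is multiplied by zero and the parameters never move, no matter how many steps are taken.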

Other Versions

Leaky ReLU

Leaky ReLU addresses the dying ReLU problem by giving the function a small positive slope in the negative region.


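$$
f(x) = \begin{cases}
   x & \text{for } x \ge 0 \\
   \alpha x & \text{for } x < 0
\end{cases}
$$

  • x is the input.

  • α is a small positive number; 0.01 is a common choice.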

    • It enables backpropagation even for negative input values.
    • Because negative inputs are scaled by a small factor rather than set to zero, the gradient on the left side of the graph is a small non-zero value. As a result, neurons in that region no longer die.
    • The predictions may not be consistent for negative input values.
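A minimal sketch of Leaky ReLU and its gradient, assuming scalar inputs and the commonly used slope of 0.01, is given below. The function names and the `alpha` parameter name are illustrative choices, not a specific library's API.

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Return x for positive inputs and alpha * x otherwise."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Return the slope of Leaky ReLU at x (alpha is used at x = 0 by convention)."""
    return 1.0 if x > 0 else alpha


# Print the input, the activation, and the gradient for a few sample inputs.
for x in (-2.0, 0.0, 3.0):
    print(x, leaky_relu(x), leaky_relu_grad(x))
```

Unlike plain ReLU, the gradient for negative inputs is `alpha` rather than zero, so the weights of such a neuron can still be updated.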

Exponential Linear Units

ELU (Exponential Linear Unit) function: similar to ReLU, it resolves some of its issues, but it is more computationally expensive. Like Leaky ReLU, ELU also handles negative values, by introducing an alpha parameter and multiplying it by an exponential expression.

$$
f(x) = \begin{cases}
   x & \text{for } x \ge 0 \\
   \alpha(e^x - 1) & \text{for } x < 0
\end{cases}
$$


- ELU is a strong alternative to ReLU. Unlike ReLU, ELU can produce negative outputs.
- ELU involves an exponential operation, so it increases the computation time.
- The α value is not learned during training, and ELU can suffer from the exploding gradient problem.
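The ELU definition can be sketched in plain Python as follows. This is a minimal illustration for scalar inputs. The default `alpha = 1.0` is an assumption made for the example, and the function names are illustrative only.

```python
import math


def elu(x: float, alpha: float = 1.0) -> float:
    """Return x for x >= 0 and alpha * (e^x - 1) for x < 0."""
    # math.expm1(x) computes e^x - 1 accurately for small |x|.
    return x if x >= 0 else alpha * math.expm1(x)


def elu_grad(x: float, alpha: float = 1.0) -> float:
    """Return the slope of ELU at x (1 is used at x = 0 by convention)."""
    return 1.0 if x >= 0 else alpha * math.exp(x)


# Print the input, the activation, and the gradient for a few sample inputs.
for x in (-5.0, -1.0, 0.0, 2.0):
    print(x, elu(x), elu_grad(x))
```

For large negative inputs the output approaches -alpha and the gradient approaches zero, while each evaluation in the negative region costs one exponential, which is the extra computation mentioned above.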
  • PReLU
  • SELU