Vanishing Gradient

Vanishing gradient is a common problem encountered when training Artificial Neural Networks (ANNs). Some activation functions, such as Sigmoid or Tanh, squash their input into a small output range (0 to 1 for Sigmoid, -1 to 1 for Tanh). Because of this saturation, even a huge change in the input produces only a small change in the output, so the derivative of the activation becomes very small. These activation functions work acceptably in shallow networks with only a few layers. In a deep multi-layer network, however, backpropagation multiplies these small derivatives together layer after layer, and the gradient can become too small for effective training.
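The shrinking effect can be seen directly from the sigmoid's derivative: it peaks at 0.25, so each layer can scale the backpropagated gradient by at most that factor. A minimal sketch (using NumPy; the layer counts are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative is largest at x = 0, where it equals 0.25.
print(sigmoid_derivative(0.0))  # 0.25

# Upper bound on the gradient after backpropagating through n
# sigmoid layers: at most 0.25 ** n (ignoring the weight terms).
for n in [1, 5, 10, 20]:
    print(f"{n:2d} layers -> gradient bound {0.25 ** n:.2e}")
```

Even in this best case, ten layers shrink the gradient bound below 1e-6, which illustrates why early layers of a deep sigmoid network barely learn.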