Maxout is an Activation Function introduced to address some limitations of traditional activation functions such as ReLU (Rectified Linear Unit) and Sigmoid. Unlike those functions, whose shape is fixed, Maxout is a flexible activation function that learns its shape from the data during training.

Maxout activation takes the maximum value over a set of learned linear functions, allowing the model to learn the most appropriate activation function during training. Specifically, Maxout takes the maximum over a group of linear functions of the form z_i = w_i^T x + b_i, where w_i is a weight vector, x is the input, and b_i is a bias term, so the output is h(x) = max_i (w_i^T x + b_i).
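The definition above can be written down directly. The following is an illustrative sketch (the function name, argument shapes, and NumPy usage are my own choices, not from the text): each of the k "pieces" is an affine map of the input, and the Maxout output is the elementwise maximum over those pieces.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout activation: maximum over k affine pieces.

    x: input vector, shape (d,)
    W: weights, shape (k, m, d) -- k pieces, m output units
    b: biases, shape (k, m)
    Returns shape (m,): elementwise max over the k pieces.
    """
    z = W @ x + b          # shape (k, m): z[i] = W[i] @ x + b[i]
    return z.max(axis=0)   # take the maximum across the k pieces
```

Note that the non-linearity comes entirely from the max operation; each individual piece is linear in x, and the weights and biases of every piece are learned by backpropagation just like ordinary layer parameters.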

Properties and Usage:

  • Maxout is highly adaptable and can learn different activation functions for different parts of the input space, giving it significant flexibility in representing complex relationships in the data.
  • It is particularly useful in architectures where the model needs to automatically learn which features are important for each layer, effectively reducing the need for manual feature engineering.
  • Maxout has been shown to be effective in Deep Learning architectures and is commonly used in Convolutional Neural Networks (CNNs) and other models where adaptive, non-linear activation functions are desired.
  • Maxout activation functions can significantly increase the model's capacity and have seen success in various machine learning tasks, particularly in Computer Vision and natural language processing.
  • Because its output is the maximum over several learned linear pieces, Maxout is non-linear (piecewise linear) and can represent complex functions while adapting to different types of data.
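A concrete way to see the flexibility described above: a Maxout unit with just two pieces can reproduce familiar fixed activations exactly, since ReLU is max(x, 0) and the absolute value is max(x, -x). The sketch below uses a scalar helper of my own naming to show both special cases:

```python
def maxout_scalar(x, ws, bs):
    # Scalar Maxout over k pieces: max_i (w_i * x + b_i)
    return max(w * x + b for w, b in zip(ws, bs))

# ReLU = max(x, 0): pieces (w=1, b=0) and (w=0, b=0)
def relu(x):
    return maxout_scalar(x, [1.0, 0.0], [0.0, 0.0])

# |x| = max(x, -x): pieces (w=1, b=0) and (w=-1, b=0)
def absval(x):
    return maxout_scalar(x, [1.0, -1.0], [0.0, 0.0])
```

In practice the weights and biases are learned rather than hand-set, so the network can settle on these shapes, or on any other convex piecewise-linear activation, as training demands.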