Convolutional Neural Networks (CNN)

CNN is a multi-layered Artificial Neural Networks (ANN) with a unique architecture designed to extract increasingly complex features of the data at each layer to determine the output.

This type of algorithms are use for classification tasks and are well suited for Computer Vision and other tasks which require extraction of complex patterns. CNN uses a process called Convolution(mathematics operation) which involves applying a set of filters over data to detect patterns and features. An image comprising a Matrix of pixel values is processed through the convolutional layer, pooling layer, and fully connected (FC) layer. The pooling layer reduces the spatial size of the convolved feature. The final output layer generates a confidence score to determine how likely it is that an image belongs to a defined class.


  1. Input Layer
  2. Convolution Layer: It contains the learned kernels (Weights), which extract features that distinguish different images from one another(convolves the image using filters that detect edges, or other components of the image).
  3. Pooling Layers: Multiple pooling layers are responsible for gradually decreasing the spatial extent of the network by progressively reducing the spatial volume of feature maps and reducing the amount of parameters, which reduces the parameters and overall computation of the network.
  4. Flatten Layer: This layer converts a three-dimensional layer in the network into a one-dimensional vector to fit the input of a fully-connected layer for classification.
  5. Output Layer



  • CNNs employ weight sharing, which reduces the number of parameters in the model, making training more efficient.
  • Data Augmentation is often used in CNN training process as a part of Data Preparation to mitigate training challenges.
  • Dilated Convolutions in a CNN architecture can allow the model to have a larger receptive field without significantly increasing the number of parameters.
  • In CNN architecture, max-pooling layers are used to reduce the spatial dimensions of the feature maps.
  • A 1x1 convolution is used to reduce the number of channels in a feature map without changing its spatial dimensions.
  • The Weights in the convolutional layers of a CNN are shared across different regions of the input.
  • In a CNN, convolutional layers are responsible for learning spatial hierarchies of features.