Transformers

Transformers process input sequences and generate output sequences using self-attention mechanisms. They were first introduced for sequence transduction, i.e., converting one sequence of symbols into another, such as in machine translation.

Transformer Architecture

At a high level, the Transformer architecture consists of:

  • Encoder: encodes the input sequence and passes its representation to the decoder.
  • Decoder: learns how to decode that representation for the relevant task.
Info

Both the encoder and the decoder are stacks of an equal number of identical layers, and each layer is composed of several sublayers.

Transformer architecture.webp
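To make the stacked-layer structure concrete, here is a minimal sketch using PyTorch's built-in nn.Transformer. The layer counts and dimensions are assumed example values (the defaults from the original paper), not something specified in these notes.

```python
# Minimal sketch of encoder/decoder stacks using PyTorch's nn.Transformer.
# All sizes below are example values, not prescribed by these notes.
import torch
import torch.nn as nn

model = nn.Transformer(
    d_model=512,            # embedding size
    nhead=8,                # attention heads per layer
    num_encoder_layers=6,   # stack of 6 identical encoder layers
    num_decoder_layers=6,   # stack of 6 identical decoder layers
    batch_first=True,
)

src = torch.randn(2, 10, 512)  # already-embedded source sequence (batch, tokens, d_model)
tgt = torch.randn(2, 7, 512)   # already-embedded target sequence
out = model(src, tgt)          # decoder output, shape (2, 7, 512)
```

In practice the tokenizer, embedding, and positional encoding described below would be applied before sequences reach these stacks.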

Components of Transformers

Diagram: starting from the input, tokens flow through the Tokenizer, Embedding, and Positional encoding stages, then through a stack of Transformer Blocks (each block containing an Attention layer and a FeedForward layer), and finally through SoftMax to produce the output.

Transformer Block

A Transformer Block is a single layer containing an attention sublayer and a feedforward sublayer; the full model stacks multiple such blocks. Together, the blocks produce a list of scored candidates for the next output.

Components:

  • The Self-Attention component: applies the attention mechanism so that each token can weigh the other tokens in the input.
  • The FeedForward component: a small feedforward artificial neural network (ANN) applied at each position, responsible for helping predict the next word in the sequence (a sketch of a full block follows this list).
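As a rough sketch of how these two components combine into one block, the following PyTorch module wires a self-attention sublayer and a feedforward sublayer with residual connections. The names, sizes, and pre-norm ordering are my assumptions for illustration, not taken from these notes.

```python
# A minimal, illustrative Transformer block: self-attention + feedforward,
# each wrapped with layer normalization and a residual connection.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Self-attention sublayer: every token attends to every other token.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        # Feedforward sublayer: the same small ANN applied at each position.
        x = x + self.ff(self.norm2(x))
        return x

# Usage: a batch of 2 sequences, 10 tokens each, with 512-dimensional embeddings.
block = TransformerBlock()
out = block(torch.randn(2, 10, 512))  # shape (2, 10, 512)
```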

Attention Mechanism

The attention mechanism deals with context and with the problem of semantic ambiguity, e.g., resolving which meaning of an ambiguous word is intended based on the surrounding words.

Note

A self-attention layer assigns a weight to each part of an input. The weight signifies the importance of that part in the context of the rest of the input.
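The weighting described in this note can be sketched in plain NumPy. This follows the standard scaled dot-product formulation, softmax(Q·K^T / sqrt(d))·V; the projection matrices and sizes are made-up illustration values.

```python
# A minimal sketch of scaled dot-product self-attention (illustrative only).
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    # Project the input into queries, keys, and values.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d = q.shape[-1]
    # Each token's query is compared against every key to get importance scores.
    scores = q @ k.T / np.sqrt(d)
    # Softmax turns the scores into weights that sum to 1 for each token.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # The output for each token is a weighted mix of all value vectors.
    return weights @ v

# Usage: 4 tokens, 8-dimensional embeddings, random projection matrices.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)  # shape (4, 8)
```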

SoftMax

SoftMax is responsible for selecting the best output from the list generated by the Transformer Blocks. It converts each candidate's score into a probability so that the most probable candidate can be chosen as the transformer's next output.
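A tiny numeric example of that conversion (the scores below are invented for illustration):

```python
# SoftMax turns raw next-token scores into probabilities that sum to 1.
import numpy as np

scores = np.array([2.0, 1.0, 0.1])             # made-up scores for three candidates
probs = np.exp(scores) / np.exp(scores).sum()  # ~[0.66, 0.24, 0.10]
next_token = probs.argmax()                    # index 0: the most probable candidate
```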


Applications:


Notes:

  • Transformers are designed to process sequential input data non-sequentially, i.e., all positions in parallel rather than one step at a time.
  • Self-attention and positional encodings make transformers particularly well suited to text-based generative AI applications (a positional-encoding sketch follows below).
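Since positional encodings come up in these notes, here is a small sketch of the sinusoidal version from the original Transformer paper; the sequence length and embedding size are example values.

```python
# Sinusoidal positional encoding: gives each token position a distinct vector
# that is added to its embedding before the first Transformer block.
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]    # token positions 0..seq_len-1
    i = np.arange(d_model)[None, :]      # embedding dimensions
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])  # sine on even dimensions
    pe[:, 1::2] = np.cos(angle[:, 1::2])  # cosine on odd dimensions
    return pe

# Usage: encodings for 10 tokens with 16-dimensional embeddings.
pe = positional_encoding(10, 16)  # shape (10, 16)
```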

Learning Material:

References: