Large Language Models (LLMs)

Large Language Models are Deep Learning models with billions of parameters, trained on petabytes of data, that can generate text for tasks such as translation, classification, text manipulation, and more.

There are three types of LLMs:

  • Generic Language Models: These predict the next word (or phrase) based on the language in the training data; they can be seen as an auto-completion feature.
  • Instruction Tuned Models: These are trained to predict a response to the instructions given in the input, e.g. summarization.
  • Dialog Tuned Models: These are trained to hold a dialogue with the user across successive turns, e.g. chatbots or Question Answering (QA). A prompt-level sketch follows this list.
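
The difference between the three types is easiest to see at the prompt level. Below is a minimal sketch; the Hugging Face transformers library and the gpt2 checkpoint are illustrative assumptions, not the only choices, and the instruction and dialog prompts are shown only as the shapes a tuned model would expect.

```python
# Minimal sketch of the three LLM types at the prompt level.
# `transformers` and the "gpt2" checkpoint are illustrative assumptions.
from transformers import pipeline

# 1. Generic language model: pure next-word prediction (auto-completion).
completer = pipeline("text-generation", model="gpt2")
result = completer("The capital of France is", max_new_tokens=5)
print(result[0]["generated_text"])

# 2. Instruction-tuned model: the prompt states a task to carry out.
instruction_prompt = (
    "Summarize the following text in one sentence:\n"
    "Large Language Models are deep learning models trained on huge corpora."
)

# 3. Dialog-tuned model: the prompt carries the running conversation.
dialog_prompt = (
    "User: What is an LLM?\n"
    "Assistant: A model trained on large text corpora to generate language.\n"
    "User: Give me an example.\n"
)
```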

Notes:

  • Many NLP systems build on Generative Pre-trained Transformer (GPT) based models.
  • LLMs can be augmented with new information using the Retrieval-Augmented Generation (RAG) technique; a toy sketch appears after this list.
  • LLMs are considered an intersection of Natural Language Processing (NLP) and Generative AI.
  • Training LLMs requires huge datasets and massive computational resources; therefore, few organizations have the capability to train them.
  • Zero-Shot Learning (ZSL) and Few-Shot Learning methodologies are often used when prompting LLMs (illustrated after this list).
  • Fine-tuning an LLM is done by training it on domain-specific data and prompts to improve its fitness for particular tasks (see the sketch below).
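
RAG injects retrieved documents into the prompt at query time, so the model can answer from information it was never trained on. The sketch below is a toy version: the word-overlap scorer stands in for real embedding-based vector search, and `llm_generate` is a hypothetical placeholder, not a real API.

```python
# Toy sketch of Retrieval-Augmented Generation (RAG). The word-overlap
# scorer is a stand-in for embedding-based vector search, and
# `llm_generate` below is a hypothetical placeholder, not a real API.

def score(question: str, document: str) -> int:
    # Crude relevance score: number of shared lowercase words.
    return len(set(question.lower().split()) & set(document.lower().split()))

def build_rag_prompt(question: str, documents: list[str]) -> str:
    # Retrieve the best-matching document and splice it into the prompt.
    best = max(documents, key=lambda d: score(question, d))
    return f"Context: {best}\n\nQuestion: {question}\nAnswer:"

documents = [
    "PaLM is a large language model released by Google.",
    "Llama is a family of large language models released by Meta.",
]
prompt = build_rag_prompt("Who released Llama?", documents)
# The assembled prompt is then sent to any LLM:
# response = llm_generate(prompt)  # hypothetical call
```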
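
Zero-shot and few-shot differ only in how the prompt is built: zero-shot gives the model just a task description, while few-shot prepends a handful of solved examples. The sentiment task below is made up purely for illustration.

```python
# Zero-shot vs. few-shot prompting; the classification task is invented
# purely for illustration, and no library is required.

# Zero-shot: only a task description, no solved examples.
zero_shot_prompt = (
    "Classify the sentiment of this review as positive or negative.\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)

# Few-shot: a few solved examples precede the actual query, letting the
# model infer the task format from them.
few_shot_prompt = (
    "Review: I love this phone.\nSentiment: positive\n"
    "Review: Terrible customer service.\nSentiment: negative\n"
    "Review: The battery died after two days.\nSentiment:"
)
```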

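Fine-tuning updates the model's weights on domain data rather than only its prompt. A minimal sketch follows, assuming PyTorch, the Hugging Face transformers library, and the small gpt2 checkpoint as stand-ins; real fine-tuning uses large datasets, batching, and many epochs.

```python
# Minimal causal-LM fine-tuning sketch. PyTorch, `transformers`, the
# "gpt2" checkpoint, and the hyperparameters are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# In practice this would be a large domain-specific dataset.
texts = ["A domain-specific example sentence for fine-tuning."]

model.train()
for text in texts:
    inputs = tokenizer(text, return_tensors="pt")
    # For causal-LM fine-tuning, the labels are the input ids themselves;
    # the model shifts them internally to compute the next-token loss.
    outputs = model(**inputs, labels=inputs["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```
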
Examples of LLMs:

  • PaLM (Pathways Language Model) - released by Google
  • LaMDA (Language Model for Dialogue Applications) - released by Google
  • BERT Family - released by Google
  • ChatGPT - released by OpenAI
  • Gemini (formerly Bard) - released by Google
  • Llama (Large Language Model Meta AI) - released by Meta
  • More: Claude, Falcon, FLAN-T5
