Bag of Words (BoW)

A bag of words model represents a collection of documents as a matrix with one row per document and one column per unique word in the vocabulary. Each cell holds an integer: the number of times that word occurs in that document (its term frequency). In other words, Bag of Words turns each document into a vector of word counts.
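
As a minimal sketch of the idea, the snippet below builds such a matrix with only the Python standard library. The function name `build_bow`, the lowercase-and-split-on-whitespace tokenization, and the alphabetical column order are assumptions made for illustration, not part of any particular library.

```python
from collections import Counter


def build_bow(documents):
    """Return (vocabulary, matrix) for a list of document strings."""
    # Naive tokenization (an assumption): lowercase, split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    # One column per unique word; sorting gives a stable column order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One row per document; a word that is absent gets a count of 0.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


docs = ["the cat sat on the mat", "the dog sat"]
vocabulary, matrix = build_bow(docs)
print(vocabulary)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(matrix)      # [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
```

Each row is the vector for one document, and the value 2 in the first row records that "the" appears twice in the first document.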


Advantages:

  • Simple to understand and implement.

Disadvantages:

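  • The resulting vectors are sparse: most entries are zero, because any single document uses only a small fraction of the vocabulary.
  • The vocabulary must be designed carefully to keep its size, and therefore the sparsity of the vectors, manageable.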
  • It removes context by ignoring word order.
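
The loss of word order can be illustrated with a small sketch. The helper `bow_vector` and the three-word vocabulary are hypothetical, chosen only for this example.

```python
from collections import Counter


def bow_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in text."""
    # Same naive tokenization assumption: lowercase, split on whitespace.
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


vocabulary = ["bites", "dog", "man"]
vector_a = bow_vector("dog bites man", vocabulary)
vector_b = bow_vector("man bites dog", vocabulary)

# Both sentences contain the same words, so their vectors are identical
# even though the sentences mean different things.
print(vector_a, vector_b, vector_a == vector_b)
```

The two sentences describe opposite events, yet a model that sees only these vectors cannot tell them apart.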

Notes:

  • Depending on corpus size, bag of words can produce a matrix with a very large number of columns, one for each unique word in the vocabulary.
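
To make the size and sparsity concrete, the sketch below computes the matrix shape and the fraction of zero entries for a toy corpus. The function `bow_shape_and_sparsity` and the sample sentences are assumptions for illustration; real corpora will behave differently in detail.

```python
def bow_shape_and_sparsity(documents):
    """Return (rows, columns, fraction of zero cells) of the BoW matrix."""
    # Naive tokenization (an assumption): lowercase, split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = {word for tokens in tokenized for word in tokens}
    n_rows, n_cols = len(tokenized), len(vocabulary)
    # A cell is non-zero exactly when the word occurs in that document.
    nonzero = sum(len(set(tokens)) for tokens in tokenized)
    total = n_rows * n_cols
    zero_fraction = 1 - nonzero / total if total else 0.0
    return n_rows, n_cols, zero_fraction


docs = [
    "the cat sat",
    "a dog barked loudly",
    "birds fly south in winter",
]
n_rows, n_cols, zero_fraction = bow_shape_and_sparsity(docs)
print(n_rows, n_cols, round(zero_fraction, 2))  # 3 12 0.67
```

Even with three short documents that share no words, two thirds of the cells are zero. As more documents introduce new words, the column count grows with the vocabulary while each row still fills only a few of those columns.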