Bag of Words (BoW)

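A bag-of-words model represents each document as a vector with one entry for each unique word in the vocabulary, where each entry is an integer count of how often that word occurs in the document (its term frequency). Applied to a corpus, Bag of Words therefore creates a set of vectors, one per document, which can be stacked into a matrix with one row per document and one column per word.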
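As a minimal sketch of the idea, the snippet below builds count vectors using only the Python standard library. The helper names `build_vocabulary` and `bag_of_words`, the lowercase whitespace tokenization, and the two sample sentences are assumptions made for illustration; real pipelines usually tokenize more carefully.

```python
from collections import Counter


def build_vocabulary(documents):
    """Return a sorted list of the unique lowercase tokens across all documents."""
    return sorted({token for doc in documents for token in doc.lower().split()})


def bag_of_words(document, vocabulary):
    """Return one term count per vocabulary word, in vocabulary order."""
    counts = Counter(document.lower().split())
    # Counter returns 0 for words that do not occur in this document.
    return [counts[word] for word in vocabulary]


docs = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocab) for doc in docs]

print(vocab)    # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors)  # [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
```

Each row has the same length as the vocabulary, so documents of different lengths become directly comparable vectors.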


  • Simple to understand and implement.


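  • Produces sparse representations: most entries in each vector are zero, because any one document uses only a small part of the vocabulary.
  • The vocabulary must be designed carefully to keep its size, and therefore the sparsity of the vectors, manageable.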
  • It removes context by ignoring word order.
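The loss of word order can be seen in a small sketch. The two sentences and the helper name `term_counts` are hypothetical, chosen only to illustrate the point: sentences with opposite meanings can produce identical counts.

```python
from collections import Counter


def term_counts(text):
    """Count lowercase whitespace-separated tokens, discarding their order."""
    return Counter(text.lower().split())


a = term_counts("dog bites man")
b = term_counts("man bites dog")

# Both sentences contain the same words the same number of times,
# so their bag-of-words representations are indistinguishable.
same_representation = (a == b)
print(same_representation)  # True
```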


  • Depending on corpus size, Bag of Words can create a matrix with a very large number of columns, one for each unique word in the corpus.
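To make the growth in columns concrete, here is a rough sketch that builds a document-term matrix for a tiny made-up corpus and measures the fraction of zero entries. The function names `document_term_matrix` and `sparsity` and the three sample sentences are assumptions for illustration, not part of any library.

```python
def document_term_matrix(documents):
    """Return (vocabulary, matrix) with one row per document, one column per word."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    column = {word: i for i, word in enumerate(vocabulary)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for token in tokens:
            row[column[token]] += 1
        matrix.append(row)
    return vocabulary, matrix


def sparsity(matrix):
    """Return the fraction of entries in the matrix that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(row.count(0) for row in matrix)
    return zeros / total if total else 0.0


corpus = ["apples are red", "bananas are yellow", "grapes grow on vines"]
vocabulary, matrix = document_term_matrix(corpus)

print(len(matrix), len(vocabulary))  # 3 9
print(round(sparsity(matrix), 3))    # 0.63
```

Even with three short documents, the matrix already has nine columns and most of its entries are zero; a realistic corpus can have tens of thousands of columns, which is why sparse storage formats are commonly used.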