NLP Concepts

Corpus: A corpus is a collection of text or audio which has been organized into a dataset.

Knowledge Base: A representation of knowledge, often stored as a large repository of structured or unstructured data.

Knowledge Graph: A representation of knowledge (a knowledge base) in a graph structure, in which entities are nodes and the relationships between them are edges.
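
As a rough illustration of the graph structure, a knowledge graph can be sketched as a list of (subject, relation, object) triples indexed by subject. The triples, the relation names, and the helper functions build_graph and objects_of are all made up for this example; real systems usually rely on a dedicated graph store.

```python
from collections import defaultdict

# Illustrative facts, stored as (subject, relation, object) triples.
# The relation names are arbitrary labels chosen for this sketch.
triples = [
    ("Python", "is_a", "programming language"),
    ("Python", "created_by", "Guido van Rossum"),
    ("NLTK", "written_in", "Python"),
]


def build_graph(triples):
    """Index triples by subject: node -> list of (relation, neighbor) edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def objects_of(graph, subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return [obj for rel, obj in graph.get(subject, []) if rel == relation]


graph = build_graph(triples)
print(objects_of(graph, "Python", "created_by"))
```

Looking up a relation for a node then amounts to following its outgoing edges.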

Ontology (plural: Ontologies): A set of concepts, classes, or instances of those classes that are interconnected and defined via relationships, attributes (defined via properties), and axioms (such as rules, assertions, constraints, and events).

Ontology Learning: Automatic or semi-automatic support for the construction of an ontology.

Word Representations: Computers can’t easily process raw words and their transformations. Words are therefore converted into other formats, typically numeric ones.

Problems with most word representation approaches:

  • They fail to capture the syntactic and semantic meaning of words
  • They suffer from the so-called Curse of Dimensionality
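
Both problems can be seen in a small sketch that assumes a one-hot encoding, chosen here only as a simple example of a word representation. The vocabulary and the helpers one_hot and dot are hypothetical.

```python
def one_hot(word, vocabulary):
    """Return a vector with a single 1 at the word's position in the vocabulary."""
    return [1 if word == entry else 0 for entry in vocabulary]


def dot(u, v):
    """Dot product, used here as a crude similarity score."""
    return sum(a * b for a, b in zip(u, v))


# Tiny made-up vocabulary; a real one may hold hundreds of thousands of words.
vocabulary = ["cat", "kitten", "car"]
cat, kitten, car = (one_hot(word, vocabulary) for word in vocabulary)

print(len(cat))          # the vector length equals the vocabulary size
print(dot(cat, kitten))  # similarity between two related words
print(dot(cat, car))     # similarity between two unrelated words
```

Every vector is as long as the vocabulary, and any two distinct words get a similarity of zero, so "cat" is no closer to "kitten" than to "car".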

Types of word representation:

  • Dictionary Lookup
  • Weighted Word Representation: Methods that use TF-IDF to compute a weight for each word.
  • Distributional Representation: Words are represented by their context, which is determined by how often they appear together with other words; these counts are stored in a word-context co-occurrence matrix.
  • Word Embedding Techniques
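
The weighted and distributional representations above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes one common TF-IDF variant (term frequency normalized by document length, unsmoothed idf = log(N / df)) and a symmetric context window of one word. The toy documents and the function names tf_idf and cooccurrence are made up for the example.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, already tokenized.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]


def tf_idf(docs):
    """Return one {word: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights


def cooccurrence(docs, window=1):
    """Count how often each word appears within `window` positions of another."""
    matrix = defaultdict(Counter)
    for doc in docs:
        for i, word in enumerate(doc):
            start, end = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(start, end):
                if j != i:
                    matrix[word][doc[j]] += 1
    return matrix


weights = tf_idf(docs)
matrix = cooccurrence(docs)
print(weights[0])
print(dict(matrix["sat"]))
```

In this sketch a word that occurs in every document ("the") gets weight zero, while a word unique to one document ("mat") gets the highest weight there. Each row of the co-occurrence matrix is the context-count vector for one word.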

Middle Language:

  • Precisely defined and unambiguous
  • Maps natural language ↔ formal language (for input and output)
  • Capable of automated reasoning