Ontology Learning

Ontology learning is the construction of a domain model from text by leveraging formal structures.

Ontology learning process:

  • Terms: Extracting terms to build concepts
  • Synonyms: Identifying synonyms among the extracted terms so that terms with the same meaning are grouped together.
  • Concepts: Creating concepts from synonym sets (synsets).
  • Concept Hierarchy: Taxonomic relationships between concepts are formalized.
  • Relations: Taxonomic and non-taxonomic relationships are extracted from the text.
  • Relation Hierarchy: Taxonomic relationships between the extracted relations are formed.
  • Axiom Schemata: The formal formulas used to represent axioms are decided.
  • General Axioms: Axioms are extracted. This usually involves Inductive Logic Programming (ILP).
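
The first three steps (terms, synonyms, concepts) can be illustrated with a minimal sketch. It assumes the synonym pairs are already known; the toy terms, the pairs, the function name build_synsets, and the choice to name each concept after its alphabetically first term are all hypothetical and only serve the illustration.

```python
from collections import defaultdict


def build_synsets(terms, synonym_pairs):
    """Group terms into synonym sets by merging every pair marked as synonymous.

    Assumes every term mentioned in synonym_pairs also appears in terms.
    """
    parent = {term: term for term in terms}

    def find(term):
        # Follow parent links to the representative of the term's group.
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # shorten the path as we go
            term = parent[term]
        return term

    for left, right in synonym_pairs:
        parent[find(left)] = find(right)

    groups = defaultdict(set)
    for term in terms:
        groups[find(term)].add(term)
    # Sort the synsets so the result is deterministic.
    return sorted((frozenset(g) for g in groups.values()), key=lambda s: sorted(s))


# Hypothetical toy input: extracted terms and hand-supplied synonym pairs.
terms = ["car", "automobile", "auto", "truck", "lorry", "bicycle"]
synonym_pairs = [("car", "automobile"), ("automobile", "auto"), ("truck", "lorry")]

synsets = build_synsets(terms, synonym_pairs)
# Each synset becomes one concept, keyed by its alphabetically first term.
concepts = {min(synset): synset for synset in synsets}
print(concepts)
```

In practice the synonym pairs would come from a lexical resource or a similarity measure rather than a hand-written list.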

Ontology learning techniques:

  • Linguistics: Often used in term, concept, and relation extraction.
    • Term/concept extraction:
      • Syntactic Analysis: A linguistic technique that analyzes the grammatical structure of a given text according to the formal rules of natural language.
      • Sub-categorization frame: A linguistic technique for term extraction. The sub-categorization frame of a word is the number of words of a certain form that it selects when it appears in a sentence. Used in conjunction with clustering techniques, these selectional restrictions can help discover concepts.
      • Seed Words: Domain-specific words that give other algorithms a starting point for extracting similar domain-specific terms and concepts.
    • Relation extraction:
      • Dependency Analysis: Helps find syntactic relations between terms.
      • Lexico-syntactic patterns: A rule-based relation extraction method.
  • Statistical: These techniques rely solely on statistics of the underlying corpora and do not consider the underlying semantics.
    • Term/concept extraction:
      • C/NC value: Used for multi-word terminology extraction.
      • Contrastive analysis: A technique that filters out terms obtained through the term extraction procedure that are not relevant to the domain of the corpus.
      • Clustering: Algorithms such as k-means are used to cluster terms and concepts.
      • Co-occurrence analysis: A concept extraction technique that locates lexical units that occur together, in order to find implicit associations between terms and concepts and to extract related terms.
      • LSA (Latent Semantic Analysis): A concept extraction algorithm based on the idea that terms occurring together will be close in meaning. LSA applies singular value decomposition to the term-document matrix to reduce the dimensionality of the data while maintaining the similarity structure.
    • Relation extraction:
      • Term subsumption: Finds hierarchical relations between terms by using the conditional probability of those terms occurring in the underlying documents.
      • FCA (Formal Concept Analysis): Builds concept hierarchies by relying on the basic idea that objects are connected with their characteristics (attributes).
      • Hierarchical clustering
      • ARM (Association Rule Mining): A non-taxonomic relation extraction method used for discovering rules that predict co-occurrences.
  • Logical
    • Inductive Logic Programming (ILP)
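
Term subsumption, listed above under the statistical techniques, can be made concrete with a small sketch. It uses one common formulation: a term x is taken to subsume a term y when P(x | y) is at least some threshold while P(y | x) stays below 1, with both probabilities estimated from document co-occurrence. The threshold of 0.8, the function name subsumption_pairs, and the toy documents are assumptions made for illustration, not values taken from this text.

```python
from itertools import permutations


def subsumption_pairs(documents, threshold=0.8):
    """Return (parent, child) pairs where the parent term subsumes the child.

    P(parent | child) is estimated as the fraction of documents containing
    the child that also contain the parent.
    """
    vocabulary = set().union(*documents)
    # Map each term to the set of document indices it occurs in.
    postings = {
        term: {i for i, doc in enumerate(documents) if term in doc}
        for term in vocabulary
    }
    pairs = []
    for parent, child in permutations(sorted(vocabulary), 2):
        both = len(postings[parent] & postings[child])
        p_parent_given_child = both / len(postings[child])
        p_child_given_parent = both / len(postings[parent])
        # The parent appears (almost) wherever the child does, but not vice versa.
        if p_parent_given_child >= threshold and p_child_given_parent < 1.0:
            pairs.append((parent, child))
    return pairs


# Hypothetical toy corpus: each document is reduced to its set of terms.
documents = [
    {"animal", "dog"},
    {"animal", "dog", "poodle"},
    {"animal", "cat"},
    {"animal"},
]
pairs = subsumption_pairs(documents)
print(pairs)
```

On this toy corpus the function returns ("animal", "cat"), ("animal", "dog"), ("animal", "poodle") and ("dog", "poodle"). The indirect pair ("animal", "poodle") would still need to be pruned to obtain a tree-shaped hierarchy.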

Algorithms and methods used in the above ontology learning techniques.
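
As one example of a rule-based method from the list above, a lexico-syntactic pattern can be sketched with a regular expression. The sketch below handles a single pattern of the form "X such as A, B and C" and reads it as "A, B and C are kinds of X". It is deliberately simplified: it matches single words only, whereas a real system would work on noun phrases produced by a parser. The pattern, the function name extract_is_a, and the sample text are illustrative assumptions.

```python
import re

# "<hypernym> such as <hyponym>, <hyponym> and <hyponym>", single words only.
SUCH_AS = re.compile(r"(\w+),? such as (\w+(?:(?:, and|, or|,| and| or) \w+)*)")
# Separators between the listed hyponyms.
SEPARATOR = re.compile(r",? (?:and|or) |, ")


def extract_is_a(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text.lower()):
        hypernym, hyponym_list = match.groups()
        for hyponym in SEPARATOR.split(hyponym_list):
            pairs.append((hyponym, hypernym))
    return pairs


# Hypothetical sample text.
text = (
    "Vehicles such as cars, trucks and bicycles are taxed. "
    "Fruits such as apples or pears are not."
)
relations = extract_is_a(text)
for hyponym, hypernym in relations:
    print(f"{hyponym} is-a {hypernym}")
```

Each extracted pair is a candidate taxonomic relation. Because the pattern is purely surface-based, the candidates would normally be filtered or validated before being added to the concept hierarchy.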