Benchmarks

  • DecaNLP: The Natural Language Decathlon (decaNLP) is a benchmark for studying general NLP models. It evaluates a single model on ten disparate natural language tasks, each cast as question answering.
  • GLUE: The General Language Understanding Evaluation benchmark (GLUE) is a tool for evaluating and analyzing model performance across a diverse range of existing natural language understanding tasks. Models are ranked by the macro-average of their per-task scores (not all tasks use accuracy; e.g., CoLA uses Matthews correlation).
  • Jiant: Jiant is a software toolkit for research on general-purpose text understanding models, with built-in support for multitask and transfer learning experiments.
  • Quora Question Pairs (QQP): A dataset of question pairs from Quora; the task is to determine whether two questions are semantically equivalent (paraphrase identification).
  • SentEval: A toolkit for evaluating the quality of sentence embeddings on a broad set of downstream transfer tasks.
  • CLUTRR: A diagnostic benchmark for inductive reasoning that requires models to infer kinship relations between entities described in short stories.
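
Leaderboard-style benchmarks such as GLUE report a single number: the unweighted mean of per-task scores. The sketch below illustrates that aggregation with hypothetical scores (the task names are real GLUE tasks, but the numbers are made up for illustration):

```python
# Hypothetical per-task scores on a GLUE-style benchmark.
# Note the metrics differ per task; the leaderboard still takes
# a plain unweighted mean over tasks.
scores = {
    "CoLA": 52.1,   # Matthews correlation
    "SST-2": 93.5,  # accuracy
    "QQP": 71.2,    # GLUE averages F1 and accuracy for QQP
    "MNLI": 84.6,   # accuracy
}

def macro_average(task_scores):
    """Unweighted mean of per-task scores (GLUE-style aggregate)."""
    return sum(task_scores.values()) / len(task_scores)

print(macro_average(scores))
```

Because the mean is unweighted, a small task like CoLA moves the aggregate as much as a large one like MNLI, which is one common criticism of single-number benchmark scores.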