DeepEval: Large Model Evaluation Tool

Official Website

Official Introduction

DeepEval is an open-source evaluation framework for LLMs. It aims to make building and iterating on LLM applications easy, and was built with the following principles in mind:

  • Easily “unit test” LLM outputs in a similar way to Pytest.
  • Plug in and use 30+ LLM-evaluated metrics, most with research backing.
  • Supports both end-to-end and component-level evaluation.
  • Evaluation for RAG, agents, chatbots, and virtually any use case.
  • Synthetic dataset generation with state-of-the-art evolution techniques.
  • Metrics are simple to customize and are intended to cover a wide range of use cases.
  • Red-team and safety-scan LLM applications for security vulnerabilities.
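
To make the first principle concrete, here is a minimal sketch of what "unit testing" an LLM output in a Pytest-like style can look like. It deliberately does not use DeepEval's own API: `ToyTestCase`, `KeywordCoverageMetric`, and `check_case` are hypothetical names invented for this illustration, and the keyword-matching score is a stand-in for the LLM-evaluated metrics the framework provides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyTestCase:
    """Hypothetical container for one LLM interaction (not a DeepEval class)."""
    input: str
    actual_output: str
    expected_keywords: List[str]


class KeywordCoverageMetric:
    """Toy metric (an assumption for illustration): the fraction of expected
    keywords that appear in the model output, compared against a threshold."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def measure(self, case: ToyTestCase) -> float:
        if not case.expected_keywords:
            return 1.0
        text = case.actual_output.lower()
        hits = sum(1 for kw in case.expected_keywords if kw.lower() in text)
        return hits / len(case.expected_keywords)


def check_case(case: ToyTestCase, metrics: List[KeywordCoverageMetric]) -> None:
    """Fail like a unit test if any metric scores below its threshold."""
    for metric in metrics:
        score = metric.measure(case)
        assert score >= metric.threshold, (
            f"{type(metric).__name__} scored {score:.2f}, "
            f"below threshold {metric.threshold:.2f}"
        )


def test_refund_answer():
    # In a real setup, actual_output would come from the LLM application.
    case = ToyTestCase(
        input="What is your refund policy?",
        actual_output="You can request a full refund within 30 days of purchase.",
        expected_keywords=["refund", "30 days"],
    )
    check_case(case, [KeywordCoverageMetric(threshold=0.7)])
```

Saved in a file whose name starts with `test_`, the function above would be collected by Pytest like any other test. The idea carried over from the list is that each metric produces a score, and the test passes only if every score clears its threshold.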