Official Website
Official Introduction
DeepEval is an open-source evaluation framework for LLMs. DeepEval makes it extremely easy to build and iterate on LLM (applications) and was built with the following principles in mind:
- Easily “unit test” LLM outputs in a similar way to Pytest.
- Plug-and-use 30+ LLM-evaluated metrics, most with research backing.
- Supports both end-to-end and component level evaluation.
- Evaluation for RAG, agents, chatbots, and virtually any use case.
- Synthetic dataset generation with state-of-the-art evolution techniques.
- Metrics are simple to customize and cover all use cases.
- Red team, safety scan LLM applications for security vulnerabilities.