Official website
Official introduction
DeepEval is an open-source evaluation framework for LLMs. DeepEval makes it extremely easy to build and iterate on LLM applications, and was built with the following principles in mind:
- Easily "unit test" LLM outputs in a similar way to Pytest.
- Plug-and-use 30+ LLM-evaluated metrics, most with research backing.
- Supports both end-to-end and component-level evaluation.
- Evaluation for RAG, agents, chatbots, and virtually any use case.
- Synthetic dataset generation with state-of-the-art evolution techniques.
- Metrics are simple to customize and cover all use cases.
- Red-team and safety-scan LLM applications for security vulnerabilities.
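To illustrate the first principle above (Pytest-style "unit testing" of LLM outputs), here is a minimal, self-contained sketch. It does not use DeepEval's actual API; the `keyword_coverage` metric and the threshold are hypothetical stand-ins for DeepEval's LLM-evaluated metrics, chosen only so the example runs without an LLM call.

```python
# Illustrative sketch: a Pytest-style "unit test" for an LLM output.
# keyword_coverage is a toy stand-in metric, NOT part of DeepEval.

def keyword_coverage(output: str, expected_keywords: list[str]) -> float:
    """Toy metric: fraction of expected keywords present in the output."""
    hits = sum(1 for kw in expected_keywords if kw.lower() in output.lower())
    return hits / len(expected_keywords)

def test_llm_output_relevancy():
    # In a real setup, actual_output would come from your LLM application.
    actual_output = "DeepEval is an open-source evaluation framework for LLMs."
    score = keyword_coverage(actual_output, ["open-source", "evaluation", "llms"])
    # Fail the test (like any Pytest assertion) if the score is too low.
    assert score >= 0.5, f"score {score} below threshold"

test_llm_output_relevancy()
```

In DeepEval itself, the same idea is expressed with test cases and metrics passed to an assertion helper, and the test file is collected and run by Pytest like any ordinary test suite.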