DeepEval: An Evaluation Tool for Large Language Models

doggie · November 25, 2025, 06:38

Official website

Official introduction

DeepEval is an open-source evaluation framework for LLMs. DeepEval makes it extremely easy to build and iterate on LLM applications and was built with the following principles in mind:

Easily "unit test" LLM outputs in a similar way to Pytest.
Plug-and-use 30+ LLM-evaluated metrics, most with research backing.
Supports both end-to-end and component-level evaluation.
Evaluation for RAG, agents, chatbots, and virtually any use case.
Synthetic dataset generation with state-of-the-art evolution techniques.
Metrics are simple to customize and cover all use cases.
Red team and safety scan LLM applications for security vulnerabilities.
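
To make the "unit test LLM outputs like Pytest" idea concrete, here is a minimal sketch in plain Python. It does not use DeepEval's API: `ToyTestCase`, `keyword_coverage`, and `assert_case` are hypothetical names made up for illustration, and the metric is a simple keyword match rather than the LLM-evaluated metrics the introduction describes. It only shows the general shape: wrap one model output in a test case, score it with a metric, and fail the test when the score is below a threshold.

```python
from dataclasses import dataclass


@dataclass
class ToyTestCase:
    """Hypothetical container for one LLM interaction (not a DeepEval class)."""
    prompt: str
    actual_output: str
    expected_keywords: list


def keyword_coverage(case: ToyTestCase) -> float:
    """Toy metric: fraction of expected keywords that appear in the output."""
    if not case.expected_keywords:
        return 1.0
    text = case.actual_output.lower()
    hits = sum(1 for kw in case.expected_keywords if kw.lower() in text)
    return hits / len(case.expected_keywords)


def assert_case(case: ToyTestCase, threshold: float = 0.5) -> None:
    """Fail like a unit test when the metric score is below the threshold."""
    score = keyword_coverage(case)
    assert score >= threshold, f"score {score:.2f} is below threshold {threshold}"


def test_refund_answer() -> None:
    # In a real setup, actual_output would come from the LLM application under test.
    case = ToyTestCase(
        prompt="What is your refund policy?",
        actual_output="We offer a full refund within 30 days of purchase.",
        expected_keywords=["refund", "30 days"],
    )
    assert_case(case, threshold=0.8)
```

Because the test function follows the `test_*` naming convention, saving this in a file such as `test_llm.py` (an assumed filename) and running `pytest` would collect and run it like any other unit test.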
