DeepEval: Large Model Evaluation Tool

doggie · November 25, 2025, 6:38am

Official Website

Official Introduction

DeepEval is an open-source evaluation framework for LLMs. It makes it easy to build and iterate on LLM applications, and was built with the following principles in mind:

Easily “unit test” LLM outputs in a similar way to Pytest.
Plug in and use 30+ LLM-evaluated metrics, most with research backing.
Supports both end-to-end and component level evaluation.
Evaluation for RAG, agents, chatbots, and virtually any use case.
Synthetic dataset generation with state-of-the-art evolution techniques.
Metrics are simple to customize and cover all use cases.
Red-team and safety-scan LLM applications for security vulnerabilities.
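
To make the "unit test LLM outputs like Pytest" idea concrete, here is a minimal pure-Python sketch of the pattern: wrap an input/output pair in a test case, score it with a metric, and fail the test when the score falls below a threshold. This is not DeepEval's API. The names LLMCase, keyword_coverage and assert_case are hypothetical, and the keyword check is only a stand-in for an LLM-evaluated metric, which would normally ask a judge model for the score.

```python
from dataclasses import dataclass


@dataclass
class LLMCase:
    """One input/output pair to check (hypothetical, not a DeepEval class)."""
    input: str
    actual_output: str
    expected_keywords: list


def keyword_coverage(case: LLMCase) -> float:
    """Return the fraction of expected keywords found in the output (0.0 to 1.0).

    Stand-in for an LLM-evaluated metric; a real one would call a judge model.
    """
    if not case.expected_keywords:
        return 1.0
    text = case.actual_output.lower()
    hits = sum(1 for kw in case.expected_keywords if kw.lower() in text)
    return hits / len(case.expected_keywords)


def assert_case(case: LLMCase, threshold: float = 0.7) -> None:
    """Fail like a unit test when the metric score is below the threshold."""
    score = keyword_coverage(case)
    assert score >= threshold, f"score {score:.2f} < threshold {threshold:.2f}"


def test_refund_answer():
    # In a real application, actual_output would come from the LLM under test.
    case = LLMCase(
        input="What is your refund policy?",
        actual_output="You can get a full refund within 30 days of purchase.",
        expected_keywords=["refund", "30 days"],
    )
    assert_case(case)
```

Saved in a file such as test_llm.py (an assumed name), this can be run with pytest like any other test. The point is that the pass/fail decision comes from a scored metric plus a threshold rather than an exact string match, which is what makes it workable for free-form LLM output.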

