Read: Quite Useful
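- [Repost] Common Methods, Metrics, and Frameworks for Evaluating Large Language Models
- [Repost] How to Evaluate Large Models' Performance on Long-Text Processing
- [Repost] What Evaluation Metrics Are There for Large Models?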
- [Flasher] How Are Large Models Tested? Taking Gemini3 as an Example (Bilibili)
- Do You Know What Metrics Are Used to Evaluate the Performance of a Large Model? PPL, MMLU, MATH, GPQA, BBH, IF-EVAL, MMLU-PRO (Bilibili)
See How Others Do It
- Alibaba: Quickly Evaluating Large Language Models - Platform for AI (PAI) - Alibaba Cloud Help Center