Training a minimal large model from scratch

  • This open-source project aims to train MiniMind, an ultra-small language model of just 25.8M parameters, completely from scratch, for as little as $3 in GPU cost and about 2 hours of training!
  • The MiniMind series is extremely lightweight; its smallest version is just 1/7000 the size of GPT-3, so it can be trained quickly even on an ordinary consumer GPU.
  • The project also open-sources a minimal yet capable large-model architecture, with full code for a mixture-of-experts (MoE) variant extended with shared experts, dataset cleaning, pretraining (Pretrain), supervised fine-tuning (SFT), LoRA fine-tuning, direct preference optimization (DPO), reinforcement-learning training (RLAIF: PPO, GRPO, etc.), and model distillation.
  • MiniMind has also been extended to support vision-language multimodal capabilities via MiniMind-V.
  • All core algorithmic code is reimplemented natively from scratch in PyTorch, without relying on abstracted interfaces from third-party libraries.
  • This is not only a complete end-to-end open-source reproduction of large language models but also a tutorial for getting started with LLMs.
  • We hope this project serves as an inspiration to everyone, inviting you to experience the joy of creation and drive progress across the broader AI community!
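To make the "MoE with shared experts" idea above concrete, here is a minimal sketch of that routing pattern in plain PyTorch. This is an illustrative toy, not MiniMind's actual implementation: the class name, expert sizes, and the use of single linear layers as experts are all simplifications (a real version would use SwiGLU feed-forward experts and a load-balancing loss). Routed experts process only the tokens the gate assigns to them, while shared experts process every token unconditionally.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExpertMoE(nn.Module):
    """Toy MoE layer with shared experts: a gating network picks the
    top-k routed experts per token; shared experts always run."""

    def __init__(self, dim, n_routed=4, n_shared=1, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, n_routed, bias=False)
        # Single linear layers stand in for full feed-forward experts here.
        self.routed = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_routed))
        self.shared = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_shared))

    def forward(self, x):                       # x: (tokens, dim)
        scores = F.softmax(self.gate(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize top-k
        out = torch.zeros_like(x)
        for k in range(self.top_k):             # accumulate each token's top-k experts
            for e, expert in enumerate(self.routed):
                mask = idx[:, k] == e           # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        for expert in self.shared:              # shared experts see every token
            out = out + expert(x)
        return out
```

A quick usage check: `SharedExpertMoE(64)(torch.randn(10, 64))` returns a tensor of the same `(10, 64)` shape, since the layer is a drop-in replacement for a dense feed-forward block.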

To avoid misunderstanding: the "2 hours" was measured on a single NVIDIA RTX 3090 GPU, and the "$3" refers to the cloud GPU rental cost. See below for exact specifications.
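The scale claims above can be sanity-checked with a quick back-of-the-envelope parameter count. The dimensions below (vocab 6400, width 512, 8 layers, SwiGLU FFN) are illustrative assumptions, not necessarily MiniMind's exact config, but they show why a model in this class lands in the tens of millions of parameters, several thousand times smaller than GPT-3's ~175B.

```python
def transformer_params(vocab, dim, n_layers, ffn_hidden):
    """Rough dense-transformer parameter estimate (biases and norms omitted)."""
    embed = vocab * dim                 # token embedding (output head assumed tied)
    attn = 4 * dim * dim                # Q, K, V, O projections per layer
    ffn = 3 * dim * ffn_hidden          # SwiGLU: gate, up, and down projections
    return embed + n_layers * (attn + ffn)

# Illustrative small-model dims (assumed, not the official config):
small = transformer_params(vocab=6400, dim=512, n_layers=8, ffn_hidden=1408)
gpt3 = 175_000_000_000

print(f"small model: ~{small / 1e6:.1f}M parameters")
print(f"size ratio:  roughly 1/{gpt3 // small} of GPT-3")
```

Even this crude estimate lands in the mid-20M-parameter range, consistent with the "1/7000 the size of GPT-3" order of magnitude quoted above.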