Training a minimal large model from scratch

  • This open-source project aims to train MiniMind, an ultra-small language model of just 25.8M parameters, completely from scratch, for as little as $3 in GPU cost and about 2 hours of training!
  • The MiniMind series is extremely lightweight; its smallest version is just 1/7000 the size of GPT-3, so it can be trained quickly even on an ordinary consumer GPU.
  • The project also open-sources a minimal yet capable large-model architecture, with full code for a mixture-of-experts (MoE) variant extended with shared experts, dataset cleaning, pretraining (Pretrain), supervised fine-tuning (SFT), LoRA fine-tuning, direct preference optimization (DPO), reinforcement-learning training (RLAIF: PPO, GRPO, etc.), and model distillation.
  • MiniMind has also been extended to support vision-language multimodal capabilities via MiniMind-V.
  • All core algorithmic code is reimplemented natively from scratch in PyTorch, without relying on abstracted interfaces from third-party libraries.
  • This is not only a complete end-to-end open-source reproduction of large language models but also a tutorial for getting started with LLMs.
  • We hope this project serves as an inspiration to everyone, inviting you to experience the joy of creation and drive progress across the broader AI community!
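To make the "MoE with shared experts" idea above concrete, here is a minimal sketch of that routing pattern in plain PyTorch. This is an illustrative toy, not MiniMind's actual implementation: the class name, expert sizes, and the use of single linear layers as experts are all simplifications (a real version would use SwiGLU feed-forward experts and a load-balancing loss). Routed experts process only the tokens the gate assigns to them, while shared experts process every token unconditionally.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExpertMoE(nn.Module):
    """Toy MoE layer with shared experts: a gating network picks the
    top-k routed experts per token; shared experts always run."""

    def __init__(self, dim, n_routed=4, n_shared=1, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, n_routed, bias=False)
        # Single linear layers stand in for full feed-forward experts here.
        self.routed = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_routed))
        self.shared = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_shared))

    def forward(self, x):                       # x: (tokens, dim)
        scores = F.softmax(self.gate(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize top-k
        out = torch.zeros_like(x)
        for k in range(self.top_k):             # accumulate each token's top-k experts
            for e, expert in enumerate(self.routed):
                mask = idx[:, k] == e           # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        for expert in self.shared:              # shared experts see every token
            out = out + expert(x)
        return out
```

A quick usage check: `SharedExpertMoE(64)(torch.randn(10, 64))` returns a tensor of the same `(10, 64)` shape, since the layer is a drop-in replacement for a dense feed-forward block.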

To avoid misunderstanding: the "2 hours" was measured on a single NVIDIA RTX 3090 GPU, and the "$3" refers to the cloud GPU rental cost. See below for exact specifications.
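The scale claims above can be sanity-checked with a quick back-of-the-envelope parameter count. The dimensions below (vocab 6400, width 512, 8 layers, SwiGLU FFN) are illustrative assumptions, not necessarily MiniMind's exact config, but they show why a model in this class lands in the tens of millions of parameters, several thousand times smaller than GPT-3's ~175B.

```python
def transformer_params(vocab, dim, n_layers, ffn_hidden):
    """Rough dense-transformer parameter estimate (biases and norms omitted)."""
    embed = vocab * dim                 # token embedding (output head assumed tied)
    attn = 4 * dim * dim                # Q, K, V, O projections per layer
    ffn = 3 * dim * ffn_hidden          # SwiGLU: gate, up, and down projections
    return embed + n_layers * (attn + ffn)

# Illustrative small-model dims (assumed, not the official config):
small = transformer_params(vocab=6400, dim=512, n_layers=8, ffn_hidden=1408)
gpt3 = 175_000_000_000

print(f"small model: ~{small / 1e6:.1f}M parameters")
print(f"size ratio:  roughly 1/{gpt3 // small} of GPT-3")
```

Even this crude estimate lands in the mid-20M-parameter range, consistent with the "1/7000 the size of GPT-3" order of magnitude quoted above.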