How to Perform Model Pretraining

Operations

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the tokenizer
model_path = '/data04/llama3/Meta-Llama-3.1-8B-Instruct'
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Quantization config: 4-bit NF4 via bitsandbytes
# (example values; the original configuration was not shown)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA config for a causal-LM task; only the low-rank adapters are trained
# (example values; the original configuration was not shown)
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)

# Load the model; device_map places the quantized weights on the GPU
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    low_cpu_mem_usage=True,
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for k-bit training, then attach the LoRA adapters
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
# No model.to("cuda") call here: bitsandbytes-quantized models are placed
# on the GPU by device_map at load time and do not support .to()
optimizer = torch.optim.AdamW(model.parameters())
# Tokenize the input text and move it to the GPU
text = "The weather is nice today."
inputs = tokenizer(text, return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}

# For next-token prediction, the labels are the input ids themselves;
# the model shifts them internally during the forward pass
inputs["labels"] = inputs["input_ids"].clone()

# Forward pass (computes the loss because labels were supplied)
output = model(**inputs)

# Retrieve the model's loss
loss = output.loss
# Backward pass
loss.backward()
# Update parameters
optimizer.step()
optimizer.zero_grad()

# Save the trained LoRA adapter (for a PEFT-wrapped model this writes only
# the adapter weights, not the full base model)
model.save_pretrained("output_dir")
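
To reuse the saved adapter later, it has to be attached to a freshly loaded copy of the base model. A minimal reload sketch, assuming the same model_path and the "output_dir" used above:

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Reload the base model, then attach the trained adapter from output_dir
base = AutoModelForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
model = PeftModel.from_pretrained(base, "output_dir")
model.eval()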

How is the loss function calculated?

At each position, a causal language model outputs a score for every token in the vocabulary, so next-token prediction is essentially a classification problem over the vocabulary. The loss is the cross-entropy between the predicted distribution and the actual next token, averaged over all positions; because position t is scored against token t+1, the logits are shifted one step relative to the labels.
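
This can be checked against the output.loss computed in the Operations section. A minimal sketch that reproduces it by hand, reusing the output and inputs variables from the code above (Hugging Face causal-LM models perform the same shift internally):

import torch.nn.functional as F

logits = output.logits  # shape: (batch, seq_len, vocab_size)
labels = inputs["labels"]

# Position t predicts token t+1: drop the last logit and the first label
shift_logits = logits[:, :-1, :].contiguous()
shift_labels = labels[:, 1:].contiguous()

# Cross-entropy over the flattened predictions, averaged over positions
manual_loss = F.cross_entropy(
    shift_logits.view(-1, shift_logits.size(-1)),
    shift_labels.view(-1),
)
print(manual_loss.item(), output.loss.item())  # should match up to float precision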

References

Everything You Need to Know About Large Model Pretraining — Bilibili Video