Download the model with ModelScope
Download with Python
# !pip install modelscope
from modelscope.hub.snapshot_download import snapshot_download

model_dir = snapshot_download(
    model_id='Qwen/Qwen3-8B',
    local_dir='/ubuntu-22.04/LLaMA-Factory/models/qwen3-8b',
    cache_dir='/ubuntu-22.04/LLaMA-Factory/models/qwen3-8b-cache',
)
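After the download finishes, it is worth confirming that the weight shards actually landed in `local_dir` before pointing a server at them. A minimal sketch (the path mirrors the `local_dir` above; `list_weights` is a hypothetical helper, not part of ModelScope):

```python
from pathlib import Path

# local_dir from the snapshot_download call above
model_dir = Path("/ubuntu-22.04/LLaMA-Factory/models/qwen3-8b")

def list_weights(model_dir: Path) -> list[str]:
    """Return the safetensors shard names found in the snapshot directory."""
    if not model_dir.is_dir():
        return []  # nothing downloaded yet (or wrong path)
    return sorted(p.name for p in model_dir.glob("*.safetensors"))

if __name__ == "__main__":
    print(list_weights(model_dir))
```

An empty list means the directory is missing or the download was interrupted; re-run `snapshot_download`, which resumes from the cache.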
Serve the model with LLaMA-Factory
Launch from a terminal:
CUDA_VISIBLE_DEVICES=2,3 \
API_HOST=0.0.0.0 \
API_PORT=8001 \
API_KEY=sk-test \
llamafactory-cli api \
--model_name_or_path /ubuntu-22.04/LLaMA-Factory/models/qwen3-8b \
--template qwen \
--finetuning_type lora \
--trust_remote_code \
--max_new_tokens 32768
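The command above starts an OpenAI-compatible server on port 8001, protected by the `API_KEY` it was launched with. A stdlib-only client sketch, assuming the host, port, and key from the launch command; the model name `"qwen3-8b"` is an assumption here, so check `GET /v1/models` for the name the server actually reports:

```python
import json
import urllib.request

API_URL = "http://0.0.0.0:8001/v1/chat/completions"
API_KEY = "sk-test"  # must match the API_KEY exported at launch

def build_chat_payload(prompt: str, model: str = "qwen3-8b") -> dict:
    """Assemble an OpenAI-style chat completion request body.

    The default model name is an assumption; verify it via GET /v1/models.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }

def chat(prompt: str) -> str:
    """POST the payload with Bearer auth and return the first reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Call `chat("...")` once the server is up; a 401 response means the Bearer token does not match the launch-time `API_KEY`.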
Serve the model with vLLM
Launch from a terminal:
CUDA_VISIBLE_DEVICES=5 vllm serve /ubuntu-22.04/LLaMA-Factory/models/qwen3-8b \
    --host 0.0.0.0 \
    --port 8004 \
    --max-num-seqs 4 \
    --max-model-len 4096 \
    --served-model-name deepseek-ocr \
    --gpu-memory-utilization 0.2
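Note that `--served-model-name deepseek-ocr` renames the model at the API layer, so clients must request `deepseek-ocr`, not the filesystem path passed to `vllm serve`. A minimal request-building sketch against the launch flags above (no API key, since none was configured):

```python
import json
import urllib.request

VLLM_URL = "http://0.0.0.0:8004/v1/chat/completions"

def build_request(prompt: str) -> dict:
    """Build a chat completion body for the vLLM server launched above."""
    return {
        # --served-model-name makes "deepseek-ocr" the only accepted name,
        # even though the weights on disk are qwen3-8b
        "model": "deepseek-ocr",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,  # prompt + output must fit --max-model-len 4096
    }

def chat(prompt: str) -> str:
    """POST the request and return the first reply."""
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Requesting any other model name returns a 404 from vLLM's OpenAI-compatible server.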