deepseek-ocr
Reference Documentation
http://github.com/deepseek-ai/DeepSeek-OCR?tab=readme-ov-file
First clone the project code and model weights, then set up the conda environment
cd /data/tangjianing/.data_mapping
sudo git clone https://github.com/deepseek-ai/DeepSeek-OCR.git
sudo git clone https://www.modelscope.cn/deepseek-ai/DeepSeek-OCR.git /data/tangjianing/.data_mapping/deepseek-ocr
sudo chown -R tangjianing:tangjianing /data/tangjianing/.data_mapping/deepseek-ocr
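To confirm the ownership change took effect, a check like the following can list anything under the directory that is still not owned by the expected user. This is a minimal sketch; `check_owner` is a hypothetical helper name, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report files under DIR that are not owned by USER.
# Returns 0 if everything is owned by USER, 1 otherwise.
check_owner() {
  local dir=$1 user=$2 offenders
  # find's "! -user USER" matches entries whose owner is not USER
  offenders=$(find "$dir" '!' -user "$user" -print)
  if [ -n "$offenders" ]; then
    printf 'Not owned by %s:\n%s\n' "$user" "$offenders" >&2
    return 1
  fi
  echo "OK: everything under $dir is owned by $user"
}
```

For example: `check_owner /data/tangjianing/.data_mapping/deepseek-ocr tangjianing`.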
sudo yum install git-lfs
cd /data/tangjianing/.data_mapping/deepseek-ocr
git lfs install
git lfs pull
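If the LFS download did not happen, large files stay as small text pointer files; a `.safetensors` weight file of only a few hundred bytes is a sign of this. The sketch below lists such pointer files by checking whether the first line is the Git LFS spec line `version https://git-lfs.github.com/spec/v1`. The function names are hypothetical, and the 1024-byte size filter is an assumption based on pointer files being small.

```shell
#!/usr/bin/env bash
# Hypothetical helper: true if FILE is a Git LFS pointer (not the real content).
is_lfs_pointer() {
  local first=''
  # Pointer files are small text files whose first line is the LFS spec line
  IFS= read -r first < "$1"
  [ "$first" = 'version https://git-lfs.github.com/spec/v1' ]
}

# Print every LFS pointer file under DIR, skipping the .git directory.
# Empty output means all LFS-tracked files were actually downloaded.
list_lfs_pointers() {
  find "$1" -path '*/.git' -prune -o -type f -size -1024c -print |
    while IFS= read -r f; do
      if is_lfs_pointer "$f"; then
        printf '%s\n' "$f"
      fi
    done
}
```

`list_lfs_pointers /data/tangjianing/.data_mapping/deepseek-ocr` should print nothing once all weights are downloaded.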
Create the virtual environment
sudo docker exec -it ubuntu-container-wyq /bin/bash
conda create -n deepseek-ocr python=3.12.9 -y
# Alternatively, with Python 3.11:
# conda create -n deepseek-ocr python=3.11 -y
conda activate deepseek-ocr
cd /ubuntu-22.04/deepseek-ocr1/
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118
pip install vllm-0.8.5+cu118-cp38-abi3-manylinux1_x86_64.whl
pip install -r requirements.txt
pip install flash-attn==2.7.3 --no-build-isolation
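Open issue: CUDA version conflict. The environment has CUDA 12.2, while the torch and vLLM wheels above are built for CUDA 11.8 (cu118); a fix is still in progress.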
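To see which CUDA version each install targets, the `+cuNNN` tag in a wheel name or torch version string can be decoded (cu118 is CUDA 11.8). The sketch below assumes the conflict is a major-version mismatch between these cu118 builds and the CUDA 12.2 toolkit; PyTorch's C++/CUDA extension builder (used when flash-attn compiles from source) raises an error when the `nvcc` major version differs from the CUDA version torch was built with. Both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode the "+cuNNN" tag of a wheel file name or a
# version string into a CUDA version (cu118 -> 11.8, cu121 -> 12.1).
cuda_from_tag() {
  local name tag
  name=${1##*/}   # strip any directory part
  tag=$(printf '%s\n' "$name" | sed -n 's/.*+cu\([0-9][0-9]*\).*/\1/p')
  [ -n "$tag" ] || return 1
  # The last digit is the minor version, the rest is the major version
  printf '%s.%s\n' "${tag%?}" "${tag#"${tag%?}"}"
}

# Returns 0 only if two CUDA version strings share the same major version.
same_cuda_major() {
  [ "${1%%.*}" = "${2%%.*}" ]
}
```

For example, `cuda_from_tag "$(python -c 'import torch; print(torch.__version__)')"` prints 11.8 for torch 2.6.0+cu118, and `same_cuda_major 11.8 12.2` fails.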
Issue: requirements.txt is missing inside the container
Fix: bind-mount the host directory into the Docker container so that both sides share the same files; changes made on the host show up inside the container immediately.
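In Compose's short volume syntax, each entry is `HOST:CONTAINER` with an optional `:ro`/`:rw` mode, and Docker rejects a container path that is not absolute. A quick check of an entry before editing the file could look like this. It is a sketch; `check_bind_spec` is hypothetical and does not validate everything Compose accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a short-syntax bind mount
# "HOST:CONTAINER[:MODE]"; the container path must be absolute.
check_bind_spec() {
  local spec=$1 host rest container
  host=${spec%%:*}
  rest=${spec#*:}
  [ "$rest" != "$spec" ] || { echo "no ':' separator in '$spec'" >&2; return 1; }
  container=${rest%%:*}   # drop an optional :ro / :rw suffix
  [ -n "$host" ] || { echo "empty host path" >&2; return 1; }
  case $container in
    /*) echo "OK: $host -> $container" ;;
    *)  echo "container path must be absolute: '$container'" >&2; return 1 ;;
  esac
}
```

For example, `check_bind_spec /data/tangjianing/.data_mapping/deepseek-ocr:/ubuntu-22.04/deepseek-ocr1` passes, while a relative target such as `ubuntu-22.04/deepseek-ocr1` is rejected.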
(deepseek-ocr) root@dfba29c2bbc6:/data/tangjianing/.data_mapping/deepseek-ocr# ls -l /data/tangjianing/.data_mapping/deepseek-ocr/
total 209248
-rw-r--r--. 1 root root 1064 Nov 21 01:51 LICENSE
-rw-r--r--. 1 root root 6308 Nov 21 01:51 README.md
drwxr-sr-x. 2 root root 114 Nov 21 01:51 assets
-rw-r--r--. 1 root root 2666 Nov 21 01:51 config.json
-rw-r--r--. 1 root root 76 Nov 21 01:51 configuration.json
-rw-r--r--. 1 root root 10646 Nov 21 01:51 configuration_deepseek_v2.py
-rw-r--r--. 1 root root 9253 Nov 21 01:51 conversation.py
-rw-r--r--. 1 root root 38008 Nov 21 01:51 deepencoder.py
-rw-r--r--. 1 root root 135 Nov 21 01:51 model-00001-of-000001.safetensors
-rw-r--r--. 1 root root 246759 Nov 21 01:51 model.safetensors.index.json
-rw-r--r--. 1 root root 40133 Nov 21 01:51 modeling_deepseekocr.py
-rw-r--r--. 1 root root 82224 Nov 21 01:51 modeling_deepseekv2.py
-rw-r--r--. 1 root root 460 Nov 21 01:51 processor_config.json
-rw-r--r--. 1 root root 801 Nov 21 01:51 special_tokens_map.json
-rw-r--r--. 1 root root 132 Nov 21 01:51 tokenizer.json
-rw-r--r--. 1 root root 165938 Nov 21 01:51 tokenizer_config.json
-rw-r--r--. 1 root root 213618745 Apr 28 2025 vllm-0.8.5+cu118-cp38-abi3-manylinux1_x86_64.whl
[tangjianing@localhost ~]$ cd /data/tangjianing/.data_mapping/deepseek-ocr/
[tangjianing@localhost deepseek-ocr]$ ls -l
total 7440
drwxr-xr-x. 2 root root 4096 Nov 24 09:48 assets
drwxr-xr-x. 4 root root 4096 Nov 24 09:48 DeepSeek-OCR-master
-rw-r--r--. 1 root root 7591202 Nov 24 09:48 DeepSeek_OCR_paper.pdf
-rw-r--r--. 1 root root 1065 Nov 24 09:48 LICENSE
-rw-r--r--. 1 root root 7733 Nov 24 09:48 README.md
-rw-r--r--. 1 root root 93 Nov 24 09:48 requirements.txt
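Comparing the two listings: the container directory holds the model files and the vLLM wheel but no requirements.txt, while the host directory holds requirements.txt but not the wheel. A pre-flight check run before the pip commands makes this kind of gap obvious. The sketch below is hypothetical; its file list is taken from the install commands in this note.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirm the files the pip install steps
# rely on exist in DIR. Returns 1 and lists what is missing otherwise.
preflight() {
  local dir=${1:-.} missing=0 f
  for f in requirements.txt vllm-0.8.5+cu118-cp38-abi3-manylinux1_x86_64.whl; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run it inside the container before installing, e.g. `preflight /ubuntu-22.04/deepseek-ocr1`.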
Solution steps
sudo vim docker-compose.yaml
# Under the service's volumes: list, add:
- /data/tangjianing/.data_mapping/deepseek-ocr:/ubuntu-22.04/deepseek-ocr1
# Save and exit
:wq
sudo docker exec -it ubuntu-container-wyq /bin/bash
cd /ubuntu-22.04/
mkdir deepseek-ocr1
# Optionally, delete the original directory inside the container
# rm -rf deepseek-ocr/
exit
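# From the directory containing docker-compose.yaml, recreate the container with the new mount
sudo docker compose up -d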
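After the container is recreated, the mount can be verified end to end by creating a marker file on the host and checking for it inside the container. This is a minimal sketch with a hypothetical helper name; the trailing arguments are the command prefix used to run a command in the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that HOST_DIR is visible at CONTAINER_DIR.
# Usage: verify_bind_mount HOST_DIR CONTAINER_DIR CMD...
#   CMD... runs a command in the container, e.g. "sudo docker exec NAME".
verify_bind_mount() {
  local host_dir=$1 container_dir=$2 marker
  shift 2
  marker=".mount-check-$$-$RANDOM"
  # Create a marker file on the host side
  touch -- "$host_dir/$marker" || return 2
  # Look for the same file on the container side
  if "$@" test -f "$container_dir/$marker"; then
    rm -f -- "$host_dir/$marker"
    echo "OK: $host_dir is visible at $container_dir"
    return 0
  fi
  rm -f -- "$host_dir/$marker"
  echo "FAIL: $marker not found under $container_dir" >&2
  return 1
}
```

For example: `verify_bind_mount /data/tangjianing/.data_mapping/deepseek-ocr /ubuntu-22.04/deepseek-ocr1 sudo docker exec ubuntu-container-wyq`. Passing the container command as a prefix keeps the check independent of how the container is reached; with `env` as the prefix it runs the check locally.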