
Downloading the DeepSeek-OCR Model

Prerequisite: install git and at least one command-line download tool, such as huggingface-cli, hf, modelscope, or aistudio.

HF_ENDPOINT=https://hf-mirror.com hf download deepseek-ai/DeepSeek-OCR --cache-dir ~/.cache/huggingface/hub

HF_ENDPOINT=https://hf-mirror.com huggingface-cli download --resume-download deepseek-ai/DeepSeek-OCR
Note: the ModelScope and AI Studio command-line tools must be installed with pip before use:
pip install modelscope
modelscope download --model deepseek-ai/DeepSeek-OCR --local_dir /workspace/model

pip install --upgrade aistudio-sdk
aistudio download --model ModelHub/DeepSeek-OCR --local_dir ./
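
The same Hugging Face download can also be driven from Python instead of the CLI. The following is a minimal sketch, assuming the huggingface_hub package is installed; the function name download_deepseek_ocr and its default arguments are illustrative and not part of any tool mentioned above. The import is deferred so that HF_ENDPOINT is set before huggingface_hub reads it.

```python
import os


def download_deepseek_ocr(local_dir, endpoint="https://hf-mirror.com"):
    """Download deepseek-ai/DeepSeek-OCR into local_dir (hypothetical helper)."""
    # huggingface_hub reads HF_ENDPOINT at import time, so set it first.
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    # Deferred import; requires `pip install huggingface_hub`.
    from huggingface_hub import snapshot_download

    # snapshot_download returns the path of the downloaded folder.
    return snapshot_download(
        repo_id="deepseek-ai/DeepSeek-OCR",
        local_dir=local_dir,
        endpoint=endpoint,
    )
```

Calling download_deepseek_ocr("/workspace/model") would place the files in the same directory used by the modelscope command above.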

You can also download the model with git. First enable Git LFS:
git lfs install
Then clone from any one of the following mirrors:
git clone https://www.modelscope.cn/deepseek-ai/DeepSeek-OCR.git

git clone https://cnb.cool/ai-models/deepseek-ai/DeepSeek-OCR /workspace/model

git clone https://git.aistudio.baidu.com/ModelHub/DeepSeek-OCR.git
If you can reach GitHub directly:
git clone https://github.com/deepseek-ai/DeepSeek-OCR.git
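
If Git LFS is not active when cloning, large files such as the safetensors weights are left behind as small text pointer stubs rather than real data. The sketch below shows one way to detect that after a clone; the helper name find_lfs_pointers and the 1 KB size cutoff are assumptions made for illustration.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(repo_dir):
    """Return files under repo_dir that are still Git LFS pointer stubs."""
    repo_dir = Path(repo_dir)
    pointers = []
    for path in sorted(repo_dir.rglob("*")):
        # Skip directories and anything inside the .git directory.
        if not path.is_file() or ".git" in path.relative_to(repo_dir).parts:
            continue
        # Pointer stubs are tiny; skip anything larger (assumed cutoff: 1 KB).
        if path.stat().st_size > 1024:
            continue
        with open(path, "rb") as f:
            if f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                pointers.append(path)
    return pointers
```

If find_lfs_pointers("/workspace/model") returns a non-empty list, running git lfs pull inside the clone should fetch the real files.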

Restoring the files under /workspace/model into ~/.cache/huggingface/hub

HuggingFace cache directory structure

📍 Path: /root/.cache/huggingface/hub/models--deepseek-ai--DeepSeek-OCR

📁 /root/.cache/huggingface/hub/models--deepseek-ai--DeepSeek-OCR/
├── blobs/
│ ├── 04e5006bc8b904cca28c1c43286c81b5d00a5208f4b4f0de6357e3ff0e79ab4f (4.2KB)
│ ├── 0ae2fb6d1e5ae8cf100fc32f854830acd08c821a0a1f23a94a76588c222ddcf2 (37.1KB)
│ ├── 0fe7ba9aa6b967a90e4af40d43c8030cbdd3dbcfbb387b8907625d5e54f0dbbe (460B)
│ ├── 1169e7cdc28ff2fb6186556acb2175db148ad26a62097df4c45a17e523180d3f (6.2GB)
│ ├── 2fe88eacc470c34d00225151372d3770948864f3d9cfaae16afa15b2432d7793 (262.5KB)
│ ├── 3a709eb3fdb51cf6d8a546ffb8efe3c80bef61ae0183c2c8476a4c4a41efa3f1 (386.8KB)
│ ├── 5835e0b9e1942fe36df9123009d1d80a2b78ccbe6a2d535668a6561e3d4068b1 (39.2KB)
│ ├── 69184a0493d1fdac21c360f9f9eabc5b5af3188e0265df44cfaeafc44f9095e3 (80.3KB)
│ ├── 6ab21f29a4722e26fa28c8e0d4277591689a598df17cf6c712330e8f62b3fc7c (10.4KB)
│ ├── 81d08f7f33d9d39b95dd9b8162506659e6822d621b9829a208f3830c34c2b4d0 (210.9KB)
│ ├── 887e88e60e5833bc10a2cd7edb89ea7e6992abaae5e1550b027c611b8b8456f2 (114.1KB)
│ ├── a02f8fd5228c90256bb4f6554c34a579d48f909e5beb232dc4afad870b55a8b4 (9.5MB)
│ ├── a0cbe8464049da1f891b7a12676de06af4cb54c130995d42f71adc1c30c6e9f3 (162.0KB)
│ ├── ab4bd57ce17d62e39e0a39e739de1e407484f090f0b2c7e391312bca7a5b061a (801B)
│ ├── ac358c28898de6a71e84e0cb959fcc97fcb9c674edae13cd24286c17b90032d8 (2.6KB)
│ ├── b51d17cc7b282880006a162b06e3a5c4c0a566f3d3093f34a93450ca5a91bd74 (241.0KB)
│ ├── cd24b0cfc7b6c0b1b34bd1aa55bc385e746298fdd82410db6c0d4e0bf69085c0 (241.3KB)
│ ├── ec7b6ce89bcda643de1f43269ffa66a7b2e65dc3ed30e427958f776546b4ba03 (9.0KB)
│ └── f2c6c602815669d292889e5be8c802f2ed950653b77999b1584e8e6aed25d040 (1.1KB)
├── refs/
│ └── main (40B)
└── snapshots/
  └── 9f30c71f441d010e5429c532364a86705536c53a/
    ├── .ipynb_checkpoints/
    │ └── README-checkpoint.md → ../../../blobs/04e5006bc8b904cca28c1c43286c81b5d00a5208f4b4f0de6357e3ff0e79ab4f
    ├── LICENSE → ../../blobs/f2c6c602815669d292889e5be8c802f2ed950653b77999b1584e8e6aed25d040
    ├── README.md → ../../blobs/04e5006bc8b904cca28c1c43286c81b5d00a5208f4b4f0de6357e3ff0e79ab4f
    ├── assets/
    │ ├── fig1.png → ../../../blobs/3a709eb3fdb51cf6d8a546ffb8efe3c80bef61ae0183c2c8476a4c4a41efa3f1
    │ ├── show1.jpg → ../../../blobs/887e88e60e5833bc10a2cd7edb89ea7e6992abaae5e1550b027c611b8b8456f2
    │ ├── show2.jpg → ../../../blobs/81d08f7f33d9d39b95dd9b8162506659e6822d621b9829a208f3830c34c2b4d0
    │ ├── show3.jpg → ../../../blobs/cd24b0cfc7b6c0b1b34bd1aa55bc385e746298fdd82410db6c0d4e0bf69085c0
    │ └── show4.jpg → ../../../blobs/2fe88eacc470c34d00225151372d3770948864f3d9cfaae16afa15b2432d7793
    ├── config.json → ../../blobs/ac358c28898de6a71e84e0cb959fcc97fcb9c674edae13cd24286c17b90032d8
    ├── configuration_deepseek_v2.py → ../../blobs/6ab21f29a4722e26fa28c8e0d4277591689a598df17cf6c712330e8f62b3fc7c
    ├── conversation.py → ../../blobs/ec7b6ce89bcda643de1f43269ffa66a7b2e65dc3ed30e427958f776546b4ba03
    ├── deepencoder.py → ../../blobs/0ae2fb6d1e5ae8cf100fc32f854830acd08c821a0a1f23a94a76588c222ddcf2
    ├── model-00001-of-000001.safetensors → ../../blobs/1169e7cdc28ff2fb6186556acb2175db148ad26a62097df4c45a17e523180d3f
    ├── model.safetensors.index.json → ../../blobs/b51d17cc7b282880006a162b06e3a5c4c0a566f3d3093f34a93450ca5a91bd74
    ├── modeling_deepseekocr.py → ../../blobs/5835e0b9e1942fe36df9123009d1d80a2b78ccbe6a2d535668a6561e3d4068b1
    ├── modeling_deepseekv2.py → ../../blobs/69184a0493d1fdac21c360f9f9eabc5b5af3188e0265df44cfaeafc44f9095e3
    ├── processor_config.json → ../../blobs/0fe7ba9aa6b967a90e4af40d43c8030cbdd3dbcfbb387b8907625d5e54f0dbbe
    ├── special_tokens_map.json → ../../blobs/ab4bd57ce17d62e39e0a39e739de1e407484f090f0b2c7e391312bca7a5b061a
    ├── tokenizer.json → ../../blobs/a02f8fd5228c90256bb4f6554c34a579d48f909e5beb232dc4afad870b55a8b4
    └── tokenizer_config.json → ../../blobs/a0cbe8464049da1f891b7a12676de06af4cb54c130995d42f71adc1c30c6e9f3

================================================================================

Directory structure notes

================================================================================

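📁 blobs/ - the actual file contents. In this restored cache every file name is the SHA256 hash of the file's content (a cache written by huggingface_hub itself names only LFS files by SHA256 and names small files by their git blob SHA-1)
📁 snapshots/<commit_hash>/ - symlinks to the files in blobs, organized by commit
📁 refs/ - reference files; main holds the hash of the latest commit

🔹 Model size: ~6.2GB (the single safetensors file)
🔹 All other files are small (KB range, apart from the 9.5MB tokenizer.json)
🔹 Symlinks use relative paths, which makes the cache easy to relocate

💾 Total cache size: 6.23 GB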
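
The layout above can be navigated programmatically: the repo id maps to a folder name, and refs/main names the snapshot directory to use. The following is a minimal sketch of that lookup; the helper names repo_cache_dir and resolve_snapshot are hypothetical and simply mirror the structure shown in the listing.

```python
from pathlib import Path


def repo_cache_dir(hub_cache, repo_id):
    """Map a repo id such as 'deepseek-ai/DeepSeek-OCR' to its cache folder."""
    return Path(hub_cache) / ("models--" + repo_id.replace("/", "--"))


def resolve_snapshot(hub_cache, repo_id, ref="main"):
    """Follow refs/<ref> to the snapshot directory it names."""
    repo_dir = repo_cache_dir(hub_cache, repo_id)
    # refs/main is a small text file containing only the commit hash.
    commit = (repo_dir / "refs" / ref).read_text().strip()
    return repo_dir / "snapshots" / commit
```

For the cache shown above, resolve_snapshot("/root/.cache/huggingface/hub", "deepseek-ai/DeepSeek-OCR") would point at the snapshots directory named after the commit stored in refs/main.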

#!/usr/bin/env python3
"""
Restore git-cloned model files into the HuggingFace cache layout.
"""
import hashlib
import os
import shutil
import sys
import traceback
from pathlib import Path


def compute_sha256(file_path, chunk_size=1024 * 1024):
    """Compute the SHA256 hash of a file, reading it in 1 MB chunks."""
    sha256_hash = hashlib.sha256()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256_hash.update(chunk)
    return sha256_hash.hexdigest()


def restore_to_huggingface_cache(source_dir, target_repo):
    """
    Restore git-cloned model files into the HuggingFace cache layout.

    Args:
        source_dir: directory of the git clone (e.g. /workspace/model)
        target_repo: cache directory name (e.g. models--deepseek-ai--DeepSeek-OCR)
    """
    # Set up paths (resolves to /root/.cache/huggingface/hub when run as root)
    hub_cache = Path.home() / ".cache" / "huggingface" / "hub"
    target_path = hub_cache / target_repo
    blobs_path = target_path / "blobs"
    snapshots_path = target_path / "snapshots"

    source_path = Path(source_dir)
    if not source_path.is_dir():
        raise FileNotFoundError(f"Source directory not found: {source_dir}")

    print("=" * 60)
    print("Restoring Git Cloned Model to HuggingFace Cache")
    print("=" * 60)
    print(f"\nSource: {source_dir}")
    print(f"Target: {target_path}")
    print("=" * 60)

    # Create directories
    blobs_path.mkdir(parents=True, exist_ok=True)
    snapshots_path.mkdir(parents=True, exist_ok=True)

    # Create refs/main, which holds the latest commit hash
    refs_dir = target_path / "refs"
    refs_dir.mkdir(exist_ok=True)

    # Commit hash is hard-coded here as an example
    commit_hash = "9f30c71f441d010e5429c532364a86705536c53a"
    (refs_dir / "main").write_text(commit_hash)

    # Create the snapshot directory
    snapshot_dir = snapshots_path / commit_hash
    snapshot_dir.mkdir(exist_ok=True)

    print("\nProcessing files...")
    print("-" * 60)

    processed_files = []

    # Process every file except those inside the .git directory
    for file_path in source_path.rglob("*"):
        if not file_path.is_file():
            continue

        # Path relative to the clone root; match ".git" as a path component
        # so that names merely containing ".git" are not skipped
        rel_path = file_path.relative_to(source_path)
        if ".git" in rel_path.parts:
            continue

        # Compute the SHA256
        print(f"Processing: {rel_path}")
        sha256 = compute_sha256(file_path)

        # Copy into blobs/, using the SHA256 as the file name
        blob_file = blobs_path / sha256
        if not blob_file.exists():
            shutil.copy2(file_path, blob_file)
            print(f"  → Blob saved: {blob_file.name}")

        # Create a relative symlink in the snapshot directory
        snapshot_file = snapshot_dir / rel_path
        snapshot_file.parent.mkdir(parents=True, exist_ok=True)
        rel_to_snapshot = os.path.relpath(blob_file, snapshot_file.parent)

        # Remove any existing entry (is_symlink also catches dangling links)
        if snapshot_file.is_symlink() or snapshot_file.exists():
            snapshot_file.unlink()

        os.symlink(rel_to_snapshot, snapshot_file)
        print(f"  → Symlink created: {snapshot_file}")
        processed_files.append(str(rel_path))

    print("\n" + "=" * 60)
    print("✅ Restoration completed successfully!")
    print("=" * 60)
    print(f"\nTotal files processed: {len(processed_files)}")
    print("\nYou can now use:")
    print("  from transformers import AutoModel")
    print('  model = AutoModel.from_pretrained("deepseek-ai/DeepSeek-OCR", trust_remote_code=True)')
    print("=" * 60)

    return target_path


if __name__ == "__main__":
    # Configure paths
    SOURCE_DIR = "/workspace/model"  # directory of the git clone
    TARGET_REPO = "models--deepseek-ai--DeepSeek-OCR"  # HuggingFace cache directory name

    try:
        result_path = restore_to_huggingface_cache(SOURCE_DIR, TARGET_REPO)
        print(f"\n✅ Files restored to: {result_path}")
    except KeyboardInterrupt:
        print("\n\n⚠️  Process interrupted by user")
        sys.exit(130)
    except Exception as e:
        print(f"\n❌ Error: {e}")
        traceback.print_exc()
        sys.exit(1)
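
After running the restore script, it can be worth confirming that every entry in the snapshot directory is a working symlink and that each blob's name matches its content. The sketch below is one way to do that under the script's own convention (every blob named by its SHA256); verify_snapshot and sha256_of are hypothetical helper names, and the check would not apply unchanged to a cache created by huggingface_hub itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MB chunks so large weights do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_snapshot(snapshot_dir):
    """Return a list of problems found in a restored snapshot directory."""
    problems = []
    for link in sorted(Path(snapshot_dir).rglob("*")):
        # Real subdirectories (e.g. assets/) are expected; skip them.
        if link.is_dir() and not link.is_symlink():
            continue
        if not link.is_symlink():
            problems.append(f"{link}: not a symlink")
        elif not link.exists():
            problems.append(f"{link}: dangling symlink")
        elif sha256_of(link) != link.resolve().name:
            problems.append(f"{link}: blob name does not match content hash")
    return problems
```

An empty list means the snapshot is consistent. Note that hashing the 6.2GB weights file takes a while, so this is a one-off check rather than something to run on every load.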

References
https://huggingface.co/deepseek-ai/DeepSeek-OCR
https://hf-mirror.com/
http://github.com/deepseek-ai/DeepSeek-OCR
https://aistudio.baidu.com/modelsdetail/38538/space
https://www.modelscope.cn/models/deepseek-ai/DeepSeek-OCR

https://cnb.cool/ai-models/deepseek-ai/DeepSeek-OCR
