导入需要的库

打开pycharm的终端（是终端不是python程序），下载下面的库

pip install torch transformers datasets peft accelerate sentencepiece modelscope 
pip install modelscope
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu126

这里的第二行下载torch，如果你的独显cuda不匹配torch可以去官网查看你的显存是那个cuda版本适配哪个pytorch，如果没有独显可能不用在意这个问题

下载模型

建一个python程序，复制下面代码，下载模型

from modelscope.hub.snapshot_download import snapshot_download  # 自定义下载路径（可以是任意你有读写权限的目录）  
model_dir = snapshot_download(  'Qwen/Qwen3-0.6B',    revision='master',          cache_dir='./models'         # ← 自定义路径！  
)  print("模型保存路径：", model_dir)  #记住这里的路径，后面路径要用到

记住下载模型的路径

运行语句

再新建一个py文件,记得修改model_path为你电脑上模型的路径

from modelscope import AutoModelForCausalLM, AutoTokenizer  
import torch  #model_name = "Qwen/Qwen3-0.6B"  
model_path = "./models/qwen/Qwen3-0___6B"  # ← 修改成你电脑上的实际路径！  # 1. 加载 tokenizer 和 模型  
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)  
model = AutoModelForCausalLM.from_pretrained(  model_path,  device_map="auto",                    # 自动分配 GPU/CPU    dtype=torch.bfloat16,                 # 减少显存占用  trust_remote_code=True  
)  # prepare the model input  
prompt = "你好，请介绍一下你自己"  
messages = [  {"role": "user", "content": prompt}  
]  
text = tokenizer.apply_chat_template(  messages,  tokenize=False,  add_generation_prompt=True,  enable_thinking=False # Switches between thinking and non-thinking modes. Default is True.  
)  
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)  # conduct text completion  
generated_ids = model.generate(  **model_inputs,  max_new_tokens=32768  
)  
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()  # parsing thinking content  
try:  # rindex finding 151668 (</think>)  index = len(output_ids) - output_ids[::-1].index(151668)  
except ValueError:  index = 0  thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")  
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")  print("thinking content:", thinking_content)  
print("content:", content)

看到代码有输出就说明部署成功