昇腾社区首页
中文
注册
昇腾服务器/昇腾开发板部署DeepSeek-32Bw8a8+Dify知识库应用

昇腾服务器/昇腾开发板部署DeepSeek-32Bw8a8+Dify知识库应用

DeepSeek

发表于 2025/03/12

第一步:先向昇腾方申请设备,申请到Atlas 800 9000服务器,使用昇腾官方提供的账号和密码保证可以登录上服务器;

(1)更新一下驱动,因为昇腾官方的提供的镜像需要指定版本的驱动固件,下载安装更新 Version: 23.0.rc2将会变更为Version: 23.0.0,下载地址:社区版-固件与驱动-昇腾社区

更新安装固件,并更新固件,重启设备,一切以昇腾官方的最新驱动和公告为准

[root@dify HwHiAiUser]# pwd

/home/HwHiAiUser

[root@dify HwHiAiUser]# ls -l

total 131112

-rw------- 1 root root 134251528 Dec  7 16:16 Ascend-hdk-910-npu-driver_23.0.0_linux-aarch64.run

[root@dify HwHiAiUser]# chmod 777 Ascend-hdk-910-npu-driver_23.0.0_linux-aarch64.run

[root@dify HwHiAiUser]# ls

Ascend-hdk-910-npu-driver_23.0.0_linux-aarch64.run

[root@dify HwHiAiUser]# sudo ./Ascend-hdk-910-npu-driver_23.0.0_linux-aarch64.run --full --force

Verifying archive integrity...  100%   SHA256 checksums are OK. All good.

Uncompressing ASCEND DRIVER RUN PACKAGE  100%

[Driver] [2025-02-23 15:46:26] [INFO]Start time: 2025-02-23 15:46:26

[Driver] [2025-02-23 15:46:26] [INFO]LogFile: /var/log/ascend_seclog/ascend_install.log

[Driver] [2025-02-23 15:46:26] [INFO]OperationLogFile: /var/log/ascend_seclog/operation.log

[Driver] [2025-02-23 15:46:26] [INFO]base version is 23.0.rc2.

[Driver] [2025-02-23 15:46:26] [WARNING]Do not power off or restart the system during the installation/upgrade

[Driver] [2025-02-23 15:46:26] [INFO]set username and usergroup, HwHiAiUser:HwHiAiUser

[Driver] [2025-02-23 15:46:26] [INFO]Driver package has been installed on the path /usr/local/Ascend, the version is 23.0.rc2, and the version of this package is 23.0.0,do you want to continue?  [y/n]

y

[Driver] [2025-02-23 15:46:36] [INFO]driver install type: Direct

[Driver] [2025-02-23 15:46:36] [INFO]upgradePercentage:10%

[Driver] [2025-02-23 15:46:40] [INFO]upgradePercentage:30%

[Driver] [2025-02-23 15:46:40] [INFO]upgradePercentage:40%

[Driver] [2025-02-23 15:46:42] [INFO]upgradePercentage:90%

[Driver] [2025-02-23 15:46:45] [INFO]upgradePercentage:100%

[Driver] [2025-02-23 15:46:45] [INFO]Driver package installed successfully! Reboot needed for installation/upgrade to take effect!

[Driver] [2025-02-23 15:46:45] [INFO]End time: 2025-02-23 15:46:45

[root@dify HwHiAiUser]# sudo reboot

固件更新完成,查看驱动版本为Version: 23.0.0

(2)将基础模型先下载下来,一会进行挂载推理模型,分词模型、到排序模型,进行使用 ,可以去魔搭社区下载ModelScope魔搭社区,先下载模型:DeepSeek-R1-Distill-Qwen-32B ,下载使用方式参考官方指导方式即可;

使用python脚本下载模型

[root@dify HwHiAiUser]# pwd
/home/HwHiAiUser
[root@dify HwHiAiUser]# pip3 install modelscope==1.18.0 -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
[root@dify HwHiAiUser]# python3
Python 3.7.0 (default, May 11 2024, 10:32:14)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import modelscope
>>> exit()
[root@dify HwHiAiUser]# cat down.py
#模型下载
from modelscope import snapshot_download
model_dir = snapshot_download('deepseek-ai/DeepSeek-R1-Distill-Qwen-32B',cache_dir=".")
[root@dify HwHiAiUser]# python3 down.py
Downloading [figures/benchmark.jpg]: 100%|██████████████████████████████████████████████████████████████████████| 759k/759k [00:00<00:00, 1.78MB/s]
Downloading [config.json]: 100%|██████████████████████████████████████████████████████████████████████████████████| 664/664 [00:00<00:00, 2.10kB/s]
Downloading [configuration.json]: 100%|███████████████████████████████████████████████████████████████████████████| 73.0/73.0 [00:00<00:00, 233B/s]
Downloading [generation_config.json]: 100%|█████████████████████████████████████████████████████████████████████████| 181/181 [00:00<00:00, 686B/s]
Downloading [LICENSE]: 100%|██████████████████████████████████████████████████████████████████████████████████| 1.04k/1.04k [00:00<00:00, 2.92kB/s]
Downloading [model-00001-of-000008.safetensors]: 0%| | 1.00M/8.19G [00:00<59:21, 2.47MB/s]Downloading [model-00001-of-000008.safetensors]: 0%| | 16.0M/8.19G [00:00<03:43, 39.3MB/s]

下载完成,查看权重目录

[root@dify HwHiAiUser]# pwd

/home/HwHiAiUser

[root@dify HwHiAiUser]# tree -L 2

.

├── Ascend-hdk-910-npu-driver_23.0.0_linux-aarch64.run

├── deepseek-ai

│   ├── DeepSeek-R1-Distill-Qwen-32B

│  

└── down.py

3 directories, 3 files

二、使用官方镜像 昇腾镜像仓库详情,进行昇腾MindIE环境构建,因为计划测试DeepSeek-R1-Distill-Qwen-32B-W8A8模型,所以记得创建容器挂载两张卡即可

(1)拉取Atals 800 9000镜像,建议从官方拉取,自己要根据自己的机型拉取对应的镜像,一切以官方为主,青岛的镜像也是在官方镜像上,打包做过细微不影响运行的修改

也可以从下面的公开链接拉取镜像,创建双卡容器

[root@dify HwHiAiUser]#yum install docker

[root@dify HwHiAiUser]# docker pull swr.cn-east-317.qdrgznjszx.com/sxj731533730/mindie:atlas_800_9000
Error response from daemon: Get https://swr.cn-east-317.qdrgznjszx.com/v2/: x509: certificate signed by unknown authority
[root@dify HwHiAiUser]#


修改配置源,添加mindie的镜像源;

解决办法:
[root@dify HwHiAiUser]#vim /etc/docker/daemon.json
填入内容
{ "insecure-registries": ["https://swr.cn-east-317.qdrgznjszx.com"/], "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn"/] }
保存退出、然后重启docker即可
[root@dify HwHiAiUser]# systemctl restart docker.service

[root@dify HwHiAiUser]# docker pull swr.cn-east-317.qdrgznjszx.com/sxj731533730/mindie:atlas_800_9000
atlas_800_9000: Pulling from qd-aicc/mindie
edab87ea811e: Pull complete
72906c864c93: Pull complete
98f62a370e96: Pull complete
Digest: sha256:6ceefe4506f58084717ec9bed7df75e51032fdd709d791a627084fe4bd92abea
Status: Downloaded newer image for swr.cn-east-317.qdrgznjszx.com/qd-aicc/mindie:atlas_800_9000
[root@dify HwHiAiUser]#

创建容器,进入容器,计划使用两张昇腾NPU卡推理DeepSeek-R1-Distill-Qwen-32B的W8A8模型,所以构建的容器用两张卡,选6、7卡吧,0-6号卡可以跑文本嵌入模型、重排序模型;创建容器脚本

[root@dify ~]# cd /home/HwHiAiUser/

[root@dify HwHiAiUser]# ls

Ascend-hdk-910-npu-driver_23.0.0_linux-aarch64.run  deepseek-ai  down.py

[root@dify HwHiAiUser]# docker images

REPOSITORY                                      TAG                                                                                IMAGE ID            CREATED             SIZE

swr.cn-east-317.qdrgznjszx.com/sxj731533730/mindie                                           atlas_800_9000   69f30d0c15be        5 weeks ago         16.5GB

[root@dify HwHiAiUser]# vim docker_run.sh

[root@dify HwHiAiUser]# vim docker_run.sh

[root@dify HwHiAiUser]# vim docker_run.sh

[root@dify HwHiAiUser]# cat docker_run.sh

#!/bin/bash

docker_images=swr.cn-east-317.qdrgznjszx.com/sxj731533730/mindie:atlas_800_9000

model_dir=/home/HwHiAiUser #根据实际情况修改挂载目录

docker run -it --name qdaicc --ipc=host --net=host \

        --device=/dev/davinci6 \

        --device=/dev/davinci7 \

        --device=/dev/davinci_manager \

        --device=/dev/devmm_svm \

        --device=/dev/hisi_hdc \

        -v /usr/local/dcmi:/usr/local/dcmi \

        -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \

        -v /usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/common \

        -v /usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/driver/lib64/driver \

        -v /etc/ascend_install.info:/etc/ascend_install.info \

        -v /etc/vnpu.cfg:/etc/vnpu.cfg \

        -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \

        -v ${model_dir}:${model_dir} \

        -v  /var/log/npu:/usr/slog ${docker_images} \

        /bin/bash

[root@dify HwHiAiUser]#

填进去内容如上,启动镜像

[root@dify HwHiAiUser]# bash docker_run.sh
(Python310) root@dify:/usr/local/Ascend/atb-models# cd /home/HwHiAiUser/
(Python310) root@dify:/home/HwHiAiUser# ls
Ascend-hdk-910-npu-driver_23.0.0_linux-aarch64.run  deepseek-ai  docker_run.sh  down.py

因为之前挂在的目录是 /home/HwHiAiUser/ ,所以可以在docker里面看到物理机的下载权重,再查看一下卡数是两张

(2)进行模型量化Ascend/ModelZoo-PyTorch - Gitee.com 直接进入量化阶段,在容器外面操作即可,环境不用管,因为系统已经默认配置了环境,直接跳到 权重量化 阶段,安装过程缺什么,,在docker外面git下源码,进入容器内部进行量化,这里的容器建议在创建个8卡的容器,双卡容器量化会显示npu显存不够,除非你用cpu转模型,我就懒得创建容器了,使用cpu量化吧;

[root@dify HwHiAiUser]# pwd
/home/HwHiAiUser
[root@dify HwHiAiUser]# git clone https://gitee.com/ascend/msit.git
Cloning into 'msit'...
remote: Enumerating objects: 81125, done.
remote: Total 81125 (delta 0), reused 0 (delta 0), pack-reused 81125
Receiving objects: 100% (81125/81125), 71.73 MiB | 12.14 MiB/s, done.
Resolving deltas: 100% (59704/59704), done.
[root@dify HwHiAiUser]# cd msit/
.git/ .gitee/ msit/ msmodelslim/ msserviceprofiler/
[root@dify Qwen]# docker start b5399c4da202
b5399c4da202
[root@dify Qwen]# docker exec -it b5399c4da202 /bin/bash
(Python310) root@dify:/home/HwHiAiUser/msit# cd msmodelslim/
(Python310) root@dify:/home/HwHiAiUser/msit/msmodelslim# bash install.sh
#安装成功,pip缺啥安装啥
(Python310) root@dify:/home/HwHiAiUser# cd /home/HwHiAiUser/msit/msmodelslim/example/Qwen
#量化模型
(Python310) root@dify:/home/HwHiAiUser/msit/msmodelslim/example/Qwen# python3 quant_qwen.py --model_path /home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B/ --save_directory /home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B-W8A8 --calib_file ../common/boolq.jsonl --w_bit 8 --a_bit 8 --device_type npu
2025-02-23 18:15:25,404 - msmodelslim-logger - WARNING - The current CANN version does not support LayerSelector quantile method.
或者cpu处理
(Python310) root@dify:/home/HwHiAiUser/msit/msmodelslim/example/Qwen# python3 quant_qwen.py --model_path /home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B/ --save_directory /home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B-W8A8 --calib_file ../common/boolq.jsonl --w_bit 8 --a_bit 8 --device_type cpu
2025-02-23 18:25:10,776 - msmodelslim-logger - WARNING - The current CANN version does not support LayerSelector quantile method.
2025-02-23 18:25:10,783 - msmodelslim-logger - WARNING - `cpu` is set as `dev_type`, `dev_id` cannot be specified manually!

转换完成之后生成权重文件

(Python310) root@dify:/home/HwHiAiUser/deepseek-ai# cd /home/HwHiAiUser/msit/msmodelslim/example/Qwen

(Python310) root@dify:/home/HwHiAiUser/msit/msmodelslim/example/Qwen# ls /home/HwHiAiUser/deepseek-ai/

DeepSeek-R1-Distill-Qwen-32B  DeepSeek-R1-Distill-Qwen-32B-W8A8

(Python310) root@dify:/home/HwHiAiUser/msit/msmodelslim/example/Qwen#

因为Atlas 800 9000不支持bf16,所以修改float16,其它设备参考昇腾手册

(Python310) root@dify:/home/HwHiAiUser/msit/msmodelslim/example/Qwen# vim /home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B-W8A8/config.json

(3)启动MindIE服务,先记录本机的ip地址,模型路径和以及模型名字

模型路径权重: /home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B-W8A8/

模型名字: DeepSeek-R1-Distill-Qwen-32B-W8A8

修改配置文件

(Python310) root@dify:/usr/local/Ascend/mindie/latest/mindie-service# pwd

/usr/local/Ascend/mindie/latest/mindie-service

(Python310) root@dify:/usr/local/Ascend/mindie/latest/mindie-service# vim conf/config.json

修改解释一下,ipAddress,主要为了后面搭建dify使用的推理引擎模型,其它参考mindie手册

MindSpore Models服务化使用-MindSpore Models使用-模型推理使用流程-MindIE LLM开发指南-大模型开发-MindIE1.0.0开发文档-昇腾社区

单机推理-配置MindIE Server-配置MindIE-MindIE安装指南-环境准备-MindIE1.0.0开发文档-昇腾社区

"ipAddress" : "192.168.1.115", 改为本地地址

"httpsEnabled" : false,

"npuDeviceIds" : [[0,1]],

"modelName" : "DeepSeek-R1-Distill-Qwen-32B-W8A8",

"modelWeightPath" : "/home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B-W8A8/",

"maxInputTokenLen" : 4096,

"maxIterTimes" : 4096,

"truncation" : true,

修改内容如下

(Python310) root@dify:/usr/local/Ascend/mindie/latest/mindie-service# cat conf/config.json

{

    "Version" : "1.0.0",

    "LogConfig" :

    {

        "logLevel" : "Info",

        "logFileSize" : 20,

        "logFileNum" : 20,

        "logPath" : "logs/mindie-server.log"

    },

    "ServerConfig" :

    {

        "ipAddress" : "192.168.1.115",

        "managementIpAddress" : "127.0.0.2",

        "port" : 1025,

        "managementPort" : 1026,

        "metricsPort" : 1027,

        "allowAllZeroIpListening" : false,

        "maxLinkNum" : 1000,

        "httpsEnabled" : false,

        "fullTextEnabled" : false,

        "tlsCaPath" : "security/ca/",

        "tlsCaFile" : ["ca.pem"],

        "tlsCert" : "security/certs/server.pem",

        "tlsPk" : "security/keys/server.key.pem",

        "tlsPkPwd" : "security/pass/key_pwd.txt",

        "tlsCrlPath" : "security/certs/",

        "tlsCrlFiles" : ["server_crl.pem"],

        "managementTlsCaFile" : ["management_ca.pem"],

        "managementTlsCert" : "security/certs/management/server.pem",

        "managementTlsPk" : "security/keys/management/server.key.pem",

        "managementTlsPkPwd" : "security/pass/management/key_pwd.txt",

        "managementTlsCrlPath" : "security/management/certs/",

        "managementTlsCrlFiles" : ["server_crl.pem"],

        "kmcKsfMaster" : "tools/pmt/master/ksfa",

        "kmcKsfStandby" : "tools/pmt/standby/ksfb",

        "inferMode" : "standard",

        "interCommTLSEnabled" : true,

        "interCommPort" : 1121,

        "interCommTlsCaPath" : "security/grpc/ca/",

        "interCommTlsCaFiles" : ["ca.pem"],

        "interCommTlsCert" : "security/grpc/certs/server.pem",

        "interCommPk" : "security/grpc/keys/server.key.pem",

        "interCommPkPwd" : "security/grpc/pass/key_pwd.txt",

        "interCommTlsCrlPath" : "security/grpc/certs/",

        "interCommTlsCrlFiles" : ["server_crl.pem"],

        "openAiSupport" : "vllm"

    },

    "BackendConfig" : {

        "backendName" : "mindieservice_llm_engine",

        "modelInstanceNumber" : 1,

        "npuDeviceIds" : [[0,1]],

        "tokenizerProcessNumber" : 8,

        "multiNodesInferEnabled" : false,

        "multiNodesInferPort" : 1120,

        "interNodeTLSEnabled" : true,

        "interNodeTlsCaPath" : "security/grpc/ca/",

        "interNodeTlsCaFiles" : ["ca.pem"],

        "interNodeTlsCert" : "security/grpc/certs/server.pem",

        "interNodeTlsPk" : "security/grpc/keys/server.key.pem",

        "interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",

        "interNodeTlsCrlPath" : "security/grpc/certs/",

        "interNodeTlsCrlFiles" : ["server_crl.pem"],

        "interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",

        "interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",

        "ModelDeployConfig" :

        {

            "maxSeqLen" : 2560,

            "maxInputTokenLen" : 4096,

            "truncation" : true,

            "ModelConfig" : [

                {

                    "modelInstanceType" : "Standard",

                    "modelName" : "DeepSeek-R1-Distill-Qwen-32B-W8A8",

                    "modelWeightPath" : "/home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B-W8A8/",

                    "worldSize" : 2,

                    "cpuMemSize" : 5,

                    "npuMemSize" : -1,

                    "backendType" : "atb",

                    "trustRemoteCode" : false

                }

            ]

        },

        "ScheduleConfig" :

        {

            "templateType" : "Standard",

            "templateName" : "Standard_LLM",

            "cacheBlockSize" : 128,

            "maxPrefillBatchSize" : 50,

            "maxPrefillTokens" : 8192,

            "prefillTimeMsPerReq" : 150,

            "prefillPolicyType" : 0,

            "decodeTimeMsPerReq" : 50,

            "decodePolicyType" : 0,

            "maxBatchSize" : 200,

            "maxIterTimes" : 4096,

            "maxPreemptCount" : 0,

            "supportSelectBatch" : false,

            "maxQueueDelayMicroseconds" : 5000

        }

    }

}

修改模型权限,启动服务

(Python310) root@dify:/usr/local/Ascend/mindie/latest/mindie-service# chmod -R 750 /home/HwHiAiUser/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B-W8A8/

(Python310) root@dify:/usr/local/Ascend/mindie/latest/mindie-service# ./bin/mindieservice_daemon

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

[2025-02-23 19:04:44,279] [89160] [281464373506464] [llm] [INFO][logging.py-227] : Skip binding cpu.

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

Daemon start success!

重启一个终端,查看npu使用状况

本机测试
[root@dify ~]# curl -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"inputs":"如何赚大钱","parameters":{"decoder_input_details":true,"details":true,"do_sample":true,"max_new_tokens":50,"repetition_penalty":1.03,"return_full_text":false,"seed":null,"temperature":0.5,"top_k":10,"top_p":0.95,"truncate":null,"typical_p":0.5,"watermark":false}}' http://192.168.1.115:1025/generate
{"details":{"prompt_tokens":5,"finish_reason":"length","generated_tokens":50,"prefill":[{"id":151646,"logprob":null,"special":null,"text":null},{"id":100007,"logprob":null,"special":null,"text":null},{"id":102223,"logprob":null,"special":null,"text":null},{"id":26288,"logprob":null,"special":null,"text":null},{"id":99428,"logprob":null,"special":null,"text":null}],"seed":2240260787,"tokens":[{"id":26850,"logprob":null,"special":null,"text":null},{"id":100007,"logprob":null,"special":null,"text":null},{"id":102223,"logprob":null,"special":null,"text":null},{"id":26288,"logprob":null,"special":null,"text":null},{"id":99428,"logprob":null,"special":null,"text":null},{"id":11319,"logprob":null,"special":null,"text":null},{"id":1406,"logprob":null,"special":null,"text":null},{"id":151649,"logprob":null,"special":null,"text":null},{"id":271,"logprob":null,"special":null,"text":null},{"id":102223,"logprob":null,"special":null,"text":null},{"id":26288,"logprob":null,"special":null,"text":null},{"id":99428,"logprob":null,"special":null,"text":null},{"id":102119,"logprob":null,"special":null,"text":null},{"id":85106,"logprob":null,"special":null,"text":null},{"id":100374,"logprob":null,"special":null,"text":null},{"id":99605,"logprob":null,"special":null,"text":null},{"id":9370,"logprob":null,"special":null,"text":null},{"id":101139,"logprob":null,"special":null,"text":null},{"id":5373,"logprob":null,"special":null,"text":null},{"id":85329,"logprob":null,"special":null,"text":null},{"id":33108,"logprob":null,"special":null,"text":null},{"id":99345,"logprob":null,"special":null,"text":null},{"id":101135,"logprob":null,"special":null,"text":null},{"id":1773,"logprob":null,"special":null,"text":null},{"id":87752,"logprob":null,"special":null,"text":null},{"id":99639,"logprob":null,"special":null,"text":null},{"id":97084,"logprob":null,"special":null,"text":null},{"id":102716,"logprob":null,"special":null,"text":null},{"id":39907,"logprob":null,"special":null,"text":null},{"id":48443,"logprob":null,"special":null,"text":null},{"id":14374,"logprob":null,"special":null,"text":null},{"id":220,"logprob":null,"special":null,"text":null},{"id":16,"logprob":null,"special":null,"text":null},{"id":13,"logprob":null,"special":null,"text":null},{"id":3070,"logprob":null,"special":null,"text":null},{"id":99716,"logprob":null,"special":null,"text":null},{"id":102447,"logprob":null,"special":null,"text":null},{"id":1019,"logprob":null,"special":null,"text":null},{"id":256,"logprob":null,"special":null,"text":null},{"id":481,"logprob":null,"special":null,"text":null},{"id":3070,"logprob":null,"special":null,"text":null},{"id":104023,"logprob":null,"special":null,"text":null},{"id":5373,"logprob":null,"special":null,"text":null},{"id":100025,"logprob":null,"special":null,"text":null},{"id":334,"logprob":null,"special":null,"text":null},{"id":5122,"logprob":null,"special":null,"text":null},{"id":67338,"logprob":null,"special":null,"text":null},{"id":101930,"logprob":null,"special":null,"text":null},{"id":99716,"logprob":null,"special":null,"text":null},{"id":101172,"logprob":null,"special":null,"text":null}]},"generated_text":"?\n\n如何赚大钱?\n\n\n</think>\n\n赚大钱通常需要结合个人的技能、资源和市场机会。以下是一些常见的方法:\n\n### 1. **投资理财**\n - **股票、基金**:通过长期投资优质"}[root@dify ~]#

三、启动分词服务和重排序服务,首先去昇腾仓下载镜像 昇腾镜像仓库详情, 对应自己的设备查找镜像

(1)拉取镜像Atlas 800 9000,已经要根据自己的硬件版本去官方仓拉取镜像,进行分词服务启动

[root@dify ~]# docker pull swr.cn-east-317.qdrgznjszx.com/sxj731533730/mis-tei:6.0.RC3-910-aarch64

[root@dify ~]# docker images

REPOSITORY                                            TAG                                                                                IMAGE ID            CREATED             SIZE

swr.cn-east-317.qdrgznjszx.com/sxj731533730/mis-tei   6.0.RC3-910-aarch64                                                                affece68b209        2 days ago          22.6GB

swr.cn-east-317.qdrgznjszx.com/sxj731533730/mindie        atlas_800_9000   69f30d0c15be        5 weeks ago         16.5GB

[root@dify ~]#

拉取完镜像之后,进行必要的权重模型下载

[root@dify ~]# cd /home/HwHiAiUser/

[root@dify HwHiAiUser]# pwd

/home/HwHiAiUser

[root@dify HwHiAiUser]# vim down.py

[root@dify HwHiAiUser]# cat down.py

#模型下载

from modelscope import snapshot_download

model_dir = snapshot_download('BAAI/bge-m3',cache_dir=".")

from modelscope import snapshot_download

model_dir = snapshot_download('BAAI/bge-large-zh-v1.5',cache_dir=".")

from modelscope import snapshot_download

model_dir = snapshot_download('BAAI/bge-reranker-large',cache_dir=".")

[root@dify HwHiAiUser]# python3 down.py

下载完模型,修改每一个模型内部的配置项 Atlas800 9000/300I Duo/300V Pro设备,Atlas 800T A2等设备不用走该步骤

[root@dify HwHiAiUser]# ls

Ascend-hdk-910-npu-driver_23.0.0_linux-aarch64.run  BAAI  deepseek-ai  docker_run.sh  down.py  msit

[root@dify HwHiAiUser]# vim BAAI/bge-large-zh-v1___5/config.json

[root@dify HwHiAiUser]# vim BAAI/bge-m3/config.json

[root@dify HwHiAiUser]# vim BAAI/bge-reranker-large/config.json

 "torch_dtype": "float16",

(2)创建三个容器,暂定容器名字是 bge-m3、bge-large-zh-v1___5、bge-reranker-large,在创建之前,需要联系昇腾技术人员,开通服务器对外端口,暂定开通的为8001,8002,8003 和niginx转发端口-入方向:|出方向:TCP/8001,8002,8003,8004,442

将模型拷贝到/home/data下,参考官方手册来即可

[root@dify ~]# cd /home/HwHiAiUser/

[root@dify HwHiAiUser]# ls

Ascend-hdk-910-npu-driver_23.0.0_linux-aarch64.run  BAAI  deepseek-ai  docker_run.sh  down.py  msit

[root@dify HwHiAiUser]# pwd

/home/HwHiAiUser

[root@dify HwHiAiUser]# mkdir -p /home/data

[root@dify HwHiAiUser]# cp -r BAAI/* /home/data/

[root@dify HwHiAiUser]# ls /home/data/

bge-large-zh-v1___5  bge-m3  bge-reranker-large

[root@dify HwHiAiUser]#

参考官方说明:

ASCEND_VISIBLE_DEVICES环境变量表示将宿主机上的npu卡挂载到容器,如果挂载多张卡使用逗号分隔,如:ASCEND_VISIBLE_DEVICES=0,1,2,3;挂载多张卡到容器时,默认会寻找最优的一张卡调用,如果不希望容器内部自动寻找最优的卡,启动容器时可通过TEI_NPU_DEVICE=卡id指定使用哪张卡,注意这里的变量TEI_NPU_DEVICE配置从0开始取,容器内已将外部卡id进行了逻辑映射,编号从0连续映射;注意:配置的ASCEND_VISIBLE_DEVICES对应的卡不能被其他容器已挂载,否则会报错

[root@dify ~]# docker run -u root -e TEI_NPU_DEVICE=0 -itd --name=bge-reranker-large --net=host -e HOME=/home/HwHiAiUser --privileged=true  -v /home/data:/home/HwHiAiUser/model -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/local/Ascend/driver:/usr/local/Ascend/driver --entrypoint /home/HwHiAiUser/start.sh swr.cn-east-317.qdrgznjszx.com/sxj731533730/mis-tei:6.0.RC3-910-aarch64  BAAI/bge-reranker-large 192.168.1.115 8001

ef2383785c58ec5a650eb9d852ba965c48eb7b8cc7679cb7c194d2f2d0eb1a0d

[root@dify ~]# docker start ef2383785c58ec5a650eb9d852ba965c48eb7b8cc7679cb7c194d2f2d0eb1a0d

ef2383785c58ec5a650eb9d852ba965c48eb7b8cc7679cb7c194d2f2d0eb1a0d

[root@dify ~]# docker run -u root -e TEI_NPU_DEVICE=1 -itd --name=bge-m3 --net=host -e HOME=/home/HwHiAiUser --privileged=true  -v /home/data:/home/HwHiAiUser/model -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/local/Ascend/driver:/usr/local/Ascend/driver --entrypoint /home/HwHiAiUser/start.sh swr.cn-east-317.qdrgznjszx.com/sxj731533730/mis-tei:6.0.RC3-910-aarch64  BAAI/bge-m3 192.168.1.115 8002

50dd3573f1ae1363211791425a2f681445b220f5a45bbdbe572a361ce974f63a

[root@dify ~]# docker start 50dd3573f1ae1363211791425a2f681445b220f5a45bbdbe572a361ce974f63a

50dd3573f1ae1363211791425a2f681445b220f5a45bbdbe572a361ce974f63a

bge-large-zh-v1___5  bge-m3  bge-reranker-large

[root@dify ~]# docker run -u root -e TEI_NPU_DEVICE=2 -itd --name=bge-large-zh-v1___5 --net=host -e HOME=/home/HwHiAiUser --privileged=true  -v /home/data:/home/HwHiAiUser/model -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/local/Ascend/driver:/usr/local/Ascend/driver --entrypoint /home/HwHiAiUser/start.sh swr.cn-east-317.qdrgznjszx.com/sxj731533730/mis-tei:6.0.RC3-910-aarch64  BAAI/bge-large-zh-v1___5 192.168.1.115 8003

d360f2b558c6556af53e19abd9f0782600f8cab1a7c60dc90fcf0b6061511c96

[root@dify ~]# docker start d360f2b558c6556af53e19abd9f0782600f8cab1a7c60dc90fcf0b6061511c96

d360f2b558c6556af53e19abd9f0782600f8cab1a7c60dc90fcf0b6061511c96

查看一下三个服务,两个分词,一个排序模型,当然也可以放在一个NPU上运行编辑

记录一下对外的服务端口 mindie推理服务 192.168.1.115:1025  ;bge-reranker-large服务:192.168.1.115:8001 bge-m3服务:192.168.1.115:8002 bge-large-zh-v1___5服务: 192.168.1.115:8003

四、部署dify环境进行部署配置,部署遇到的最大问题就是昇腾架构使用的aarch64,gitee使用docker镜像容器是x86_64,所以找镜像替代即可

(1)拉取dify的源码

[root@dify HwHiAiUser]# git clone https://gitee.com/dify_ai/dify.git
Cloning into 'dify'...
remote: Enumerating objects: 206836, done.
remote: Counting objects: 100% (10350/10350), done.
remote: Compressing objects: 100% (5418/5418), done.
remote: Total 206836 (delta 6559), reused 7867 (delta 4637), pack-reused 196486
Receiving objects: 100% (206836/206836), 80.47 MiB | 3.03 MiB/s, done.
Resolving deltas: 100% (161147/161147), done.
[root@dify HwHiAiUser]# cd dify

[root@dify dify]# git checkout 0.15.3
Note: checking out '0.15.3'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

git checkout -b <new-branch-name>

HEAD is now at ca19bd31d chore(*): Bump version to 0.15.3 (#13308)

[root@dify HwHiAiUser]# cd docker/
[root@dify docker]# cp .env.example .env
[root@dify docker]# vim .env

修改848行、906行

NGINX_PORT=80
# SSL settings are only applied when HTTPS_ENABLED is true
NGINX_SSL_PORT=443
修改

NGINX_PORT=8004
# SSL settings are only applied when HTTPS_ENABLED is true
NGINX_SSL_PORT=442
另一处

EXPOSE_NGINX_PORT=80
EXPOSE_NGINX_SSL_PORT=443
修改
EXPOSE_NGINX_PORT=8004
EXPOSE_NGINX_SSL_PORT=442

修改配置文件

[root@dify docker]# vim docker-compose.yaml

第486行添加 --ignore-warnings ARM64-COW-BUG

将492行 修改0.2.10修改为0.2.1

(2)下载docker-compose,配置工具

sudo curl -L https://github.com/docker/compose/releases/download/v2.33.0/docker-compose-linux-aarch64 -o /usr/local/bin/docker-compose
或者这样下载
[root@dify docker]# cd /usr/local/bin/
[root@dify bin]# pwd
/usr/local/bin
[root@dify bin]# wget https://sxj731533730.obs.cn-east-317.qdrgznjszx.com/docker-compose
--2025-02-25 21:07:54-- https://sxj731533730.obs.cn-east-317.qdrgznjszx.com/docker-compose
Resolving sxj731533730.obs.cn-east-317.qdrgznjszx.com (sxj731533730.obs.cn-east-317.qdrgznjszx.com)... 100.125.32.125
Connecting to sxj731533730.obs.cn-east-317.qdrgznjszx.com (sxj731533730.obs.cn-east-317.qdrgznjszx.com)|100.125.32.125|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 71778465 (68M) [application/octet-stream]
Saving to: ‘docker-compose’

docker-compose 100%[=====================================================================>] 68.45M 220MB/s in 0.3s

2025-02-25 21:07:54 (220 MB/s) - ‘docker-compose’ saved [71778465/71778465]

[root@dify bin]# ls
cloud-id cloud-init-per jsondiff jsonpointer modelscope npu-healthcheck.sh tqdm
cloud-init docker-compose jsonpatch jsonschema normalizer npu-smi
[root@dify bin]# chmod 777 docker-compose
[root@dify bin]# docker-compose -v
Docker Compose version v2.33.0

 (3)拉取镜像,准备启动dify环境,根据。yaml找aarch64位库即可

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/dify-api:0.15.3-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/dify-api:0.15.3-linuxarm64  docker.io/langgenius/dify-api:0.15.3

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/dify-web:0.15.3-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/dify-web:0.15.3-linuxarm64  docker.io/langgenius/dify-web:0.15.3

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/postgres:15-alpine-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/postgres:15-alpine-linuxarm64  docker.io/postgres:15-alpine

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/redis:6-alpine-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/redis:6-alpine-linuxarm64  docker.io/redis:6-alpine

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/dify-sandbox:0.2.10-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/dify-sandbox:0.2.10-linuxarm64  docker.io/langgenius/dify-sandbox:0.2.10

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/dify-sandbox:0.2.1-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/dify-sandbox:0.2.1-linuxarm64  docker.io/langgenius/dify-sandbox:0.2.1

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/ubuntu/squid:latest-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/ubuntu/squid:latest-linuxarm64  docker.io/ubuntu/squid:latest

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/certbot/certbot:v3.1.0-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/certbot/certbot:v3.1.0-linuxarm64  docker.io/certbot/certbot:latest

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/nginx:latest-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/nginx:latest-linuxarm64  docker.io/nginx:latest

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/pingcap/tidb:v8.4.0-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/pingcap/tidb:v8.4.0-linuxarm64  docker.io/pingcap/tidb:v8.4.0

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/semitechnologies/weaviate:1.19.0-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/semitechnologies/weaviate:1.19.0-linuxarm64  docker.io/semitechnologies/weaviate:1.19.0

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/qdrant:v1.7.3-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/langgenius/qdrant:v1.7.3-linuxarm64  docker.io/langgenius/qdrant:v1.7.3

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/pgvector/pgvector:pg16-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/pgvector/pgvector:pg16-linuxarm64  docker.io/pgvector/pgvector:pg16

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/tensorchord/pgvecto-rs:pg16-v0.3.0-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/tensorchord/pgvecto-rs:pg16-v0.3.0-linuxarm64  docker.io/tensorchord/pgvecto-rs:pg16-v0.3.0

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/ghcr.io/chroma-core/chroma:0.5.20-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/ghcr.io/chroma-core/chroma:0.5.20-linuxarm64  ghcr.io/chroma-core/chroma:0.5.20

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/quay.io/oceanbase/oceanbase-ce:4.3.3.0-100000142024101215-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/quay.io/oceanbase/oceanbase-ce:4.3.3.0-100000142024101215-linuxarm64  quay.io/oceanbase/oceanbase-ce:4.3.3.0-100000142024101215

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/container-registry.oracle.com/database/free:latest-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/container-registry.oracle.com/database/free:latest-linuxarm64  docker.io/container-registry.oracle.com/database/free:latest

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/quay.io/coreos/etcd:v3.5.5-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/quay.io/coreos/etcd:v3.5.5-linuxarm64  quay.io/coreos/etcd:v3.5.5

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/minio/minio:RELEASE.2023-03-20T20-16-18Z-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/minio/minio:RELEASE.2023-03-20T20-16-18Z-linuxarm64  docker.io/minio/minio:RELEASE.2023-03-20T20-16-18Z

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/milvusdb/milvus:v2.5.0-beta-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/milvusdb/milvus:v2.5.0-beta-linuxarm64  docker.io/milvusdb/milvus:v2.5.0-beta

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/opensearchproject/opensearch:latest-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/opensearchproject/opensearch:latest-linuxarm64  docker.io/opensearchproject/opensearch:latest

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/opensearchproject/opensearch-dashboards:latest-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/opensearchproject/opensearch-dashboards:latest-linuxarm64  docker.io/opensearchproject/opensearch-dashboards:latest

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/myscale/myscaledb:1.6.4-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/myscale/myscaledb:1.6.4-linuxarm64  docker.io/myscale/myscaledb:1.6.4

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.elastic.co/elasticsearch/elasticsearch:8.14.3-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.elastic.co/elasticsearch/elasticsearch:8.14.3-linuxarm64  docker.elastic.co/elasticsearch/elasticsearch:8.14.3

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.elastic.co/kibana/kibana:8.14.3-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.elastic.co/kibana/kibana:8.14.3-linuxarm64  docker.elastic.co/kibana/kibana:8.14.3

docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/robwilkes/unstructured-api:latest-linuxarm64

docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/robwilkes/unstructured-api:latest-linuxarm64  docker.io/robwilkes/unstructured-api:latest

然后启动dify成功

[root@dify HwHiAiUser]# cd dify/

[root@dify dify]# cd docker

[root@dify docker]# pwd

/home/HwHiAiUser/dify/docker

[root@dify docker]# docker-compose up -d

[+] Running 11/11

 ✔ Network docker_default  Created                                                                                                

 ✔ Network docker_ssrf_proxy_network Created                                                                                               

 ✔ Container docker-sandbox-1    Started                                                                                                

 ✔ Container docker-redis-1   Started                                                                                                

 ✔ Container docker-web-1   Started                                                                                      

 ✔ Container docker-weaviate-1        Started                                                                                                

 ✔ Container docker-db-1  Started                                                                                              1.

 ✔ Container docker-ssrf_proxy-1  Started                                                                                                

 ✔ Container docker-api-1    Started                                                                                                

 ✔ Container docker-worker-1  Started                                                                                                

 ✔ Container docker-nginx-1  Started                                                                                                

[root@dify docker]#

后台启动成功

五、启动dify进行配置界面,在地址栏输入http://ip(访问服务器的ip地址):8084端口,可以刷新出dify界面

注册一下,这个是所有者权限,只能注册一次,无法修改,如果修改,需要重新拉dify服务

使用所有者权限进入账户,点击右边的设置

选择模型供应商

在下面的列表中找到这两个配置项

添加第一个模型deepseek

OpenAI-API-compatible
类型选LLM 模型名字对应你的mindie的name:DeepSeek-R1-Distill-Qwen-32B-W8A8 mindie的URL:http://192.168.1.115:1025/v1 只要后台服务启动中,前端可以保存,就是ok,秘钥随意填

Text Embedding Inference
然后配置排序模型和分词模型,支持RAG,秘钥随便写,只要后台服务启动中,前端可以保存,就是ok
1.1 选择 RERANK URL设置 http://192.168.1.115:8001/ 模型名 :bge-reranker-large
1.2 选择 TEXT EMBEDDING URL设置 http://192.168.1.115:8002/ 模型名 : bge-large-zh-v1___5
1.3 选择 TEXT EMBEDDING URL设置 http://192.168.1.115:8003/ 模型名 : bge-m3
六、实际测试,跑在昇腾上面的DeepSeek-R1-Distill-Qwen-32B-W8A8 双卡

测试知识库RAG,看一下知识库的内容

开始处理文本


测试不挂知识库结果

测试挂知识库结果

邮箱分发功能,需要修改源码 ,修改源码,从邮箱拿到秘钥,重启服务

[root@wuzhoutuili-0003 docker]# vim ../api/tasks/mail_invite_member_task.py
[root@wuzhoutuili-0003 docker]# pwd
/home/HwHiAiUser/dify/docker





邀约邮件

埋个彩蛋,敬请期待 昇腾服务器部署one-api+fastgpt,内测中