Running the DeepSeek Distilled Models on Orange Pi

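MindIE (Mind Inference Engine, the Ascend inference engine) is Huawei Ascend's inference acceleration suite for AI workloads across all scenarios. It exposes AI capabilities in layers to meet diverse business needs, supports a wide variety of models, and unlocks the compute of Ascend hardware. Upward, it supports multiple mainstream AI frameworks; downward, it connects to different types of Ascend AI processors. It also provides multi-level programming interfaces so that users can quickly build inference services on the Ascend platform.
DeepSeek-R1-Distill models are a series of small dense models that distill knowledge from DeepSeek-R1. They retain the advanced reasoning capabilities of the larger model (DeepSeek-R1) while being smaller and more efficient to run. The distilled models are based on the Qwen2.5 and Llama3 series, both of which are widely used and recognized in the research community.
Orange Pi's OrangePi AIpro development board follows the Ascend AI technology route and stands out in appearance, performance and technical support. It is available in two compute specifications, 20 TOPS and 8 TOPS, which cover the mainstream application scenarios of ecosystem developers, let users try out a range of new scenarios, and come with matching software and hardware.
By combining DeepSeek, Orange Pi and MindIE, DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Llama-8B have been successfully deployed on the OrangePi AIpro (20T/24 GB DDR) AI development board, which shows the great potential of AI at the edge.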

Open-source links:
https://www.hiascend.com/software/modelzoo/models/detail/1ca4ef12682a42999efe09c0c80c76d3
https://www.hiascend.com/software/modelzoo/models/detail/fe210c6671554ecb84fd3a09051f0844
https://www.hiascend.com/software/modelzoo/models/detail/199a16c30e764c90aefefa1fd943f90

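01 Environment Setup

Hardware: one OrangePi AIpro (20T/24 GB DDR) development board, a TF card, a TF card reader, a display cable, a monitor, the board's power supply, and so on.
First, download the board's Ubuntu 22.04 image and the related materials from the official website (http://www.orangepi.cn/html/hardWare/computerAndMicrocontrollers/service-and-support/Orange-Pi-AIpro(20T).html).

Then insert the TF card into the card reader, open the image-flashing tool balenaEtcher, and flash the image. When flashing is complete, the tool shows Successful.

Insert the flashed TF card into the card slot, connect a keyboard, a mouse and a monitor, and boot the board.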

02 Installing Python Dependencies

Install Python 3.10:

# These commands write under /usr/local, so run them as a user with write access there
wget https://www.python.org/ftp/python/3.10.2/Python-3.10.2.tgz
tar -xvf Python-3.10.2.tgz -C /usr/local/
sudo apt update
sudo apt install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev libsqlite3-dev wget libbz2-dev
cd /usr/local/Python-3.10.2
./configure --prefix=/usr/local/python3.10
make
sudo make install
# Verify that Python 3.10 is installed
python3.10 --version

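Install torch_npu, the plugin that lets the PyTorch framework run on the Ascend NPU. Download the wheel first, then install it together with torch:

pip install torch==2.1.0
pip install ./torch_npu-2.1.0.post10-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl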
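The wheel file name above carries the tag cp310, which means it is built for CPython 3.10, so pip has to run under that interpreter. The following is a minimal sketch of a check to run before installing. The helper names `version_to_tag` and `wheel_matches` and the `PYTHON_BIN` variable are hypothetical and not part of any Ascend tooling.

```shell
#!/usr/bin/env bash
# Derive a wheel tag such as cp310 from a "Python X.Y.Z" version string.
version_to_tag() {
    ver="${1#Python }"       # strip the leading "Python "
    major="${ver%%.*}"       # text before the first dot
    rest="${ver#*.}"         # text after the first dot
    minor="${rest%%.*}"      # text before the next dot
    echo "cp${major}${minor}"
}

# Succeed when the wheel file name contains the given interpreter tag.
wheel_matches() {
    case "$1" in
        *"-$2-"*) return 0 ;;
        *)        return 1 ;;
    esac
}

PYTHON_BIN="${PYTHON_BIN:-python3.10}"   # assumption: interpreter name on PATH
WHEEL="torch_npu-2.1.0.post10-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl"

if command -v "$PYTHON_BIN" >/dev/null 2>&1; then
    tag="$(version_to_tag "$("$PYTHON_BIN" --version 2>&1)")"
    if wheel_matches "$WHEEL" "$tag"; then
        echo "OK: $PYTHON_BIN ($tag) matches the wheel"
    else
        echo "Mismatch: $PYTHON_BIN is $tag, but the wheel is tagged cp310" >&2
    fi
else
    echo "$PYTHON_BIN not found on PATH" >&2
fi
```

If the check reports a mismatch, invoking pip as `"$PYTHON_BIN" -m pip install ...` is one way to make sure the packages land in the intended interpreter.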

03 Installing CANN

Upgrade the development tools of the Ascend heterogeneous computing architecture (CANN). Download the cann-toolkit (https://mindie.obs.cn-north-4.myhuaweicloud.com/xiangchengpai_20250211/Ascend-cann-toolkit_8.1.RC1_linux-aarch64.run), cann-kernels (https://mindie.obs.cn-north-4.myhuaweicloud.com/xiangchengpai_20250211/Ascend-cann-kernels-310b_8.1.RC1_linux-aarch64.run) and cann-nnal (https://mindie.obs.cn-north-4.myhuaweicloud.com/xiangchengpai_20250211/Ascend-cann-nnal_8.1.RC1_linux-aarch64.run) installation packages, then run:

chmod +x Ascend-cann-toolkit_8.1.RC1_linux-aarch64.run
chmod +x Ascend-cann-kernels-310b_8.1.RC1_linux-aarch64.run
chmod +x Ascend-cann-nnal_8.1.RC1_linux-aarch64.run
./Ascend-cann-toolkit_8.1.RC1_linux-aarch64.run --install --force
./Ascend-cann-kernels-310b_8.1.RC1_linux-aarch64.run --install
./Ascend-cann-nnal_8.1.RC1_linux-aarch64.run --install
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
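If one of the installers did not finish, the matching set_env.sh will be missing and the `source` command will fail. The sketch below checks for each script before sourcing it. The helper `source_if_present` is hypothetical, and the two paths are the ones used above, which assumes the default install location.

```shell
#!/usr/bin/env bash
# Source an environment script only if it exists; report it otherwise.
source_if_present() {
    if [ -f "$1" ]; then
        . "$1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

for env_script in \
    /usr/local/Ascend/ascend-toolkit/set_env.sh \
    /usr/local/Ascend/nnal/atb/set_env.sh
do
    source_if_present "$env_script" \
        || echo "re-run the matching CANN installer before continuing" >&2
done
```

The environment set this way lasts only for the current shell session, so the scripts need to be sourced again in every new terminal.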

04 Installing MindIE

Download MindIE-LLM, the large language model inference component of Huawei's MindIE inference solution (https://mindie.obs.cn-north-4.myhuaweicloud.com/xiangchengpai_20250211/Ascend-mindie-atb-models_2.0.RC1_linux-aarch64_py310_torch2.1.0-abi0.tar.gz), then unpack and install it:

mkdir MindIE-LLM
cd MindIE-LLM
tar -zxf ../Ascend-mindie-atb-models_2.0.RC1_linux-aarch64_py310_torch2.1.0-abi0.tar.gz
pip install atb_llm-0.0.1-py3-none-any.whl
source set_env.sh

05 Downloading and Deploying the Models

Download the model code:

# Download DeepSeek-R1-Distill-Qwen-1.5B
git clone https://modelers.cn/MindIE/DeepSeek-R1-Distill-Qwen-1.5B-OrangePi.git
# Download DeepSeek-R1-Distill-Qwen-7B
git clone https://modelers.cn/MindIE/DeepSeek-R1-Distill-Qwen-7B-OrangePi.git
# Download DeepSeek-R1-Distill-Llama-8B
git clone https://modelers.cn/MindIE/DeepSeek-R1-Distill-Llama-8B-OrangePi.git

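Install the dependencies (replace {model} with Qwen-1.5B, Qwen-7B or Llama-8B to match the repository you cloned):

cd DeepSeek-R1-Distill-{model}-OrangePi
pip install -r ./requirements.txt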

Download the weights:
DeepSeek-R1-Distill-Qwen-1.5B (INT8) (https://modelers.cn/models/MindIE/DeepSeek-R1-Distill-Qwen-1.5B-OrangePi/tree/main/deepseek-qwen-1.5B-w8a8)
DeepSeek-R1-Distill-Qwen-1.5B (FP16) (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
DeepSeek-R1-Distill-Qwen-7B (FP16) (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)
To generate INT8 quantized weights for DeepSeek-R1-Distill-Qwen-7B, see the "本地部署w8a8量化" (local w8a8 quantization deployment) section of the README (https://gitee.com/ascend/ModelZoo-PyTorch/tree/master/MindIE/LLM/DeepSeek/DeepSeek-R1-Distill-Qwen-7B-OrangePi).
DeepSeek-R1-Distill-Llama-8B (FP16) (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)
To generate INT8 quantized weights for DeepSeek-R1-Distill-Llama-8B, see the "本地部署w8a8量化" (local w8a8 quantization deployment) section of the README (https://gitee.com/ascend/ModelZoo-PyTorch/tree/master/MindIE/LLM/DeepSeek/DeepSeek-R1-Distill-Llama-8B-OrangePi).
Edit the config.json that ships with the weights: set the torch_dtype field to float16 and the max_position_embeddings field to 4096.
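The two config.json changes can also be applied from the command line. The following is a minimal sketch and not the project's own tooling. It assumes the length key is spelled `max_position_embeddings`, as in Hugging Face config files, and that both keys appear in the usual `"key": value` form. The function name `patch_config` and the `WEIGHT_DIR` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Set torch_dtype to float16 and max_position_embeddings to 4096 in a config.json.
patch_config() {
    cfg="$1"
    cp "$cfg" "$cfg.bak"     # keep a copy of the original file
    # In the second expression, \1 is the captured key prefix and 4096 is literal text.
    sed -E \
        -e 's/("torch_dtype"[[:space:]]*:[[:space:]]*)"[^"]*"/\1"float16"/' \
        -e 's/("max_position_embeddings"[[:space:]]*:[[:space:]]*)[0-9]+/\14096/' \
        "$cfg.bak" > "$cfg"
}

# WEIGHT_DIR is a hypothetical variable pointing at the downloaded weight directory.
if [ -n "${WEIGHT_DIR:-}" ]; then
    patch_config "$WEIGHT_DIR/config.json"
    # Print the two fields so the result can be checked by eye.
    grep -E '"(torch_dtype|max_position_embeddings)"' "$WEIGHT_DIR/config.json"
fi
```

Running it as `WEIGHT_DIR=/path/to/weights bash patch_config.sh` (the script name is arbitrary) leaves the original file next to the patched one as config.json.bak.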

06 Running Inference

Once the steps above are complete, you can enter a question in the terminal to test the model:

cd $MindIE_LLM_PATH
python -m examples.run_fa_edge \
         --model_path ${WEIGHT_PATH} \
         --input_text 'What is deep learning?' \
         --max_output_length 128 \
         --is_chat_model

Command-Line Parameters

--model_path path to the weights

--input_text the input text

--max_output_length the maximum output length

--is_chat_model enable this option when running inference with an FP16 model
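When switching between the FP16 and INT8 weights, a small wrapper can keep the four parameters in one place. The sketch below is a hypothetical helper, not part of MindIE: `run_edge`, `DRY_RUN` and the `fp16`/`int8` argument are names invented here. It adds --is_chat_model only for FP16, following the note above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the inference command shown above.
# Usage: run_edge WEIGHT_DIR PROMPT [MAX_LEN] [fp16|int8]
run_edge() {
    weight_dir="$1"
    prompt="$2"
    max_len="${3:-128}"
    precision="${4:-fp16}"

    # Rebuild the positional parameters as the final argument list,
    # so that arguments containing spaces stay intact.
    set -- python -m examples.run_fa_edge \
        --model_path "$weight_dir" \
        --input_text "$prompt" \
        --max_output_length "$max_len"
    if [ "$precision" = "fp16" ]; then
        set -- "$@" --is_chat_model
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"       # show the command without running it
    else
        ( cd "${MindIE_LLM_PATH:?MindIE_LLM_PATH is not set}" && "$@" )
    fi
}

# Preview the command for an FP16 weight directory (the path is a placeholder).
DRY_RUN=1 run_edge /path/to/weights 'What is deep learning?' 128 fp16
```

With `DRY_RUN` unset, the same call changes into `$MindIE_LLM_PATH` and runs the command instead of printing it.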
