
Running the Installation Command

Read Before Installing

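  • Before installing with the MindCluster Ascend Deployer tool, make sure the machine that runs MindCluster Ascend Deployer has at least 16 GB of hard disk space.
  • Some components have runtime dependencies:
    • PyTorch requires toolkit or nnae to provide its runtime dependencies. On Atlas A2 training series products, PyTorch requires kernels to be installed.
    • TensorFlow: in versions earlier than MindCluster Ascend Deployer 6.0.0, using NPU resources requires the tfplugin+toolkit or tfplugin+nnae combination. In MindCluster Ascend Deployer 6.0.0 and later, it requires toolkit or nnae.
    • MindSpore requires nnae or toolkit, plus the kernels software package.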
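The disk-space requirement above can be checked before the tool is run. The following is a minimal sketch and is not part of MindCluster Ascend Deployer: the function name has_free_gib is hypothetical, and it assumes the 16G figure refers to free space on the filesystem that holds the current working directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the filesystem holding path $1
# has at least $2 GiB available.
has_free_gib() {
    local path="$1" need_gib="$2" avail_kb
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((need_gib * 1024 * 1024)) ]
}

# Assumption: the deployer is run from the current directory.
if has_free_gib "$PWD" 16; then
    echo "disk check passed"
else
    echo "less than 16 GiB free under $PWD" >&2
fi
```

If the tool is unpacked or run from another location, pass that path instead of $PWD.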

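Prerequisites

  • The software packages have been downloaded.
  • The installation user is root and has execute permission on install.sh.
  • A running user is prepared on the target environment. When it executes the installation command, MindCluster Ascend Deployer automatically creates the default running user HwHiAiUser. To create the user yourself, see "Manually Creating a Running User".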

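Installation Procedure

  1. Log in to the target device as the installation user of the software packages.
  2. Run the installation command.
    • If MindCluster Ascend Deployer was installed with pip, run the ascend-deployer command from any path on the local machine.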
      ascend-deployer --install=<package_name_1>,<package_name_2>     
    • If you use MindCluster Ascend Deployer by downloading and extracting the zip package, go to the ascend_deployer directory and install with the bash install.sh command.
      bash install.sh --install=<package_name_1>,<package_name_2>                

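    Table 1 lists example commands.

    For the valid values of <package_name_x>, see "Supported Installation and Upgrade Scenarios", or run bash install.sh --help to list all available parameters.

    Install in the order "sys_pkg > python > npu > CANN, MindIE MindCluster (performance testing, fault diagnosis, cluster scheduling) > deepseek_pd, deepseek_cntr". During installation, the version of the CANN packages in the resources directory must match the NPU.

    When you run the installation command, MindCluster Ascend Deployer checks by default whether the environment meets the installation requirements. If the check reports an error, use the error message to decide whether to skip the check with --skip_check and continue the installation, as in the following example.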

    bash install.sh --install=<package_name_1>,<package_name_2>  --skip_check
    Table 1 Installation command examples

    Installation type | Installation command
    System environment initialization (install sys_pkg) | bash install.sh --install=sys_pkg   # When running --install commands, do not install sys_pkg more than once.
    System environment initialization (install python) | bash install.sh --install=python
    NPU firmware and driver (run either command) | bash install.sh --install=npu
      | bash install.sh --install=driver,firmware
    CANN software (training, inference, and development/debugging scenarios) | bash install.sh --install=kernels,toolkit
    CANN software (edge inference scenarios) | bash install.sh --install=nnrt,kernels
    CANN software (training and inference scenarios) | bash install.sh --install=nnae,kernels
    MindCluster cluster scheduling | bash install.sh --install=ascend-device-plugin,ascend-docker-runtime,hccl-controller,noded,npu-exporter,volcano,ascend-operator,clusterd,resilience-controller
    MindCluster cluster scheduling (MindIO) | bash install.sh --install=mindio
    MindCluster performance testing | bash install.sh --install=toolbox
    MindCluster fault diagnosis | bash install.sh --install=fault-diag
    MindIE inference engine | bash install.sh --install=mindie_image
    Deploying a DeepSeek PD instance | bash install.sh --install=deepseek_pd
    Deploying DeepSeek in a Docker scenario | bash install.sh --install=deepseek_cntr

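  3. (Optional) When installing CANN or ToolBox, you must accept the Huawei Enterprise End User License Agreement (EULA) before the installation starts. At the prompt, enter y or Y to accept the agreement; any other input rejects it. Installation starts automatically once the agreement is accepted.

    If the current locale does not meet the requirements, run one of the following commands to set the default system locale.

    • Set to Chinese
      export LANG=zh_CN.UTF-8
    • Set to English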
      export LANG=en_US.UTF-8
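  4. (Optional) If the driver and firmware were installed, it is recommended that you restart immediately after the installation completes.
    For a single-machine installation, run the reboot command. For a batch installation, restart all devices as follows.
    1. If MindCluster Ascend Deployer is deployed on one of the target devices, first comment out the local IP address in "inventory_file", as shown below. Otherwise, when you run the restart command in the next sub-step, the local machine may restart before the command has been sent to the other servers, and those servers will not be restarted. If MindCluster Ascend Deployer is deployed on a general-purpose server, skip this sub-step.
      #<local IP address> ansible_ssh_user="root" # local IP commented out
    2. Restart the servers.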
      ansible -i inventory_file all -m shell -a 'reboot'
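    3. After the other target devices have restarted, uncomment the local IP address in "inventory_file", and then run the reboot command to restart the local machine.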
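Commenting out and restoring the local IP address in the inventory file can be scripted. The following is a minimal sketch, not a feature of the tool: mask_host and unmask_host are hypothetical helpers, the [worker] group and the 10.0.0.x addresses are made-up sample data, and the demo works on a temporary file instead of a real inventory_file. A .bak copy of the file is kept by sed.

```shell
#!/usr/bin/env bash
# Comment out the inventory line that starts with IP $1 in file $2.
# Dots are escaped so they match literally, and the IP must be followed
# by whitespace or end of line (10.0.0.1 does not match 10.0.0.10).
mask_host() {
    local esc="${1//./\\.}"
    sed -i.bak -E "s/^(${esc}([[:space:]]|\$))/#\\1/" "$2"
}

# Reverse of mask_host: remove the leading '#' from the matching line.
unmask_host() {
    local esc="${1//./\\.}"
    sed -i.bak -E "s/^#(${esc}([[:space:]]|\$))/\\1/" "$2"
}

# Demo on a temporary file with made-up hosts.
inv="$(mktemp)"
cat > "$inv" <<'EOF'
[worker]
10.0.0.1 ansible_ssh_user="root"
10.0.0.10 ansible_ssh_user="root"
EOF

mask_host 10.0.0.1 "$inv"
cat "$inv"
```

After the other servers have restarted, calling unmask_host with the same arguments restores the line before the local machine is restarted.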
  5. (Optional) When a DeepSeek PD instance is deployed, the MindIE deployment scripts are stored in the following path on the [master][0] node.
    /root/.ascend_deployer/mindie_pd/kubernetes_deploy_scripts/
    1. To bring the DeepSeek PD instance up again, go to the MindIE deployment script path and run the following command.
      # For Atlas 800I A2 inference products
      source /usr/local/ascendrc && python3 deploy_ac_job.py
      
      # For Atlas 800I A3 SuperPoD servers
      source /usr/local/ascendrc && python3 deploy_ac_job.py --user_config_path user_config_base_A3.json
    2. To check the status of the DeepSeek PD instance, run the following command.
      kubectl get pod -A -owide | grep mindie # replace mindie with the job_id specified when the service was started

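      If the pod status is "Running", the pods of the PD instance have been started.

      [Figure: command output showing the PD instance pods in the Running state]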

    3. Run the following command to send a test question and verify that the PD instance works.
      # Replace <master_node_IP> with the actual IP address of the master node

      curl http://<master_node_IP>:31015/v1/chat/completions -X POST -d '{"model": "ds_r1","messages": [{"role": "user","content": "You are a helpful assistant."}],"stream": false,"presence_penalty": 1.03,"frequency_penalty": 1.0,"repetition_penalty": 1.0,"temperature": 0.5,"top_p": 0.95,"top_k": 1,"seed": 1,"max_tokens": 500}'

      After about 1 minute, if a response similar to the following is returned (the content may differ each time), the instance is working properly.

      {"id":"******","object":"chat.completion","created":1747295960,"model":"dsv3_w8a8","choices":[{"index":0,"message":{"role":"assistant","tool_calls":null,"content":"<think>\nOkay, the user wants me to act as a helpful assistant. Let me start by understanding what that entails. Being helpful means providing accurate, clear, and concise information. I need to make sure I address their queries effectively without unnecessary fluff.\n\nFirst, I should consider different types of questions they might ask—factual, how-to, troubleshooting, etc. For each type, my approach might vary. Factual questions require quick answers with reliable sources. How-to guides need step-by-step instructions. Troubleshooting might involve asking follow-up questions to diagnose issues.\n\nI also need to be cautious about potential misunderstandings. If a question is ambiguous or unclear, it's better to ask for clarification rather than assume. That way, I can provide the most relevant assistance possible.\n\nAnother aspect is maintaining a friendly and approachable tone. Even though the user hasn't specified this explicitly since they mentioned \"helpful,\" which usually implies a positive interaction style.\n\nAdditionally, I should stay updated on current information if needed but remember my knowledge cutoff is October 2023. So if there's a time-sensitive question post that date, I should inform them about the limitation.\n\nLastly, ensuring responses are well-structured. Using bullet points or numbered lists when appropriate can enhance readability. Avoiding technical jargon unless necessary, and explaining terms if used.\n</think>\n\nHello! I'm here to help you with any questions, tasks, or information you need—just let me know how I can assist you today! Whether it's explaining concepts solving problems, offering advice, or brainstorming ideas, feel free to ask. ...
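    4. To delete the DeepSeek PD instance, go to the MindIE deployment script path and run the following command.
      bash delete.sh mindie # replace mindie with the job_id specified when the service was started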
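The pod status check in step 5 can be automated by inspecting the STATUS column of the kubectl output. The following is a minimal sketch: all_pods_running is a hypothetical helper, and the sample text is made-up output used only for the demo. With the -A option the columns begin with NAMESPACE, NAME, READY, and STATUS, so STATUS is the fourth field.

```shell
#!/usr/bin/env bash
# Read `kubectl get pod -A -owide` output on stdin. Succeed only if at least
# one line contains the job id $1 and every such line has STATUS "Running".
all_pods_running() {
    awk -v id="$1" '
        index($0, id) { seen = 1; if ($4 != "Running") bad = 1 }
        END { exit ((seen && !bad) ? 0 : 1) }
    '
}

# Made-up sample output for demonstration only.
sample='NAMESPACE   NAME              READY   STATUS    RESTARTS   AGE
mindie      mindie-server-0   1/1     Running   0          5m
mindie      mindie-server-1   0/1     Pending   0          5m'

if printf '%s\n' "$sample" | all_pods_running mindie; then
    echo "all pods Running"
else
    echo "some pods are not Running yet"
fi
```

On a real cluster the input would come from the command shown in the step, for example kubectl get pod -A -owide | all_pods_running mindie, with mindie replaced by the actual job_id.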
  6. (Optional) When DeepSeek is deployed in a Docker scenario, the MindIE Server configuration file is stored in the following path.
    /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
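    1. To start the mindieservice_daemon service, enter the container and run the following commands.
      # Source the corresponding environment variables under /usr/local/Ascend/mindie/latest/mindie-service/scripts/
      # After all environment variables have been sourced, start the service with the command below. The process must be bound to CPU cores; use lscpu to find the core range of the first NUMA node. On a typically configured 800I A2 server, the first NUMA node has cores 0-31: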
      taskset -c 0-31 /usr/local/Ascend/mindie/latest/mindie-service/bin/mindieservice_daemon
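    2. Check the output. "Daemon start success!" indicates that the service has started successfully.

    3. Run the following command to send a test question and verify that the inference service works.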
      curl -X POST -d '{ "model":"DeepseekV3", "messages": [{ "role": "system", "content": "You are a helpful assistant." }], "max_tokens": 20, "stream": false }' http://xx.xx.xx.xx:1025/v1/chat/completions

      After about 1 minute, if a response similar to the following is returned (the content may differ each time), the service is working properly.

      {"id":"******","object":"chat.completion","created":1747295960,"model":"dsv3_w8a8","choices":[{"index":0,"message":{"role":"assistant","tool_calls":null,"content":"<think>\nOkay, the user wants me to act as a helpful assistant. Let me start by understanding what that entails. Being helpful means providing accurate, clear, and concise information. I need to make sure I address their queries effectively without unnecessary fluff.\n\nFirst, I should consider different types of questions they might ask—factual, how-to, troubleshooting, etc. For each type, my approach might vary. Factual questions require quick answers with reliable sources. How-to guides need step-by-step instructions. Troubleshooting might involve asking follow-up questions to diagnose issues.\n\nI also need to be cautious about potential misunderstandings. If a question is ambiguous or unclear, it's better to ask for clarification rather than assume. That way, I can provide the most relevant assistance possible.\n\nAnother aspect is maintaining a friendly and approachable tone. Even though the user hasn't specified this explicitly since they mentioned \"helpful,\" which usually implies a positive interaction style.\n\nAdditionally, I should stay updated on current information if needed but remember my knowledge cutoff is October 2023. So if there's a time-sensitive question post that date, I should inform them about the limitation.\n\nLastly, ensuring responses are well-structured. Using bullet points or numbered lists when appropriate can enhance readability. Avoiding technical jargon unless necessary, and explaining terms if used.\n</think>\n\nHello! I'm here to help you with any questions, tasks, or information you need—just let me know how I can assist you today! Whether it's explaining concepts solving problems, offering advice, or brainstorming ideas, feel free to ask. ...
    4. To stop the mindieservice_daemon service, run the following command.
      pkill -9 -f "mindieservice_daemon"
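For the core-binding step in step 6, the core range of the first NUMA node can be read from lscpu instead of being typed by hand. The following is a minimal sketch: numa0_cpus is a hypothetical helper, the sample text is made-up lscpu output, and the demo only prints the resulting taskset command. It assumes lscpu output in the English (C) locale.

```shell
#!/usr/bin/env bash
# Read lscpu output on stdin and print the CPU list of NUMA node 0.
# Leading whitespace is allowed because some lscpu versions indent this line.
numa0_cpus() {
    awk -F: '/^[[:space:]]*NUMA node0 CPU\(s\)/ {
        gsub(/[[:space:]]/, "", $2)   # strip padding around the value
        print $2
        exit
    }'
}

# Made-up sample output for demonstration only.
sample='NUMA node(s):        2
NUMA node0 CPU(s):   0-31
NUMA node1 CPU(s):   32-63'

cores="$(printf '%s\n' "$sample" | numa0_cpus)"
echo "taskset -c $cores /usr/local/Ascend/mindie/latest/mindie-service/bin/mindieservice_daemon"
```

Inside the container, the value would instead be obtained with cores="$(LC_ALL=C lscpu | numa0_cpus)".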

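Viewing the Installation Report and Status Information

After the installation completes, a report directory is generated in the current path. It contains the installation report files report.csv and report.json, which record per-server results such as the server IP address and status.

The installation progress file deployer_progress_output.json is generated under ~/.ascend_deployer/deploy_info. Use it to view the installation process and status information.

Follow-up Operations

Check the installation result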