Upgrading Container Manager

To upgrade Container Manager on a physical machine, replace its binary file directly.

  1. Log in to the node where Container Manager is deployed as the root user.
  2. Upload the obtained Container Manager package to any directory on the server (for example, /tmp/container-manager).
  3. Go to the /tmp/container-manager directory and decompress the package.
    unzip Ascend-mindxdl-container-manager_{version}_linux-{arch}.zip

    {version} indicates the package version, and {arch} indicates the CPU architecture.

  4. Run the following commands in sequence to upgrade Container Manager:
    # Stop the Container Manager service, clear the immutable attribute on the existing binary file, and delete it.
    systemctl stop container-manager.service
    chattr -i /usr/local/bin/container-manager
    rm -f /usr/local/bin/container-manager
    
    # Copy the new binary file from the decompressed package into place and restrict it to owner read and execute.
    cp /tmp/container-manager/container-manager /usr/local/bin
    chmod 500 /usr/local/bin/container-manager
    
    # Restart the Container Manager service.
    systemctl daemon-reload
    systemctl start container-manager.service
  5. Verify the upgrade status of Container Manager.
    1. Check the component service status, which should be active (running).
      systemctl status container-manager.service

      Command output:

       container-manager.service - Ascend container manager
           Loaded: loaded (/etc/systemd/system/container-manager.service; disabled; vendor preset: enabled)
           Active: active (running) since Wed 2025-11-26 20:56:50 UTC; 16s ago
          Process: 41459 ExecStart=/bin/bash -c container-manager run  -ctrStrategy ringRecover -logPath=/var/log/mindx-dl/container-manager/container-manager.log >/dev/null 2>&1 & (code=exited, status=0/SUCCESS)
         Main PID: 41464 (container-manag)
            Tasks: 10 (limit: 629145)
           Memory: 13.3M
           CGroup: /system.slice/container-manager.service
                   └─41464 /home/container-manager/container-manager run -ctrStrategy ringRecover
      ...
      
    2. View component logs.
      cat /var/log/mindx-dl/container-manager/container-manager.log

      Command output (using the Atlas 800I A3 SuperPoD Server as an example):

      [INFO]     2025/11/25 22:46:59.007163 1       hwlog/api.go:108    container-manager.log's logger init success
      [INFO]     2025/11/25 22:46:59.007288 1       command/run.go:150    init log success
      [INFO]     2025/11/25 22:46:59.007506 1       devmanager/devmanager.go:134    get card list from dcmi reset timeout is 60
      [INFO]     2025/11/25 22:46:59.250103 1       devmanager/devmanager.go:142    deviceManager get cardList is [0 1 2 3 4 5 6 7], cardList length equal to cardNum: 8
      [INFO]     2025/11/25 22:46:59.250267 1       devmanager/devmanager.go:171    the dcmi version is 25.5.0.b030
      [INFO]     2025/11/25 22:46:59.250405 1       devmanager/devmanager.go:235    chipName: Ascend910, devType: Ascend910A3
      ...
      

      If the following information is displayed, the component is running properly:

      ...
      [INFO]     2025/11/25 22:46:59.289352 1       devmgr/workflow.go:57    init module <hwDev manager> success
      [INFO]     2025/11/25 22:46:59.293773 1       app/config.go:40    load fault config from /home/faultCode.json success
      [INFO]     2025/11/25 22:46:59.293866 1       app/workflow.go:50    init module <fault manager> success
      [INFO]     2025/11/25 22:46:59.293901 1       app/workflow.go:76    init module <container controller> success
      [INFO]     2025/11/25 22:46:59.293930 1       app/workflow.go:64    init module <reset-manager> success
      [INFO]     2025/11/25 22:46:59.315101 378     devmgr/hwdevmgr.go:365    subscribe device fault event success
      ...
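
The log check in step 5 can also be scripted instead of read by eye. The following is a minimal sketch, not an official health check: the function name check_cm_log is hypothetical, and the marker list is simply copied from the sample output above, so it may need adjusting for other versions or hardware.

```shell
#!/bin/sh
# Sketch: confirm that the startup messages shown in this section appear in
# the Container Manager log. The function name and the marker list are
# illustrative assumptions based on the sample output, not a product API.

check_cm_log() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read log file: $log_file" >&2
        return 2
    fi

    missing=0
    # One marker per line; each must occur at least once in the log.
    while IFS= read -r marker; do
        if ! grep -qF -- "$marker" "$log_file"; then
            echo "missing: $marker"
            missing=1
        fi
    done <<'EOF'
init module <hwDev manager> success
init module <fault manager> success
init module <container controller> success
init module <reset-manager> success
subscribe device fault event success
EOF

    # 0 if every marker was found, 1 if at least one is missing.
    return "$missing"
}
```

For example, after the service has started, run check_cm_log /var/log/mindx-dl/container-manager/container-manager.log and inspect the exit status. The function prints one "missing: ..." line for each expected message that is absent and returns a nonzero status in that case.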