Upgrading Container Manager

To upgrade Container Manager on a physical machine, replace its binary file directly.

Log in as the root user to the node where Container Manager is deployed.
Upload the obtained Container Manager package to any directory on the server (for example, /tmp/container-manager).
Go to the upload directory (/tmp/container-manager in this example) and decompress the package.
```
unzip Ascend-mindxdl-container-manager_{version}_linux-{arch}.zip
```
{version} indicates the package version, and {arch} indicates the CPU architecture.
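If you script the decompression step, a small helper can assemble the package file name. The sketch below is a hypothetical helper (`container_manager_package_name` is not part of the product) and assumes that the {arch} field matches the output of `uname -m`, such as aarch64 or x86_64. Check the actual file name of the package you obtained before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: build the expected package file name from a version string.
# Assumption: the {arch} field matches `uname -m` (for example aarch64 or x86_64).
container_manager_package_name() {
    local version="$1"
    # Use the second argument if given, otherwise the architecture of this machine.
    local arch="${2:-$(uname -m)}"
    if [ -z "$version" ]; then
        echo "usage: container_manager_package_name <version> [arch]" >&2
        return 1
    fi
    printf 'Ascend-mindxdl-container-manager_%s_linux-%s.zip\n' "$version" "$arch"
}
```

For example, with VERSION set to the package version, run `unzip "$(container_manager_package_name "$VERSION")"` in /tmp/container-manager.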

Run the following commands in sequence to upgrade Container Manager:

```
# Stop the Container Manager service, clear the immutable attribute on the existing binary, and delete it.
systemctl stop container-manager.service
chattr -i /usr/local/bin/container-manager
rm -f /usr/local/bin/container-manager

# Copy the new binary from the decompressed package to replace the old one, and restrict its permissions.
cp /tmp/container-manager/container-manager /usr/local/bin
chmod 500 /usr/local/bin/container-manager

# Reload the systemd unit files and start the Container Manager service.
systemctl daemon-reload
systemctl start container-manager.service
```
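The commands above can be wrapped in a single function so that the sequence stops at the first failing command. This is a minimal sketch: `upgrade_container_manager`, `run`, and the `DRY_RUN` variable are hypothetical names introduced here, not part of Container Manager, and the source and target paths default to the ones used in this section. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: the upgrade commands from this section wrapped in one function.
# DRY_RUN (hypothetical) makes run() print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_container_manager() {
    local src="${1:-/tmp/container-manager/container-manager}"
    local dst="${2:-/usr/local/bin/container-manager}"

    # Refuse to stop the service if the new binary is not where we expect it.
    if [ ! -f "$src" ]; then
        echo "new binary not found: $src" >&2
        return 1
    fi

    # Stop the service, clear the immutable attribute, and remove the old binary.
    run systemctl stop container-manager.service || return 1
    run chattr -i "$dst" || return 1
    run rm -f "$dst" || return 1

    # Copy in the new binary and apply the permissions used in this guide.
    run cp "$src" "$dst" || return 1
    run chmod 500 "$dst" || return 1

    # Reload unit files and start the service again.
    run systemctl daemon-reload || return 1
    run systemctl start container-manager.service
}
```

After sourcing the script, run `DRY_RUN=1 upgrade_container_manager` to review the commands, then run `upgrade_container_manager` as root to perform the upgrade. Checking for the new binary before stopping the service avoids leaving the node without a Container Manager binary if the package was not extracted where expected.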

Verify that Container Manager has been upgraded.

Check the service status. The Active field should show active (running).

```
systemctl status container-manager.service
```

Command output:

```
● container-manager.service - Ascend container manager
     Loaded: loaded (/etc/systemd/system/container-manager.service; disabled; vendor preset: enabled)
     Active: active (running) since Wed 2025-11-26 20:56:50 UTC; 16s ago
    Process: 41459 ExecStart=/bin/bash -c container-manager run  -ctrStrategy ringRecover -logPath=/var/log/mindx-dl/container-manager/container-manager.log >/dev/null 2>&1 & (code=exited, status=0/SUCCESS)
   Main PID: 41464 (container-manag)
      Tasks: 10 (limit: 629145)
     Memory: 13.3M
     CGroup: /system.slice/container-manager.service
             └─41464 /home/container-manager/container-manager run -ctrStrategy ringRecover
...
```
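For automated checks, the same condition can be tested without reading the status output. The sketch below assumes a systemd version that supports `systemctl show --value`, and the function name `check_container_manager_running` is hypothetical. It returns success only when the unit's ActiveState is active and its SubState is running, which `systemctl status` displays as active (running).

```shell
#!/usr/bin/env bash
# Sketch: non-interactive version of the status check.
# Succeeds only when ActiveState=active and SubState=running.
check_container_manager_running() {
    local unit="${1:-container-manager.service}"
    local active sub
    # Query the two properties separately so each value is a single word.
    active=$(systemctl show --property=ActiveState --value "$unit") || return 1
    sub=$(systemctl show --property=SubState --value "$unit") || return 1
    if [ "$active" = "active" ] && [ "$sub" = "running" ]; then
        echo "$unit is active (running)"
        return 0
    fi
    echo "$unit is $active ($sub)" >&2
    return 1
}
```

The function's exit status can gate later steps, for example `check_container_manager_running && echo "upgrade OK"`.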

View the component log.

```
cat /var/log/mindx-dl/container-manager/container-manager.log
```

Command output (using an Atlas 800I A3 SuperPoD server as an example):

```
[INFO]     2025/11/25 22:46:59.007163 1       hwlog/api.go:108    container-manager.log's logger init success
[INFO]     2025/11/25 22:46:59.007288 1       command/run.go:150    init log success
[INFO]     2025/11/25 22:46:59.007506 1       devmanager/devmanager.go:134    get card list from dcmi reset timeout is 60
[INFO]     2025/11/25 22:46:59.250103 1       devmanager/devmanager.go:142    deviceManager get cardList is [0 1 2 3 4 5 6 7], cardList length equal to cardNum: 8
[INFO]     2025/11/25 22:46:59.250267 1       devmanager/devmanager.go:171    the dcmi version is 25.5.0.b030
[INFO]     2025/11/25 22:46:59.250405 1       devmanager/devmanager.go:235    chipName: Ascend910, devType: Ascend910A3
...
```

If the following information is displayed, the component is running properly:

```
...
[INFO]     2025/11/25 22:46:59.289352 1       devmgr/workflow.go:57    init module <hwDev manager> success
[INFO]     2025/11/25 22:46:59.293773 1       app/config.go:40    load fault config from /home/faultCode.json success
[INFO]     2025/11/25 22:46:59.293866 1       app/workflow.go:50    init module <fault manager> success
[INFO]     2025/11/25 22:46:59.293901 1       app/workflow.go:76    init module <container controller> success
[INFO]     2025/11/25 22:46:59.293930 1       app/workflow.go:64    init module <reset-manager> success
[INFO]     2025/11/25 22:46:59.315101 378     devmgr/hwdevmgr.go:365    subscribe device fault event success
...
```
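To check the log for these messages automatically, you could use a helper like the one below. It is a sketch: the function name `check_container_manager_log` is hypothetical, and the message strings are copied from the sample output above, so logs from other versions or server models may word them differently.

```shell
#!/usr/bin/env bash
# Sketch: scan a Container Manager log for the success messages listed in this guide.
# Message strings come from the sample output; adjust them if your version differs.
check_container_manager_log() {
    local log="${1:-/var/log/mindx-dl/container-manager/container-manager.log}"
    local missing=0
    local msg

    if [ ! -r "$log" ]; then
        echo "cannot read log file: $log" >&2
        return 1
    fi

    for msg in \
        "init module <hwDev manager> success" \
        "init module <fault manager> success" \
        "init module <container controller> success" \
        "init module <reset-manager> success" \
        "subscribe device fault event success"; do
        # -F matches a fixed string, so "<" and ">" need no escaping.
        if grep -qF -- "$msg" "$log"; then
            echo "found:   $msg"
        else
            echo "missing: $msg"
            missing=1
        fi
    done

    # Non-zero if any expected message was not found.
    return "$missing"
}
```

Running `check_container_manager_log` with no argument reads the default log path used in this section and prints one found or missing line per message.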
