Upgrading Container Manager
Directly replace the Container Manager binary file to upgrade the component on a physical machine.
- Log in to the node where Container Manager is deployed as the root user.
- Upload the obtained Container Manager package to any directory on the server (for example, /tmp/container-manager).
- Go to the /tmp/container-manager directory and decompress the package.
unzip Ascend-mindxdl-container-manager_{version}_linux-{arch}.zip
{version} indicates the package version, and {arch} indicates the CPU architecture.
- Run the following commands in sequence to upgrade Container Manager:
# Stop the Container Manager service and delete the corresponding Container Manager binary file.
systemctl stop container-manager.service
chattr -i /usr/local/bin/container-manager
rm -f /usr/local/bin/container-manager
# Retrieve the new binary file from the decompressed package and replace the existing Container Manager binary file.
cp /tmp/container-manager/container-manager /usr/local/bin
chmod 500 /usr/local/bin/container-manager
# Restart the Container Manager service.
systemctl daemon-reload
systemctl start container-manager.service
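The command sequence above deletes the installed binary before copying the new one, so a missing or misnamed source file would leave the node without a Container Manager binary. The following is a minimal sketch of a guarded wrapper around the same commands, not part of the product. The function names `run` and `upgrade_container_manager` and the `DRY_RUN` switch are hypothetical helpers introduced here for illustration; the default paths are the ones used in this procedure.

```shell
#!/bin/sh
# Sketch only: wraps the documented upgrade commands with a pre-check.
# DRY_RUN is a hypothetical switch: when set to 1, commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# upgrade_container_manager [NEW_BIN] [OLD_BIN]
# Defaults mirror the example paths used in this procedure.
upgrade_container_manager() {
    new_bin="${1:-/tmp/container-manager/container-manager}"
    old_bin="${2:-/usr/local/bin/container-manager}"

    # Refuse to touch the installed binary unless the replacement exists.
    if [ ! -f "$new_bin" ]; then
        echo "new binary not found: $new_bin" >&2
        return 1
    fi

    # Same order as the documented steps; the chain stops at the first failure.
    # The copy targets the full file path instead of the directory, which gives
    # the same result because the file name is unchanged.
    run systemctl stop container-manager.service &&
    run chattr -i "$old_bin" &&
    run rm -f "$old_bin" &&
    run cp "$new_bin" "$old_bin" &&
    run chmod 500 "$old_bin" &&
    run systemctl daemon-reload &&
    run systemctl start container-manager.service
}
```

To preview the commands without changing anything, set `DRY_RUN=1` before calling `upgrade_container_manager`. Calling it without `DRY_RUN` as the root user performs the upgrade.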
- Verify the upgrade status of Container Manager.
- Check the component service status, which should be active (running).
systemctl status container-manager.service
Command output:
● container-manager.service - Ascend container manager
   Loaded: loaded (/etc/systemd/system/container-manager.service; disabled; vendor preset: enabled)
   Active: active (running) since Wed 2025-11-26 20:56:50 UTC; 16s ago
  Process: 41459 ExecStart=/bin/bash -c container-manager run -ctrStrategy ringRecover -logPath=/var/log/mindx-dl/container-manager/container-manager.log >/dev/null 2>&1 & (code=exited, status=0/SUCCESS)
 Main PID: 41464 (container-manag)
    Tasks: 10 (limit: 629145)
   Memory: 13.3M
   CGroup: /system.slice/container-manager.service
           └─41464 /home/container-manager/container-manager run -ctrStrategy ringRecover
...
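When this check is scripted, the Active line of the status output can be tested instead of being read by eye. The following is a minimal sketch; `cm_is_running` is a hypothetical helper name, and it only inspects text in the format shown in the sample output above.

```shell
#!/bin/sh
# Sketch only: reads `systemctl status` output on stdin and succeeds
# only if the Active line reports "active (running)".
cm_is_running() {
    grep -q '^[[:space:]]*Active: active (running)'
}
```

A possible use is `systemctl status container-manager.service | cm_is_running && echo "service is running"`. As an alternative that avoids parsing text, `systemctl is-active container-manager.service` returns exit code 0 when the unit is active.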
- View component logs.
cat /var/log/mindx-dl/container-manager/container-manager.log
Command output (using the Atlas 800I A3 SuperPoD Server as an example):
[INFO] 2025/11/25 22:46:59.007163 1 hwlog/api.go:108 container-manager.log's logger init success
[INFO] 2025/11/25 22:46:59.007288 1 command/run.go:150 init log success
[INFO] 2025/11/25 22:46:59.007506 1 devmanager/devmanager.go:134 get card list from dcmi reset timeout is 60
[INFO] 2025/11/25 22:46:59.250103 1 devmanager/devmanager.go:142 deviceManager get cardList is [0 1 2 3 4 5 6 7], cardList length equal to cardNum: 8
[INFO] 2025/11/25 22:46:59.250267 1 devmanager/devmanager.go:171 the dcmi version is 25.5.0.b030
[INFO] 2025/11/25 22:46:59.250405 1 devmanager/devmanager.go:235 chipName: Ascend910, devType: Ascend910A3
...
If the following information is displayed, the component is running properly:
...
[INFO] 2025/11/25 22:46:59.289352 1 devmgr/workflow.go:57 init module <hwDev manager> success
[INFO] 2025/11/25 22:46:59.293773 1 app/config.go:40 load fault config from /home/faultCode.json success
[INFO] 2025/11/25 22:46:59.293866 1 app/workflow.go:50 init module <fault manager> success
[INFO] 2025/11/25 22:46:59.293901 1 app/workflow.go:76 init module <container controller> success
[INFO] 2025/11/25 22:46:59.293930 1 app/workflow.go:64 init module <reset-manager> success
[INFO] 2025/11/25 22:46:59.315101 378 devmgr/hwdevmgr.go:365 subscribe device fault event success
...
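The log check can also be scripted by searching the log file for the success messages shown above. The following is a minimal sketch; `check_cm_log` is a hypothetical helper name, and the marker list is taken from the sample output in this section. Treating every one of these markers as required on all hardware models is an assumption.

```shell
#!/bin/sh
# Sketch only: succeeds if every expected startup message appears in the log.
# check_cm_log [LOG_FILE]; the default is the log path used in this procedure.
check_cm_log() {
    log_file="${1:-/var/log/mindx-dl/container-manager/container-manager.log}"
    missing=0
    # Markers copied from the sample log output (assumed to be the full set).
    for marker in \
        'init module <hwDev manager> success' \
        'init module <fault manager> success' \
        'init module <container controller> success' \
        'init module <reset-manager> success' \
        'subscribe device fault event success'
    do
        # -F matches the marker as a fixed string, not as a regular expression.
        if ! grep -qF -- "$marker" "$log_file"; then
            echo "missing: $marker" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Note that the log file may still contain entries written before the upgrade, so a match can come from an earlier start. Comparing the timestamps of the matched lines with the restart time avoids a false positive.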