mindx_elastic.api.patch_torch_methods(内部接口,严禁调用)

功能说明

在构建训练镜像安装Elastic Agent时,使用sed -i '/import logging/i import mindx_elastic.api \' $(pip3.7 show torch | grep Location | awk -F ' ' '{print $2}')/torch/distributed/run.py命令后,Mindx_elastic组件会在导入torch的elastic模块时执行mindx_elastic.api.patch_torch_method接口,使patch自动生效。

elastic组件会对torch.distributed.elastic.agent.server.api.SimpleElasticAgent._invoke_run、torch.distributed.launcher.api.launch_agent、torch.distributed.elastic.agent.server.api.SimpleElasticAgent._initialize_workers方法打patch,额外提供昇腾NPU设备的故障检测与恢复功能。