O&M

Prerequisites

This procedure applies only when you use the rescheduling function and the autoStowing parameter is set to false in the YAML startup file of Ascend Device Plugin.
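
One way to confirm the prerequisite is to search the startup file for the parameter. The sketch below is illustrative only: the function name auto_stowing_disabled is hypothetical, and it assumes the setting is written as autoStowing=false or autoStowing: false in the file. Adjust the pattern if your YAML file spells it differently.

```shell
# Hypothetical check (not part of Ascend Device Plugin).
# Returns 0 if the file passed as $1 sets autoStowing to false, 1 otherwise.
# Assumption: the setting is written as "autoStowing=false" or
# "autoStowing: false", optionally with a quote before the value.
auto_stowing_disabled() {
    grep -Eq 'autoStowing[[:space:]]*[=:][[:space:]]*"?false' "$1"
}
```

Call the function with the path to your own startup YAML file and check its exit status before you continue with the procedure.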

Procedure

  • Run the following command to return processors whose health status has recovered from unhealthy to healthy to the resource pool:
    kubectl label nodes node_name huawei.com/Ascend910-Recover-

    After the command is executed, the huawei.com/Ascend910-Recover label is deleted, and the processors covered by the label are returned to the resource pool, where they can be scheduled again.

    Use this command only to remove the Recover label. Do not use it to add labels.

  • Run the following command to return processors whose parameter plane network health status has recovered from unhealthy to healthy to the resource pool:
    kubectl label nodes node_name huawei.com/Ascend910-NetworkRecover-

    After the command is executed, the huawei.com/Ascend910-NetworkRecover label is deleted, and the corresponding processors are also removed from huawei.com/Ascend910-NetworkUnhealthy.

    Use this command only to remove the NetworkRecover label. Do not use it to add labels.
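
The two commands in this procedure can be wrapped in a small helper, sketched here under a few assumptions. The function name clear_recover_label and the KUBECTL override variable are hypothetical and are not part of Ascend Device Plugin. The helper accepts only the two label keys named in this procedure and always appends the trailing hyphen, which tells kubectl label to remove a label rather than set it.

```shell
# Hypothetical helper (not part of Ascend Device Plugin): remove one of the
# two recovery labels from a node.
#   $1: node name
#   $2: Recover (default) or NetworkRecover
# KUBECTL is an assumed override for the kubectl binary, handy for dry runs.
clear_recover_label() {
    node="$1"
    kind="${2:-Recover}"

    if [ -z "$node" ]; then
        echo "usage: clear_recover_label node_name [Recover|NetworkRecover]" >&2
        return 2
    fi

    case "$kind" in
        Recover|NetworkRecover) ;;
        *)
            echo "unsupported label kind: $kind" >&2
            return 2
            ;;
    esac

    # The trailing "-" makes kubectl delete the label instead of setting it.
    "${KUBECTL:-kubectl}" label nodes "$node" "huawei.com/Ascend910-${kind}-"
}
```

Setting KUBECTL=echo before calling the function prints the kubectl arguments instead of running them, which allows a dry run before the node is changed.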