Ascend Operator

Application Scenario

MindCluster Ascend Operator provides the information required for collective communication: the IP address of the main process, the RankTable information used in static networking, and the rank ID of each pod.

Component Function

  • Create a pod and inject collective communication parameters as environment variables.
  • Create a RankTable file and mount it into the container through shared storage or a ConfigMap, which speeds up link setup for collective communication.
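To make the first function concrete, the sketch below creates one simplified pod description per replica and attaches collective communication parameters as environment variables. It is a minimal illustration under stated assumptions, not Ascend Operator's implementation: `PodSpec`, `build_worker_pods`, and every `HYPOTHETICAL_*` variable name are invented for this example, and the real operator works with Kubernetes pod objects and its own variable names.

```python
from dataclasses import dataclass


@dataclass
class PodSpec:
    """Hypothetical, simplified stand-in for a Kubernetes pod spec."""
    name: str
    env: dict


def build_worker_pods(job_name: str, replicas: int, master_addr: str) -> list:
    """Create one pod spec per replica and inject collective
    communication parameters as environment variables.

    The variable names are illustrative placeholders, not the names
    Ascend Operator actually injects.
    """
    pods = []
    for rank in range(replicas):
        # Kubernetes environment variable values are strings, so numbers are converted.
        env = {
            "HYPOTHETICAL_MASTER_ADDR": master_addr,   # IP address of the main process
            "HYPOTHETICAL_POD_RANK": str(rank),        # rank ID of this pod
            "HYPOTHETICAL_WORLD_PODS": str(replicas),  # total number of pods in the task
        }
        pods.append(PodSpec(name=f"{job_name}-worker-{rank}", env=env))
    return pods


if __name__ == "__main__":
    for pod in build_worker_pods("demo-job", 2, "10.0.0.1"):
        print(pod.name, pod.env["HYPOTHETICAL_POD_RANK"])
```

Every pod receives the same main-process address but its own rank ID, which is what lets the processes in a task find one another and agree on their positions in the communication group.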

Upstream and Downstream Dependencies

Figure 1 Upstream and downstream dependencies
  1. Check, based on Volcano, whether the resources required by the current task are sufficient.
  2. After confirming that resources are sufficient, create pods for the task and inject the collective communication environment variables.
  3. After the pods are created, Volcano selects resources for them.
  4. Obtain the processor IDs, IP addresses, and rank IDs allocated to the task from Ascend Device Plugin, aggregate this information, and generate the collective communication file (RankTable).
  5. Mount the collective communication file into the container through shared storage or a ConfigMap.
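Steps 4 and 5 can be sketched as a function that aggregates per-pod processor information into a RankTable-like structure and serializes it to JSON, ready to be written to shared storage or a ConfigMap. This is a hedged sketch: the input format and `build_rank_table` are hypothetical, rank IDs are assumed to be assigned consecutively across pods, and the output field names (`version`, `server_count`, `server_list`, `device_id`, `device_ip`, `rank_id`, `status`) follow a commonly seen HCCL rank table layout that should be checked against the official MindCluster documentation before relying on it.

```python
import json


def build_rank_table(pods: list) -> dict:
    """Aggregate per-pod processor information into a RankTable-like dict.

    `pods` is a hypothetical list of dicts in pod rank order, one per pod:
        {"server_id": "<node IP>",
         "devices": [{"device_id": "0", "device_ip": "<processor IP>"}]}
    Rank IDs are assigned consecutively across all processors (assumption).
    """
    server_list = []
    next_rank = 0
    for pod in pods:
        devices = []
        for dev in pod["devices"]:
            devices.append({
                "device_id": dev["device_id"],
                "device_ip": dev["device_ip"],
                "rank_id": str(next_rank),
            })
            next_rank += 1
        server_list.append({"server_id": pod["server_id"], "device": devices})
    return {
        "version": "1.0",
        "server_count": str(len(server_list)),
        "server_list": server_list,
        "status": "completed",  # assumes information for every pod is available
    }


if __name__ == "__main__":
    pods = [
        {"server_id": "192.168.1.10",
         "devices": [{"device_id": "0", "device_ip": "10.1.0.1"},
                     {"device_id": "1", "device_ip": "10.1.0.2"}]},
        {"server_id": "192.168.1.11",
         "devices": [{"device_id": "0", "device_ip": "10.1.1.1"}]},
    ]
    # The JSON text is what would be written to shared storage or a ConfigMap.
    print(json.dumps(build_rank_table(pods), indent=2))
```

Generating the file once and mounting it into every container means the training processes can read their peers' addresses directly instead of discovering them at startup, which is the link-setup speedup described under Component Function.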