Overview

HCCL Python APIs are used to implement framework adaptation in graph mode. Currently, they are used only for distributed optimization of TensorFlow networks on NPUs.

Concepts

Concept

Description

group

Indicates a set of processes that participate in collective communication. Groups include:

  • hccl_world_group: the default global group, which includes all ranks that participate in collective communication. This group is created from the rank table file.
  • User-defined group: a subset of the ranks in hccl_world_group. Ranks listed in the rank table file can be divided into different groups by using the create_group API, so that collective communication algorithms can be executed in parallel in different groups.
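To make the relationship between the two kinds of group concrete, here is a minimal in-memory sketch. It does not call HCCL: the `GroupRegistry` class, its `create_group` method, and the `disjoint` helper are hypothetical names chosen for illustration, and the world size is assumed rather than read from a real rank table file. It only encodes the rules stated above: the global group holds every rank, and a user-defined group must be a subset of it.

```python
from dataclasses import dataclass, field

WORLD_GROUP = "hccl_world_group"


@dataclass
class GroupRegistry:
    """Hypothetical in-memory model of HCCL groups; it does not call HCCL."""

    world_size: int  # number of ranks in the rank table file (assumed)
    groups: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The default global group contains every rank in the rank table.
        self.groups[WORLD_GROUP] = list(range(self.world_size))

    def create_group(self, name: str, world_rank_ids: list) -> list:
        """Register a user-defined group as a subset of hccl_world_group.

        Illustrative only: this is not the signature of the real
        create_group API.
        """
        if name in self.groups:
            raise ValueError(f"group {name!r} already exists")
        ids = list(world_rank_ids)
        if not ids:
            raise ValueError("a group needs at least one rank")
        if len(set(ids)) != len(ids):
            raise ValueError("duplicate rank IDs")
        if not set(ids) <= set(self.groups[WORLD_GROUP]):
            raise ValueError("ranks must belong to hccl_world_group")
        self.groups[name] = ids
        return ids

    def disjoint(self, a: str, b: str) -> bool:
        """Return True if the two groups share no ranks."""
        return set(self.groups[a]).isdisjoint(self.groups[b])


registry = GroupRegistry(world_size=8)
registry.create_group("group_a", [0, 1, 2, 3])
registry.create_group("group_b", [4, 5, 6, 7])
print(registry.disjoint("group_a", "group_b"))  # True
```

Splitting eight ranks into two disjoint groups like this is one way a job could run separate collectives side by side; how groups are actually laid out in a real job depends on the training script.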

rank

A communication entity in a group. Each rank is assigned an ID that is unique within the group and ranges from 0 to n – 1, where n is the number of NPUs in the group.

rank size

  • Rank size: indicates the number of ranks in a group.
  • Local rank size: indicates the number of ranks of a group that are located on the same server as the process.
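The difference between rank size and local rank size can be shown with a small calculation. This sketch assumes a hypothetical cluster of two servers with eight NPUs each, and assumes world ranks are numbered server by server; in a real job the placement comes from the rank table file. All function names are illustrative, not HCCL APIs.

```python
def make_placement(num_servers: int, npus_per_server: int) -> dict:
    """Hypothetical world rank ID -> server ID map, numbered server by server."""
    return {
        rank: rank // npus_per_server
        for rank in range(num_servers * npus_per_server)
    }


def rank_size(group: list) -> int:
    """Number of ranks in the group."""
    return len(group)


def local_rank_size(group: list, placement: dict, world_rank: int) -> int:
    """Number of ranks in the group on the same server as world_rank."""
    if world_rank not in group:
        raise ValueError("world_rank is not a member of the group")
    server = placement[world_rank]
    return sum(1 for r in group if placement[r] == server)


placement = make_placement(num_servers=2, npus_per_server=8)
group = [0, 1, 2, 3, 8, 9]  # hypothetical user-defined group
print(rank_size(group))                      # 6
print(local_rank_size(group, placement, 0))  # 4 (ranks 0-3 on server 0)
print(local_rank_size(group, placement, 8))  # 2 (ranks 8-9 on server 1)
```

The rank size is the same for every member of the group, while the local rank size depends on which server the querying process runs on.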

rank ID

  • Rank ID: indicates the ID of a process within a group. The value ranges from 0 to (rank size – 1). In a user-defined group, rank IDs are numbered from 0 within that group. In hccl_world_group, a process's rank ID is the same as its world rank ID.
  • World rank ID: indicates the rank ID of a process in hccl_world_group. The value ranges from 0 to (rank size of hccl_world_group – 1).
  • Local rank ID: indicates the rank ID of a process among the ranks of its group that are located on the same server. The value ranges from 0 to (local rank size – 1).
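The three kinds of ID can be related with a few lookups. This is a sketch under stated assumptions, not the HCCL implementation: it assumes a rank's group rank ID is its position in the group's list of world rank IDs, that local rank IDs follow the same order among the group's ranks on one server, and that world ranks are numbered server by server on a hypothetical two-server, eight-NPU-per-server cluster. The function names are illustrative only.

```python
def make_placement(num_servers: int, npus_per_server: int) -> dict:
    """Hypothetical world rank ID -> server ID map, numbered server by server."""
    return {r: r // npus_per_server for r in range(num_servers * npus_per_server)}


def group_rank_id(group: list, world_rank: int) -> int:
    """Rank ID of world_rank inside the group (assumed: its list position)."""
    return group.index(world_rank)


def world_rank_id(group: list, group_rank: int) -> int:
    """Inverse mapping: group rank ID -> world rank ID."""
    return group[group_rank]


def local_rank_id(group: list, placement: dict, world_rank: int) -> int:
    """Rank ID among the group's ranks that share world_rank's server."""
    server = placement[world_rank]
    local_members = [r for r in group if placement[r] == server]
    return local_members.index(world_rank)


placement = make_placement(num_servers=2, npus_per_server=8)
world = list(range(16))  # models hccl_world_group
group = [2, 3, 10, 11]   # hypothetical user-defined group

print(group_rank_id(world, 10))             # 10: same as the world rank ID
print(group_rank_id(group, 10))             # 2
print(world_rank_id(group, 2))              # 10
print(local_rank_id(group, placement, 10))  # 0: first group rank on server 1
print(local_rank_id(world, placement, 10))  # 2: after ranks 8 and 9 on server 1
```

For the global group the mapping is the identity, which matches the rule that a rank ID in hccl_world_group equals the world rank ID; for a user-defined group the same world rank receives a different, group-relative ID.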