create_group
Description
Creates a user-defined group for collective communication.
If this API is not called to create a user-defined group, all devices involved in cluster training belong to the global hccl_world_group by default.
- hccl_world_group: default global group (created by HCCL automatically), including all ranks that participate in collective communication.
- User-defined groups: process groups made up of a subset of the ranks in hccl_world_group.
Prototype
def create_group(group, rank_num, rank_ids)
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| group | Input | A string containing a maximum of 128 bytes, including the end character. Group name, which is the identifier of a collective communication group. The group name cannot be the default global group name hccl_world_group. If the group name specified by the user is hccl_world_group, the group fails to be created. |
| rank_num | Input | An int. Number of ranks in a group. The maximum value is 32768. |
| rank_ids | Input | A list. List of world_rank_ids that form the group. Different types of boards have different restrictions. For the Supplementary notes: It is recommended that rank_ids be sorted based on the physical connection sequence of devices, that is, devices that are physically close to each other are arranged together. For example, if device_ip is set in ascending order based on the physical connection sequence, you are advised to set rank_ids in ascending order. |
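
The parameter limits above can be checked before the call is made. The following is a minimal sketch in plain Python, not part of HCCL: `validate_group_args` and its constants are hypothetical names, and it assumes that the end character takes one byte of the 128-byte limit, that rank_num must be at least 1, and that rank_num is expected to equal the length of rank_ids. Board-specific restrictions on rank_ids are not covered.

```python
WORLD_GROUP = "hccl_world_group"  # reserved default global group name
MAX_GROUP_NAME_BYTES = 128        # limit includes the end character
MAX_RANK_NUM = 32768


def validate_group_args(group, rank_num, rank_ids):
    """Hypothetical pre-check. Returns a list of problems (empty if none)."""
    problems = []

    if not isinstance(group, str):
        problems.append("group must be a string")
    else:
        # Assumption: the end character occupies one byte of the limit.
        if len(group.encode("utf-8")) + 1 > MAX_GROUP_NAME_BYTES:
            problems.append("group name exceeds 128 bytes including the end character")
        if group == WORLD_GROUP:
            problems.append("group name must not be hccl_world_group")

    rank_num_ok = isinstance(rank_num, int) and not isinstance(rank_num, bool)
    if not rank_num_ok:
        problems.append("rank_num must be an int")
    elif not 1 <= rank_num <= MAX_RANK_NUM:
        # Assumption: a group needs at least one rank.
        problems.append("rank_num must be between 1 and 32768")

    if not isinstance(rank_ids, list):
        problems.append("rank_ids must be a list")
    elif rank_num_ok and len(rank_ids) != rank_num:
        # Assumption: rank_num is expected to match len(rank_ids).
        problems.append("rank_num does not match len(rank_ids)")

    return problems
```
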
Returns
None
Constraints
- This API must be called after the initialization of collective communication is complete.
- The calling rank must belong to the group that is being created by this call. If a rank outside the group calls this API, the call fails.
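
Because the calling rank has to be part of the group it creates, each rank typically computes only the group that contains it. The sketch below illustrates one way to do that in plain Python. `group_for_rank` and the `group_<index>` naming scheme are hypothetical, and the sketch assumes world rank IDs run contiguously from 0 and that groups are equal-sized contiguous blocks.

```python
def group_for_rank(rank_id, world_size, group_size):
    """Return (group, rank_num, rank_ids) for the block that contains rank_id.

    Hypothetical helper: splits ranks 0..world_size-1 into contiguous
    blocks of group_size ranks and names each block "group_<index>".
    """
    if world_size % group_size != 0:
        raise ValueError("world_size must be a multiple of group_size")
    if not 0 <= rank_id < world_size:
        raise ValueError("rank_id is outside the world group")

    index = rank_id // group_size
    start = index * group_size
    # Ascending order, so the calling rank is always a member of rank_ids.
    rank_ids = list(range(start, start + group_size))
    return f"group_{index}", group_size, rank_ids


group, rank_num, rank_ids = group_for_rank(rank_id=5, world_size=8, group_size=4)
# On a real device, after collective communication is initialized,
# the result would be passed on: create_group(group, rank_num, rank_ids)
```
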
Applicability
Example
The following is only a code snippet and cannot be executed. For details about how to call the HCCL Python APIs to perform collective communication, see Sample Code.
```python
from npu_bridge.npu_init import *
create_group("myGroup", 4, [0, 1, 2, 3])
```
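
The ordering recommendation for rank_ids in the Parameters section can be sketched as follows. This is an illustration only: `order_rank_ids` and the `device_ip_by_rank` mapping are hypothetical, the addresses are made-up sample values, and the sketch assumes that ascending device_ip reflects the physical connection sequence of the devices.

```python
import ipaddress


def order_rank_ids(rank_ids, device_ip_by_rank):
    """Sort rank IDs by the numeric value of each rank's device IP.

    Hypothetical helper. Assumes ascending device_ip matches the
    physical connection sequence of the devices.
    """
    return sorted(
        rank_ids,
        key=lambda rank: ipaddress.ip_address(device_ip_by_rank[rank]),
    )


# Sample mapping from world rank ID to device IP (illustrative values).
device_ip_by_rank = {
    0: "192.168.1.10",
    1: "192.168.1.9",
    2: "192.168.1.100",
    3: "192.168.1.11",
}
ordered = order_rank_ids([0, 1, 2, 3], device_ip_by_rank)
```
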