tft_set_optimizer_replica

Function

Sets the replica relationship of the optimizer state corresponding to a rank.

Format

mindio_ttp.framework_ttp.tft_set_optimizer_replica(rank: int, replica_info: list)

Parameters

Parameter

Mandatory/Optional

Description

Value

rank

Mandatory

Rank ID of the NPU on which a training job is being executed.

int, [0, 100000).

replica_info

Mandatory

List of replica relationships. Each element is a dictionary. The dictionaries are ordered by index: ATTENTION (0) first, then MOE (1).

[
    {
        "rank_list": list, the ranks in one group of replica relationships. In the PyTorch scenario, this is the DP group rank list. In the MindSpore scenario, this is the list of all replica NPUs corresponding to one NPU.
        "replica_cnt": int, the number of replicas. In the MindSpore scenario, this equals the length of rank_list.
        "replica_shift": int, valid in the PyTorch scenario.
    },
]
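
The following is a minimal sketch of how a replica_info list in the shape described above might be assembled and checked before it is passed to tft_set_optimizer_replica. The helpers make_replica_entry and check_replica_args are hypothetical and are not part of mindio_ttp. The rank lists, replica counts, and shift values are placeholder numbers chosen only for illustration; real values depend on the parallel configuration of the training job. The checks mirror only the constraints stated on this page (rank in [0, 100000), list of dictionaries, documented key types).

```python
from typing import Dict, List

# List positions described in the parameter table.
ATTENTION, MOE = 0, 1


def make_replica_entry(rank_list: List[int], replica_cnt: int,
                       replica_shift: int = 0) -> Dict:
    """Hypothetical helper: build one element of replica_info."""
    return {
        "rank_list": list(rank_list),
        "replica_cnt": replica_cnt,
        "replica_shift": replica_shift,  # documented as valid in the PyTorch scenario
    }


def check_replica_args(rank: int, replica_info: List[Dict]) -> None:
    """Hypothetical pre-check that mirrors the documented constraints."""
    if not isinstance(rank, int) or not 0 <= rank < 100000:
        raise ValueError(f"rank must be an int in [0, 100000), got {rank!r}")
    if not isinstance(replica_info, list):
        raise TypeError("replica_info must be a list of dictionaries")
    for index, entry in enumerate(replica_info):
        if not isinstance(entry, dict):
            raise TypeError(f"replica_info[{index}] must be a dictionary")
        if not isinstance(entry.get("rank_list"), list):
            raise TypeError(f"replica_info[{index}]['rank_list'] must be a list")
        if not isinstance(entry.get("replica_cnt"), int):
            raise TypeError(f"replica_info[{index}]['replica_cnt'] must be an int")


# Placeholder values (assumption): one DP group of four ranks with two replicas.
rank = 0
replica_info = [
    make_replica_entry([0, 1, 2, 3], replica_cnt=2, replica_shift=0),  # ATTENTION
    make_replica_entry([0, 1, 2, 3], replica_cnt=2, replica_shift=0),  # MOE
]
check_replica_args(rank, replica_info)

# With MindIO TTP installed, the call would follow the format shown above:
# from mindio_ttp.framework_ttp import tft_set_optimizer_replica
# tft_set_optimizer_replica(rank, replica_info)
```

The pre-check is optional. It only surfaces malformed arguments earlier and with a more specific message than the library's own error path.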

Return Value

No return value. If an error occurs, an error log is recorded and an exception is raised.
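
Because the function reports failure through an exception and returns nothing, callers may want to wrap the call. The sketch below shows one way to do that. The wrapper set_replica_or_report is hypothetical, and it takes the function to call as an argument so that the example runs without MindIO TTP installed. This page does not name the exception type, so the sketch catches the broad Exception class; that choice is an assumption and should be narrowed once the actual type is known.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)


def set_replica_or_report(set_fn: Callable[[int, List[Dict]], None],
                          rank: int, replica_info: List[Dict]) -> bool:
    """Hypothetical wrapper: return True on success, False if set_fn raises."""
    try:
        set_fn(rank, replica_info)
    except Exception:  # exception type is not documented here (assumption)
        logger.exception("Setting optimizer replica failed for rank %d", rank)
        return False
    return True


# Intended use, assuming MindIO TTP is installed:
# from mindio_ttp.framework_ttp import tft_set_optimizer_replica
# ok = set_replica_or_report(tft_set_optimizer_replica, rank, replica_info)
```

Returning a boolean keeps the calling code simple. If the training job should stop on failure, re-raising inside the except block is the more appropriate choice.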