StressTest

Description

Receives online stress testing requests from the O&M platform and forwards the requested operation to the specified nodes of a training job. Call this API only after the training job has started successfully and is iterating normally, which ensures that the job has been registered with ClusterD. This API is a manual O&M operation. Before calling it, ensure that the server environment is normal.

Deliver the online stress testing command only while training iterations are proceeding normally.

Prototype

rpc StressTest(StressTestParam) returns (Status) {}

Input Parameters

Parameter

Type (Defined by Protobuf)

Description

StressTestParam

message StressTestParam {
    string jobID = 1;
    map<string, StressOpList> stressParam = 2;
    repeated int64 allNodesOps = 3;
}

message StressOpList {
    repeated int64 ops = 1;
}

StressTestParam.jobID: ID of the training job.

StressTestParam.stressParam: nodes that receive the user-issued stress testing instruction and the operations to run on each. The key is the node name, and the value is the list of stress testing operations to perform on that node.

StressTestParam.allNodesOps: stress testing operations to perform on all nodes. allNodesOps takes priority over stressParam. 0 indicates AIC stress testing, and 1 indicates P2P stress testing.

StressOpList.ops: stress testing operations to perform on the node. 0 indicates AIC stress testing, and 1 indicates P2P stress testing.
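The relationship between stressParam and allNodesOps can be illustrated with a minimal sketch. It uses plain Python dataclasses as stand-ins for the Protobuf messages rather than the generated classes, and the names StressOp, effective_ops, job-123, and node-0/node-1 are hypothetical. The source states only that allNodesOps has higher priority; the sketch assumes this means a non-empty allNodesOps replaces the per-node entries, which may differ from the actual ClusterD behavior.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List


class StressOp(IntEnum):
    """Operation codes listed in the parameter description."""
    AIC = 0
    P2P = 1


@dataclass
class StressTestParam:
    """Plain-Python stand-in for the Protobuf message (not the generated class)."""
    job_id: str
    stress_param: Dict[str, List[int]] = field(default_factory=dict)
    all_nodes_ops: List[int] = field(default_factory=list)


def effective_ops(param: StressTestParam, node: str) -> List[int]:
    """Return the operations that would apply to `node`.

    Assumption: a non-empty allNodesOps replaces any per-node entry.
    """
    if param.all_nodes_ops:
        return list(param.all_nodes_ops)
    return list(param.stress_param.get(node, []))


# Per-node request: each node gets its own operation list.
per_node = StressTestParam(
    job_id="job-123",  # hypothetical job ID
    stress_param={
        "node-0": [StressOp.AIC],
        "node-1": [StressOp.AIC, StressOp.P2P],
    },
)

# Cluster-wide request: allNodesOps is set, so it wins over stressParam.
cluster_wide = StressTestParam(
    job_id="job-123",
    stress_param={"node-0": [StressOp.AIC]},
    all_nodes_ops=[StressOp.P2P],
)
```

Under this assumption, a caller that wants different operations per node should leave allNodesOps empty.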

Return Value

Parameter

Type (Defined by Protobuf)

Description

Status

message Status {
    int32 code = 1;
    string info = 2;
}

Status.code: return code

  • 0: instruction delivered successfully
  • Other values: failed to deliver the instruction

Status.info: return information
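A caller would typically check the returned Status before proceeding. The following minimal sketch models Status as a plain Python dataclass (a stand-in, not the generated Protobuf class); check_status is a hypothetical helper name. Note that code 0 reports only that the instruction was delivered, not the outcome of the stress test itself.

```python
from dataclasses import dataclass


@dataclass
class Status:
    """Plain-Python stand-in for the Protobuf Status message."""
    code: int
    info: str


def check_status(status: Status) -> None:
    """Raise if the stress testing instruction was not delivered.

    Any non-zero code is treated as a delivery failure, as documented above.
    """
    if status.code != 0:
        raise RuntimeError(
            f"StressTest delivery failed (code {status.code}): {status.info}"
        )
```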