TilingData of v1 (Deprecated)

This structure is deprecated and will be removed in a later version. Do not use it. You do not need to set its members directly; instead, use the APIs provided by HCCL Tiling.

In the TilingData structure of an MC2 operator, the computation tiling structure must be placed after the communication tiling structure.
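The ordering rule can be pictured with ordinary C++ structs standing in for the tiling macros. This is a minimal sketch: CommMsgSketch, MatmulTilingSketch, and OperatorTilingSketch are hypothetical names, and the layout that the TilingData macros actually generate may differ (for example, in padding). The sketch only checks that the communication part comes first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the communication tiling (for example, Mc2Msg).
struct CommMsgSketch {
    uint32_t preparePosition;  // first uint32_t field of the v1 layout
    uint8_t commAlg;
};

// Hypothetical stand-in for a computation tiling (for example, a Matmul tiling).
struct MatmulTilingSketch {
    uint32_t m;
    uint32_t n;
    uint32_t k;
};

// Operator tiling: communication tiling first, computation tiling after it.
struct OperatorTilingSketch {
    CommMsgSketch msg;
    MatmulTilingSketch matmul;
};

// True when the communication tiling starts the struct and the computation
// tiling is laid out after it. offsetof is valid because these are
// standard-layout types.
constexpr bool CommTilingComesFirst() {
    return offsetof(OperatorTilingSketch, msg) == 0 &&
           offsetof(OperatorTilingSketch, matmul) >= sizeof(CommMsgSketch);
}

static_assert(CommTilingComesFirst(), "computation tiling must follow communication tiling");
```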

TilingData of v1 and v2 can be distinguished by the first uint32_t field of the tiling structure: preparePosition in v1 and version in v2. When using the v2 tiling structure, set version to 2; when using the v1 tiling structure, set preparePosition to 1. Whichever version you use, follow its tiling structure strictly and include it as part of the operator's TilingData structure.
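A consumer that receives the raw tiling bytes could tell the two versions apart by reading the leading uint32_t, as in the minimal sketch below. It assumes that the communication tiling starts at offset 0 of the buffer and that producer and consumer share the same byte order; DetectMc2TilingVersion and MakeTilingWithFirstField are hypothetical helpers, not HCCL APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper. Returns 1 for a v1 layout (preparePosition == 1),
// 2 for a v2 layout (version == 2), or 0 if the buffer is too short or the
// leading value is not recognized.
int DetectMc2TilingVersion(const std::vector<uint8_t>& tiling) {
    if (tiling.size() < sizeof(uint32_t)) {
        return 0;
    }
    uint32_t first = 0;
    // memcpy avoids an unaligned or type-punned read of the leading field.
    std::memcpy(&first, tiling.data(), sizeof(first));
    if (first == 1) {
        return 1;  // v1: the field is preparePosition
    }
    if (first == 2) {
        return 2;  // v2: the field is version
    }
    return 0;
}

// Hypothetical demo helper: a zero-filled buffer whose leading uint32_t is `first`.
std::vector<uint8_t> MakeTilingWithFirstField(uint32_t first) {
    std::vector<uint8_t> buf(16, 0);
    std::memcpy(buf.data(), &first, sizeof(first));
    return buf;
}
```

Because the discriminator shares its position with the first field of each layout, the producer must set that field explicitly; a zero-filled buffer is rejected by this sketch.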

Function

Defines Mc2Msg, the fixed communication configuration that the AI CPU obtains before it starts to deliver communication tasks. In the operator implementation, the communication configuration items are assembled in the tiling function: the fixed parameters are written into the tiling data in a fixed order, and the configuration is passed to the AI CPU when the AI CPU communication API is called.
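Writing fixed parameters in a fixed sequence and reading them back by position can be modeled with a small byte buffer. The sketch below is a plain C++ illustration under assumed names (FixedOrderWriter, FixedOrderReader, RoundTripDemo); it uses only a subset of the documented fields and does not reproduce the real Mc2Msg layout, padding, or how the AI CPU actually parses the data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Producer side (tiling): appends each field's bytes in a fixed order.
class FixedOrderWriter {
public:
    template <typename T>
    void Put(T value) {
        static_assert(std::is_trivially_copyable<T>::value, "fields must be plain data");
        const auto* bytes = reinterpret_cast<const uint8_t*>(&value);
        buf_.insert(buf_.end(), bytes, bytes + sizeof(T));
    }
    const std::vector<uint8_t>& Bytes() const { return buf_; }

private:
    std::vector<uint8_t> buf_;
};

// Consumer side (stands in for the AI CPU): reads fields by position, so it
// only gets correct values if it uses the same order and types as the producer.
class FixedOrderReader {
public:
    explicit FixedOrderReader(const std::vector<uint8_t>& buf) : buf_(buf) {}

    template <typename T>
    T Get() {
        assert(pos_ + sizeof(T) <= buf_.size() && "read past end of tiling data");
        T value{};
        std::memcpy(&value, buf_.data() + pos_, sizeof(T));
        pos_ += sizeof(T);
        return value;
    }

private:
    const std::vector<uint8_t>& buf_;
    std::size_t pos_ = 0;
};

// Writes three documented fields (a subset only) and reads them back in order.
bool RoundTripDemo() {
    FixedOrderWriter writer;
    writer.Put<uint32_t>(1);  // preparePosition
    writer.Put<uint8_t>(1);   // commAlg
    writer.Put<uint8_t>(0);   // hasCommOut
    FixedOrderReader reader(writer.Bytes());
    const uint32_t preparePosition = reader.Get<uint32_t>();
    const uint8_t commAlg = reader.Get<uint8_t>();
    const uint8_t hasCommOut = reader.Get<uint8_t>();
    return preparePosition == 1 && commAlg == 1 && hasCommOut == 0 &&
           writer.Bytes().size() == sizeof(uint32_t) + 2 * sizeof(uint8_t);
}
```

In this model, reading the same bytes with a different field order or type yields wrong values without any error, which is why the producer and consumer structures must match exactly.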

Parameters

Table 1 Mc2Msg parameters

Parameter

Description

preparePosition

Task assembly mode on the server. Explicitly assign this parameter in tiling. The value is of the uint32_t type. The following value is supported:

1: The AI CPU and AI Core use the communication task mechanism for message transfer and task delivery. Set this parameter to 1 when the AI Core uses message notification, that is, when the operator uses HCCL.

sendOff

Reserved parameter, which cannot be configured.

recvOff

Reserved parameter, which cannot be configured.

tailSendOff

Reserved parameter, which cannot be configured.

tailRecvOff

Reserved parameter, which cannot be configured.

sendCnt

Reserved parameter, which cannot be configured.

recvCnt

Reserved parameter, which cannot be configured.

tailSendCnt

Reserved parameter, which cannot be configured.

tailRecvCnt

Reserved parameter, which cannot be configured.

totalCnt

Reserved parameter, which cannot be configured.

turnNum

Reserved parameter, which cannot be configured.

tailNum

Reserved parameter, which cannot be configured.

stride

Reserved parameter, which cannot be configured.

workspaceOff

Reserved parameter, which cannot be configured.

notifyOff

Reserved parameter, which cannot be configured.

notifyBeginCnt

Reserved parameter, which cannot be configured.

notifyEndCnt

Reserved parameter, which cannot be configured.

useBufferType

Location from which the communication algorithm obtains its input data. The value is of the uint8_t type. The options are as follows:

  • 0 (default): The communication input is not stored in Windows, the shared buffer that other cards can access.
  • 1: The communication input is not stored in Windows. This value has the same effect as 0.
  • 2: The communication input is stored in Windows. This value applies only to the AllReduce algorithm.

funID

Reserved parameter, which cannot be configured.

dataType

Reserved parameter, which cannot be configured.

groupNum

Reserved parameter, which cannot be configured.

reuseMode

Reserved parameter, which cannot be configured.

commType

Reserved parameter, which cannot be configured.

reduceOp

Reserved parameter, which cannot be configured.

commOrder

Reserved parameter, which cannot be configured.

waitPolicy

Reserved parameter, which cannot be configured.

rspPolicy

Reserved parameter, which cannot be configured.

exitPolicy

Reserved parameter, which cannot be configured.

commAlg

Communication algorithm. Explicitly assign this parameter in tiling. The value is of the uint8_t type. The following value is supported:

1: Full-mesh algorithm. Full-mesh connections are established between NPUs, so data can be transmitted directly between any two NPUs. For details about the algorithm, see "Collective Communication Algorithm Introduction" in .

taskType

Reserved parameter, which cannot be configured.

debugMode

Reserved parameter, which cannot be configured.

stepSize

Reserved parameter, which cannot be configured.

sendArgIndex

Reserved parameter, which cannot be configured.

recvArgIndex

Reserved parameter, which cannot be configured.

commOutArgIndex

Reserved parameter, which cannot be configured.

hasCommOut

Whether the result of the communication algorithm on the current device is output to recvBuf (the address of the destination data buffer). Configure this parameter only for the AllGather and AlltoAll algorithms. The value is of the uint8_t type. The options are as follows:

  • 0: The result is not output. The communication result data of the current device is not copied, which improves operator performance. For example, with eight cards, if the current device only needs part of the data from the other cards, this parameter can be set to 0.
  • 1: The result of the communication algorithm on the current device is output to recvBuf.

reserve

Reserved parameter

reserve2

Reserved parameter
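The documented options of the four configurable fields (preparePosition, useBufferType, commAlg, hasCommOut) can be collected into a host-side check. The sketch below is hypothetical: Mc2MsgUserFields and CheckMc2MsgUserFields are illustrative names, CommKind is an assumed way of telling the check which algorithm the operator uses, and rejecting a nonzero hasCommOut for other algorithms is this sketch's reading of "configured only for the AllGather and AlltoAll algorithms".

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical algorithm tag, used only to apply the per-algorithm rules
// stated for useBufferType and hasCommOut.
enum class CommKind { kAllGather, kAlltoAll, kAllReduce, kOther };

// Hypothetical mirror of the configurable Mc2Msg fields. The real struct also
// contains every reserved field, in the documented order.
struct Mc2MsgUserFields {
    uint32_t preparePosition = 0;
    uint8_t useBufferType = 0;
    uint8_t commAlg = 0;
    uint8_t hasCommOut = 0;
};

// Returns an empty string if the values follow the parameter table,
// otherwise a description of the first rule that is violated.
std::string CheckMc2MsgUserFields(const Mc2MsgUserFields& f, CommKind kind) {
    if (f.preparePosition != 1) {
        return "preparePosition must be 1";
    }
    if (f.useBufferType > 2) {
        return "useBufferType must be 0, 1, or 2";
    }
    if (f.useBufferType == 2 && kind != CommKind::kAllReduce) {
        return "useBufferType 2 applies only to AllReduce";
    }
    if (f.commAlg != 1) {
        return "commAlg must be 1 (full-mesh)";
    }
    if (f.hasCommOut > 1) {
        return "hasCommOut must be 0 or 1";
    }
    // Assumption of this sketch: a nonzero hasCommOut is rejected for other algorithms.
    if (f.hasCommOut != 0 && kind != CommKind::kAllGather && kind != CommKind::kAlltoAll) {
        return "hasCommOut is configured only for AllGather and AlltoAll";
    }
    return "";
}
```

For the values set in the example at the end of this section (all four fields set to 1) on an AllGather operator, the check returns an empty string.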

Restrictions

  • The TilingData struct of the operator must contain all Mc2Msg parameters, in the order listed above.
  • The AI CPU obtains the communication configuration according to a fixed data structure, so the structure registered in TilingData must match it exactly.
  • Atlas A3 training products / Atlas A3 inference products do not support this version of TilingData.

Example

The following example uses the custom operator AllGatherMatmulCustom, whose prototype is shown below. gather_out is the output of the AllGather communication task.

[
    {
        "op": "AllGatherMatmulCustom",
        "input_desc": [
            {
                "name": "x1",
                "param_type": "required",
                "format": [
                    "ND",
                    "ND"
                ],
                "type": [
                    "float16",
                    "bfloat16"
                ]
            },
            {
                "name": "x2",
                "param_type": "required",
                "format": [
                    "ND",
                    "ND"
                ],
                "type": [
                    "float16",
                    "bfloat16"
                ]
            },
            {
                "name": "bias",
                "param_type": "optional",
                "format": [
                    "ND",
                    "ND"
                ],
                "type": [
                    "float16",
                    "bfloat16"
                ]
            }
        ],
        "output_desc": [
            {
                "name": "y",
                "param_type": "required",
                "format": [
                    "ND",
                    "ND"
                ],
                "type": [
                    "float16",
                    "bfloat16"
                ]
            },
            {
                "name": "gather_out",
                "param_type": "required",
                "format": [
                    "ND",
                    "ND"
                ],
                "type": [
                    "float16",
                    "bfloat16"
                ]
            }
        ],
        "attr": [
            {
                "name": "group",
                "type": "string",
                "default_value": "",
                "param_type": "required"
            },
            {
                "name": "rank_size",
                "type": "int",
                "default_value": 0,
                "param_type": "optional"
            },
            {
                "name": "is_gather_out",
                "type": "bool",
                "default_value": true,
                "param_type": "optional"
            }
        ]
    }
]

The TilingData struct of the operator must contain all Mc2Msg parameters in order. The following is an example TilingData definition:

// Tiling definition for the AllGatherMatmulCustom operator (host side).
#include "register/tilingdata_base.h"

namespace optiling {
// Declare the Mc2Msg struct. Keep every field, in this order and with these types:
// the AI CPU parses the communication configuration by this fixed layout.
BEGIN_TILING_DATA_DEF(Mc2Msg)
    TILING_DATA_FIELD_DEF(uint32_t, preparePosition);  // Configurable: set to 1.
    TILING_DATA_FIELD_DEF(uint32_t, sendOff);          // Reserved from here to notifyEndCnt.
    TILING_DATA_FIELD_DEF(uint32_t, recvOff);
    TILING_DATA_FIELD_DEF(uint32_t, tailSendOff);
    TILING_DATA_FIELD_DEF(uint32_t, tailRecvOff);
    TILING_DATA_FIELD_DEF(uint64_t, sendCnt);
    TILING_DATA_FIELD_DEF(uint32_t, recvCnt);
    TILING_DATA_FIELD_DEF(uint32_t, tailSendCnt);
    TILING_DATA_FIELD_DEF(uint32_t, tailRecvCnt);
    TILING_DATA_FIELD_DEF(uint32_t, totalCnt);
    TILING_DATA_FIELD_DEF(uint32_t, turnNum);
    TILING_DATA_FIELD_DEF(uint32_t, tailNum);
    TILING_DATA_FIELD_DEF(uint32_t, stride);
    TILING_DATA_FIELD_DEF(uint32_t, workspaceOff);
    TILING_DATA_FIELD_DEF(uint32_t, notifyOff);
    TILING_DATA_FIELD_DEF(uint16_t, notifyBeginCnt);
    TILING_DATA_FIELD_DEF(uint16_t, notifyEndCnt);
    TILING_DATA_FIELD_DEF(uint8_t, useBufferType);     // Configurable: 0, 1, or 2.
    TILING_DATA_FIELD_DEF(uint8_t, funID);             // Reserved from here to exitPolicy.
    TILING_DATA_FIELD_DEF(uint8_t, dataType);
    TILING_DATA_FIELD_DEF(uint8_t, groupNum);
    TILING_DATA_FIELD_DEF(uint8_t, reuseMode);
    TILING_DATA_FIELD_DEF(uint8_t, commType);
    TILING_DATA_FIELD_DEF(uint8_t, reduceOp);
    TILING_DATA_FIELD_DEF(uint8_t, commOrder);
    TILING_DATA_FIELD_DEF(uint8_t, waitPolicy);
    TILING_DATA_FIELD_DEF(uint8_t, rspPolicy);
    TILING_DATA_FIELD_DEF(uint8_t, exitPolicy);
    TILING_DATA_FIELD_DEF(uint8_t, commAlg);           // Configurable: 1 (full-mesh).
    TILING_DATA_FIELD_DEF(uint8_t, taskType);          // Reserved from here to commOutArgIndex.
    TILING_DATA_FIELD_DEF(uint8_t, debugMode);
    TILING_DATA_FIELD_DEF(uint8_t, stepSize);
    TILING_DATA_FIELD_DEF(uint8_t, sendArgIndex);
    TILING_DATA_FIELD_DEF(uint8_t, recvArgIndex);
    TILING_DATA_FIELD_DEF(uint8_t, commOutArgIndex);
    TILING_DATA_FIELD_DEF(uint8_t, hasCommOut);        // Configurable: 0 or 1 (AllGather/AlltoAll only).
    TILING_DATA_FIELD_DEF(uint8_t, reserve);           // Reserved.
    TILING_DATA_FIELD_DEF(uint32_t, reserve2);         // Reserved.
END_TILING_DATA_DEF;
REGISTER_TILING_DATA_CLASS(Mc2MsgOp, Mc2Msg)

// Operator TilingData: the communication configuration (Mc2Msg) comes first;
// any computation tiling must be declared after it.
BEGIN_TILING_DATA_DEF(AllGatherMatmulCustomTilingData)
    TILING_DATA_FIELD_DEF_STRUCT(Mc2Msg, msg);
END_TILING_DATA_DEF;
REGISTER_TILING_DATA_CLASS(AllGatherMatmulCustom, AllGatherMatmulCustomTilingData)
}  // namespace optiling
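
// Configure Mc2Msg in the operator's tiling function. Only the fields below
// are configurable; the reserved fields are not set.
AllGatherMatmulCustomTilingData tiling;
tiling.msg.set_preparePosition(1);  // Message notification mode (the operator uses HCCL).
tiling.msg.set_commAlg(1);          // Full-mesh algorithm.
tiling.msg.set_useBufferType(1);    // Input not stored in Windows (same effect as 0).
tiling.msg.set_hasCommOut(1);       // Output the AllGather result to recvBuf.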
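The example sets commAlg to 1 (full-mesh) and hasCommOut to 1. The sketch below is a host-side model, not HCCL code, of what these two settings describe for one rank of an AllGather: every chunk arrives directly from the rank that owns it, and the gathered data is copied out only when hasCommOut is 1. Treating the copied-out result as the data behind gather_out in the prototype is an assumption, and FullMeshLinkCount, FullMeshAllGatherOnRank, and Chunk are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Chunk = std::vector<int>;

// Direct links in a full mesh of n NPUs: each pair of NPUs is connected once.
constexpr std::size_t FullMeshLinkCount(std::size_t n) {
    return n == 0 ? 0 : n * (n - 1) / 2;
}

// One rank's view of a full-mesh AllGather (commAlg = 1). chunks[r] is the
// local input of rank r. Each peer's chunk is received directly from that
// peer, never relayed through a third NPU. The gathered data is returned
// (copied out, like recvBuf) only when hasCommOut is true.
std::optional<Chunk> FullMeshAllGatherOnRank(const std::vector<Chunk>& chunks,
                                             std::size_t rank, bool hasCommOut) {
    assert(rank < chunks.size() && "rank out of range");
    Chunk gathered;
    for (std::size_t src = 0; src < chunks.size(); ++src) {
        // src == rank is the local chunk; any other src is a direct src -> rank transfer.
        gathered.insert(gathered.end(), chunks[src].begin(), chunks[src].end());
    }
    if (!hasCommOut) {
        return std::nullopt;  // hasCommOut = 0: the result is not copied out
    }
    return gathered;
}
```

With eight cards, the model's full mesh has 8 * 7 / 2 = 28 direct links, and each rank receives seven peer chunks in a single round.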