Overflow/Underflow Operator Data Collection and Analysis
Prerequisites
To use the ATC tool to convert a model, add the --status_check parameter to the conversion command and set it to 1, so that overflow/underflow detection logic is added during operator compilation.
For details about the ATC tool and its parameters, see ATC Instructions.
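As a hypothetical illustration (the model file names, framework value, and --soc_version value below are placeholders; substitute your own), a conversion command with the detection flag enabled might look like:

```shell
# Hypothetical ATC invocation; all values except --status_check=1 are placeholders.
atc --model=resnet50.prototxt \
    --weight=resnet50.caffemodel \
    --framework=0 \
    --output=resnet50 \
    --soc_version=Ascend310 \
    --status_check=1
```

The only requirement stated here is --status_check=1; the remaining parameters follow the usual ATC conversion workflow described in ATC Instructions.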
Collecting Overflow/Underflow Operator Information
Add the overflow/underflow operator dump configuration to the JSON configuration file that is passed to the acl.init API when pyACL is initialized.
{
    "dump": {
        "dump_path": "output",
        "dump_debug": "on"
    }
}
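The configuration above can also be generated programmatically before initialization. A minimal sketch follows; the acl.init call is shown as a comment because it requires a pyACL runtime, and the file name acl.json is an assumption (use whatever path your application passes to acl.init):

```python
import json

# Dump configuration enabling overflow/underflow (debug) dump.
dump_config = {
    "dump": {
        "dump_path": "output",   # relative path: files land under the executable's directory
        "dump_debug": "on"       # enable overflow/underflow operator dump
    }
}

# Write the configuration file consumed at pyACL initialization.
with open("acl.json", "w") as f:
    json.dump(dump_config, f, indent=4)

# In a pyACL application, pass the file to initialization:
# import acl
# ret = acl.init("acl.json")
```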
If dump_path is set to a relative path, the exported data files are stored in the {application_executable_files}/{dump_path} directory. For each overflow/underflow operator, two data files are exported:
- The dump file of an overflow/underflow operator is named as: {op_type}.{op_name}.{taskid}.{stream_id}.{timestamp}. Any period (.), slash (/), backslash (\), or space in the op_type or op_name field is replaced by an underscore (_).
You can identify an overflow/underflow operator based on the preceding information. To view the operator input and output, refer to Analyzing the Dump File of an Overflow/Underflow Operator.
- The data file of an overflow/underflow operator is named as: Opdebug.Node_OpDebug.{taskid}.{stream_id}.{timestamp}, where taskid is not the task ID of an overflow/underflow operator and can be ignored.
You can obtain the overflow information by referring to Analyzing the Data File of an Overflow/Underflow Operator, including the model where an overflow/underflow operator is located and the status register of AI Core.
Analyzing the Dump File of an Overflow/Underflow Operator
- Upload the {op_type}.{op_name}.{taskid}.{stream_id}.{timestamp} file to the environment with Toolkit installed.
- Go to the directory where the parsing script is located. For example, if Toolkit is installed in /home/HwHiAiUser/Ascend/ascend-toolkit/latest:
cd /home/HwHiAiUser/Ascend/ascend-toolkit/latest/toolkit/tools/operator_cmp/compare
- Run the msaccucmp.py script to convert the dump file into the NumPy format. For example:
python3 msaccucmp.py convert -d /home/HwHiAiUser/dump -out /home/HwHiAiUser/dumptonumpy -v 2
The -d option accepts either a single dump file or a directory; in the latter case, all dump files in the directory are converted.
- Use Python to save the NumPy data into a .txt file. For example:
$ python3
>>> import numpy as np
>>> a = np.load("/home/HwHiAiUser/dumptonumpy/Pooling.pool1.1147.1589195081588018.output.0.npy")
>>> b = a.flatten()
>>> np.savetxt("/home/HwHiAiUser/dumptonumpy/Pooling.pool1.1147.1589195081588018.output.0.txt", b)
The shape and dtype information is not preserved in the .txt file. For more details, see the NumPy documentation.
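When many operators dumped data, the per-file steps above can be applied in a loop. A minimal sketch, in which the directory paths and the helper name npy_dir_to_txt are illustrative assumptions:

```python
import glob
import os

import numpy as np

def npy_dir_to_txt(src_dir, dst_dir):
    """Flatten every .npy file in src_dir and save it as a .txt file in dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    for npy_path in glob.glob(os.path.join(src_dir, "*.npy")):
        data = np.load(npy_path).flatten()  # shape/dtype are lost, as noted above
        name = os.path.splitext(os.path.basename(npy_path))[0] + ".txt"
        np.savetxt(os.path.join(dst_dir, name), data)

# Example with placeholder paths:
# npy_dir_to_txt("/home/HwHiAiUser/dumptonumpy", "/home/HwHiAiUser/dumptotxt")
```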
Analyzing the Data File of an Overflow/Underflow Operator
Since the generated overflow/underflow data is in binary format, you must parse the binary file into a readable format, such as JSON.
- Upload the overflow/underflow data file Opdebug.Node_OpDebug.{taskid}.{stream_id}.{timestamp} to the environment with Toolkit installed.
- Go to the directory where the parsing script is located. For example, if Toolkit is installed in /home/HwHiAiUser/Ascend/ascend-toolkit/latest:
cd /home/HwHiAiUser/Ascend/ascend-toolkit/latest/toolkit/tools/operator_cmp/compare
- Run the parse command. For example:
python3 msaccucmp.py convert -d /home/HwHiAiUser/opdebug/Opdebug.Node_OpDebug.59.1597922031178434 -out /home/HwHiAiUser/result
The key options are described as follows:
- -d: path to the overflow/underflow data file, including the file name
- -out: directory of the parsing result. If it is not specified, the current directory is used.
- Find the parsing result as follows:
{
    "DHA Atomic Add": {
        "model_id": 0,
        "stream_id": 0,
        "task_id": 0,
        "task_type": 0,
        "pc_start": "0x0",
        "para_base": "0x0",
        "status": 0
    },
    "L2 Atomic Add": {
        "model_id": 0,
        "stream_id": 0,
        "task_id": 0,
        "task_type": 0,
        "pc_start": "0x0",
        "para_base": "0x0",
        "status": 0
    },
    "AI Core": {
        "model_id": 514,
        "stream_id": 563,
        "task_id": 57,
        "task_type": 0,
        "pc_start": "0x1008005b0000",
        "para_base": "0x100800297000",
        "kernel_code": "0x1008005ae000",
        "block_idx": 1,
        "status": 32
    }
}
The fields are described as follows:
- model_id: ID of the model where an overflow/underflow operator is located
- stream_id: ID of the stream where an overflow/underflow operator is located
- task_id: task ID of an overflow/underflow operator
- task_type: task type of an overflow/underflow operator
- pc_start: memory start address of an overflow/underflow operator code program
- para_base: memory start address of an overflow/underflow operator parameter
- kernel_code: memory start address of an overflow/underflow operator code program, the same as pc_start
- block_idx: block ID of an overflow/underflow operator
- status: AI Core status register. You can obtain the specific overflow/underflow error by analyzing this field. The value of status is a decimal number, so you must convert it to a hexadecimal number for locating the fault.
For example, assume that the value of status is 272. The hexadecimal equivalent is 0x00000110, which decomposes into 0x00000010 + 0x00000100, so both of the corresponding error causes below apply.
- 0x00000008: sign inversion overflow/underflow when negating the minimum negative value of a signed integer
- 0x00000010: integer addition, subtraction, or multiplication overflow/underflow
- 0x00000020: floating-point overflow/underflow
- 0x00000080: negative input for the conversion of floating-point to unsigned data
- 0x00000100: float32 to float16 conversion or 32-bit signed integer to float16 conversion overflow/underflow
- 0x00000400: cube accumulation overflow/underflow
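Decoding the status register can be automated. A minimal sketch based on the bit meanings listed above (the function name decode_status is an illustrative assumption):

```python
# Bit masks of the AI Core status register, per the list above.
STATUS_BITS = {
    0x00000008: "sign inversion overflow of the minimum negative value of a signed integer",
    0x00000010: "integer addition, subtraction, or multiplication overflow/underflow",
    0x00000020: "floating-point overflow/underflow",
    0x00000080: "negative input for the conversion of floating-point to unsigned data",
    0x00000100: "float32-to-float16 or 32-bit-signed-integer-to-float16 conversion overflow/underflow",
    0x00000400: "cube accumulation overflow/underflow",
}

def decode_status(status):
    """Return the error causes encoded in a decimal status value."""
    return [msg for bit, msg in sorted(STATUS_BITS.items()) if status & bit]

# The example from the text: status 272 == 0x110 == 0x010 + 0x100,
# so two causes are reported.
print(decode_status(272))
```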