Contention Check

Contention check detects memory access contention in a parallel computing environment. In the Ascend processor architecture, external storage and internal storage are typically used as temporary buffers for data being processed. Either can be accessed by multiple pipelines at the same time, and external storage can also be accessed by multiple cores. If an operator program does not correctly handle inter-core, inter-pipeline, or intra-pipeline synchronization, data contention may occur.

The contention check function of the msSanitizer tool cannot recognize scenarios where inter-core synchronization is used as pipeline synchronization.

Memory Contention Types

Memory contention occurs when two memory events, at least one of which is a write event, access the same memory block and the outcome does not match the expected order of execution. The result is data contention: the program's execution or output depends on the order in which the memory events are actually executed. The contention check function can identify the following typical memory contention types:

Table 1 Memory contention types

| Exception | Description | Location | Address Space |
| --- | --- | --- | --- |
| Write-After-Write (WAW) | This exception may occur when two memory events attempt to write to the same memory block. As a result, the memory result value depends on the actual access sequence of the two memory events. | kernel | GM, UB, L0{A,B,C}, L1 |
| Write-After-Read (WAR) | This exception may occur when two memory events (one memory read event and one memory write event) attempt to access the same memory block. The write operation event is actually executed before the read operation event, and the read memory value is not the expected start value. | kernel | GM, UB, L0{A,B,C}, L1 |
| Read-After-Write (RAW) | This exception may occur when two memory events (one memory read event and one memory write event) attempt to access the same memory block. The read operation event is actually executed before the write operation event, and the read memory value is not updated. | kernel | GM, UB, L0{A,B,C}, L1 |

When the contention check function reports an exception, modify the program to eliminate it. In the "write before read" and "read before write" cases, the order is determined by the value of serialNo: the task with the smaller serialNo is executed first on PIPE_S.
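The three types in Table 1 can be told apart from the operation kinds of two overlapping accesses and their intended order. The following is a minimal sketch, not the tool's actual algorithm. The MemEvent record and both helper functions are hypothetical names introduced for illustration, and the sketch assumes that the event with the smaller serialNo is the one intended to run first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemEvent:
    """Hypothetical record of one memory access (not an msSanitizer type)."""
    serial_no: int  # assumed intended order, like serialNo in the report
    op: str         # "read" or "write"
    addr: int       # start address inside one address space
    size: int       # number of bytes accessed


def overlaps(a: MemEvent, b: MemEvent) -> bool:
    """Return True if the two half-open byte ranges share at least one byte."""
    return a.addr < b.addr + b.size and b.addr < a.addr + a.size


def classify_hazard(a: MemEvent, b: MemEvent) -> Optional[str]:
    """Name the hazard an unsynchronized pair could produce, or None."""
    if not overlaps(a, b):
        return None
    # Order the pair by intended execution order (smaller serial_no first).
    first, second = sorted((a, b), key=lambda e: e.serial_no)
    if first.op == "write" and second.op == "write":
        return "WAW"
    if first.op == "write" and second.op == "read":
        return "RAW"  # the read is meant to observe the earlier write
    if first.op == "read" and second.op == "write":
        return "WAR"  # the write must not overtake the earlier read
    return None  # two reads never conflict
```

For instance, a write with serialNo 14 and a read with serialNo 17 on the same address classify as RAW under this sketch, whichever order the two events are passed in.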

Enabling Contention Check

Run the following command to enable the contention check function (racecheck) of the msSanitizer tool:
mssanitizer --tool=racecheck application  # application indicates the user program.
  • Contention check does not check for memory errors. You are advised to perform Memory Check first to ensure that the operator program can be executed properly.
  • After the user program is complete, an exception report is displayed on the screen. For details about the exceptions, see Analyzing Contention Check Report.
  • After the tool is started, the tool run log file mssanitizer_{TIMESTAMP}_{PID}.log is automatically generated in the current directory.
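
When the check runs from a script, the command and the run log can be handled as in the following sketch. It uses only the command line and the log file name pattern given in this section. The function names are hypothetical, and picking the newest log by modification time is an assumption, because the exact TIMESTAMP format is not specified here.

```python
import glob
import os
import subprocess
from typing import List, Optional


def racecheck_command(application: str) -> List[str]:
    """Build the argument list for the racecheck command shown above."""
    return ["mssanitizer", "--tool=racecheck", application]


def newest_run_log(directory: str = ".") -> Optional[str]:
    """Return the most recently modified mssanitizer_*_*.log, or None."""
    pattern = os.path.join(directory, "mssanitizer_*_*.log")
    logs = glob.glob(pattern)
    # Modification time is used because the TIMESTAMP format is unknown.
    return max(logs, key=os.path.getmtime) if logs else None


def run_racecheck(application: str) -> int:
    """Run the check in the current directory and return the exit code."""
    completed = subprocess.run(racecheck_command(application))
    return completed.returncode
```

Calling run_racecheck("./add_custom") and then newest_run_log() would run the check on a hypothetical ./add_custom binary and return the path of the log written for that run, provided mssanitizer is on the PATH.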

Analyzing Contention Check Report

The contention check report describes the memory contention risks between the pipes of the operator.

====== ERROR: Potential RAW hazard detected at UB :  // Contention event type and abnormal memory block information
======    PIPE_MTE2 Write at RAW()+0x0 in block 0 (aiv) at pc current 0xa98 (serialNo:14)  // Detailed information about the contention event, including the pipe where the event is located, operation type, memory access start address, core type, AI Core information, PC pointer executed by the code, and sequence number of the API call behavior.
======    #0 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/impl/dav_c220/kernel_operator_data_copy_impl.h:58:9  // The following is the code call stack where the exception occurs, including the file name, line number, and column number.
======    #1 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:58:9
======    #2 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:443:5
======    #3 Racecheck/add_custom.cpp:17:5
======    PIPE_MTE3 Read at RAW()+0x0 in block 0 (aiv) at pc current 0xad4 (serialNo:17)
======    #0 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/impl/dav_c220/kernel_operator_data_copy_impl.h:103:9
======    #1 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:155:9
======    #2 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:461:5
======    #3 Racecheck/add_custom.cpp:22:5

The preceding example reports a potential read-after-write (RAW) contention risk on UB in the Vector Core (aiv) of AI Core 0 (block 0). The PIPE_MTE2 pipeline writes to address 0x0, and this operation corresponds to line 17 in the operator implementation file add_custom.cpp. The PIPE_MTE3 pipeline reads address 0x0, and this operation corresponds to line 22 in the same file.
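
Reports in this layout can also be read by a script. The following sketch extracts the fields annotated in the example. Its regular expressions are inferred from this single sample and are an assumption, not a documented format, and parse_report and user_frame are hypothetical helper names.

```python
import re
from typing import Any, Dict, List, Tuple

# Patterns inferred from the sample report above (assumed, not a specification).
HEADER = re.compile(r"ERROR: Potential (\w+) hazard detected at (\w+)")
EVENT = re.compile(
    r"(PIPE_\w+) (Read|Write) at .*?\+(0x[0-9a-fA-F]+) in block (\d+) "
    r"\((\w+)\) at pc current (0x[0-9a-fA-F]+) \(serialNo:(\d+)\)"
)
FRAME = re.compile(r"#(\d+) (\S+):(\d+):(\d+)")


def parse_report(text: str) -> List[Dict[str, Any]]:
    """Group report lines into hazards, each with its events and frames."""
    hazards: List[Dict[str, Any]] = []
    for line in text.splitlines():
        m = HEADER.search(line)
        if m:
            hazards.append({"type": m.group(1), "space": m.group(2), "events": []})
            continue
        if not hazards:
            continue  # ignore anything before the first ERROR line
        m = EVENT.search(line)
        if m:
            hazards[-1]["events"].append({
                "pipe": m.group(1),
                "op": m.group(2),
                "offset": int(m.group(3), 16),
                "block": int(m.group(4)),
                "core": m.group(5),
                "pc": int(m.group(6), 16),
                "serial_no": int(m.group(7)),
                "frames": [],
            })
            continue
        m = FRAME.search(line)
        if m and hazards[-1]["events"]:
            frame = (m.group(2), int(m.group(3)), int(m.group(4)))
            hazards[-1]["events"][-1]["frames"].append(frame)
    return hazards


def user_frame(event: Dict[str, Any]) -> Tuple[str, int, int]:
    """Return the outermost frame, which is the operator source in the sample."""
    return event["frames"][-1]
```

Applied to the example above, the sketch yields one RAW hazard on UB with two events, whose outermost frames point to lines 17 and 22 of Racecheck/add_custom.cpp.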