CheckLocalMemoryIA(ISASI)
Supported Products
Product |
Supported (√/x) |
|---|---|
√ |
|
√ |
|
√ |
|
√ |
|
x |
|
x |
Function Usage
Checks the UB read/write behavior within the specified range. If the read/write behavior is within the specified range, an EXCEPTION error is reported. If the read/write behavior is not within the specified range, no error is reported.
Prototype
1 | __aicore__ inline void CheckLocalMemoryIA(const CheckLocalMemoryIAParam& checkParams) |
Parameters
Parameter |
Input/Output |
Meaning |
|---|---|---|
checkParams |
Input |
Checks the UB access behavior. It is of the CheckLocalMemoryIAParam type. For details, see ${INSTALL_DIR}/include/ascendc/basic_api/interface/kernel_struct_mm.h. Replace ${INSTALL_DIR} with the actual CANN component directory. For details about the parameter description, see Table 2. |
Parameter |
Meaning |
|---|---|
enableBit |
Configures the exception register. enableBit ∈ [0,3]. The default value is 0.
|
startAddr |
Start address of CheckLocalMemoryIAParam, 32-byte aligned. Value range: startAddr ∈ [0, 65535]. The default value is 0. For example, startAddr may be obtained by using LocalTensor.GetPhyAddr()/32. |
endAddr |
End address of the check, which is 32-byte aligned. The value range is [0, 65535]. The default value is 0. |
isScalarRead |
Checks scalar read access.
|
isScalarWrite |
Checks scalar write access.
|
isVectorRead |
Checks vector read access.
|
isVectorWrite |
Checks vector write access.
|
isMteRead |
Checks Mte read access.
|
isMteWrite |
Checks Mte write access.
|
isEnable |
Enables or disables the exception register configured by the enableBit parameter.
|
reserved |
Reserved. This parameter is reserved for future functions. You can use the default value. |
Constraints
- The unit of startAddr/endAddr is 32 bytes. The check range does not contain startAddr but contains endAddr, that is, (startAddr, endAddr].
- This API needs to be reset each time it is called (by setting isEnable to false).
- For details about the operand address alignment requirements, see General Address Alignment Restrictions.
Example
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 | #include "kernel_operator.h" class KernelAdd { public: __aicore__ inline KernelAdd() {} __aicore__ inline void Init(__gm__ uint8_t* src0Gm, __gm__ uint8_t* src1Gm, __gm__ uint8_t* dstGm) { src0Global.SetGlobalBuffer((__gm__ half*)src0Gm); src1Global.SetGlobalBuffer((__gm__ half*)src1Gm); dstGlobal.SetGlobalBuffer((__gm__ half*)dstGm); pipe.InitBuffer(inQueueSrc0, 1, 512 * sizeof(half)); pipe.InitBuffer(inQueueSrc1, 1, 512 * sizeof(half)); pipe.InitBuffer(outQueueDst, 1, 512 * sizeof(half)); } __aicore__ inline void Process() { CopyIn(); Compute(); CopyOut(); } private: __aicore__ inline void CopyIn() { AscendC::LocalTensor<half> src0Local = inQueueSrc0.AllocTensor<half>(); AscendC::LocalTensor<half> src1Local = inQueueSrc1.AllocTensor<half>(); AscendC::DataCopy(src0Local, src0Global, 512); AscendC::DataCopy(src1Local, src1Global, 512); inQueueSrc0.EnQue(src0Local); inQueueSrc1.EnQue(src1Local); } __aicore__ inline void Compute() { AscendC::LocalTensor<half> src0Local = inQueueSrc0.DeQue<half>(); AscendC::LocalTensor<half> src1Local = inQueueSrc1.DeQue<half>(); AscendC::LocalTensor<half> dstLocal = outQueueDst.AllocTensor<half>(); AscendC::CheckLocalMemoryIA({ 0, (uint32_t)(dstLocal.GetPhyAddr() / 32), (uint32_t)((dstLocal.GetPhyAddr() + 512 * sizeof(half)) / 32), false, false, false, true, false, false, true }); AscendC::Add(dstLocal, src0Local, src1Local, 512); outQueueDst.EnQue<half>(dstLocal); inQueueSrc0.FreeTensor(src0Local); inQueueSrc1.FreeTensor(src1Local); } __aicore__ inline void CopyOut() { AscendC::LocalTensor<half> dstLocal = outQueueDst.DeQue<half>(); AscendC::DataCopy(dstGlobal, dstLocal, 512); outQueueDst.FreeTensor(dstLocal); } private: AscendC::TPipe pipe; AscendC::TQue<AscendC::TPosition::VECIN, 1> inQueueSrc0, inQueueSrc1; AscendC::TQue<AscendC::TPosition::VECOUT, 1> outQueueDst; AscendC::GlobalTensor<half> src0Global, src1Global, dstGlobal; }; extern "C" __global__ __aicore__ void add_simple_kernel(__gm__ uint8_t* src0Gm, __gm__ uint8_t* src1Gm, __gm__ uint8_t* dstGm) { KernelAdd op; op.Init(src0Gm, src1Gm, dstGm); op.Process(); } |