Printing Memory and Variables

Depending on its type and usage, a variable may be stored in a register, in local memory, or in global memory. You can print a variable's address to find where it is stored, and then print the memory at that address.

Printing Variables

After a breakpoint is hit, you can run the p variable_name command to print the value of the specified variable. For example:

(msdebug) p alpha
(float) $0 = 0.00100000005
(msdebug) p tiling
(const TCubeTiling) $1 = {
  usedCoreNum = 2
  M = 1024
  N = 640
  Ka = 256
  ...
}

Currently, msDebug cannot print the value of a template parameter directly by name. Instead, print the object whose type carries the template parameter; the parameter value appears in the printed type. For example, COMPUTE_LENGTH is a template parameter of ReduceAdd, and this is a pointer to the object that the parameter belongs to. To see the value, run the p this command at a location where the parameter is used. In the following example, the last template argument in the printed type, 32, is the value of COMPUTE_LENGTH:

   22   template<class ArchTag_, class ElementAccumulator_, class ElementOut_, uint32_t COMPUTE_LENGTH>
   23   struct ReduceAdd {
   24       ReduceAdd(Arch::Resource<ArchTag> &resource)
   25       {
 -> 26            for (uint32_t i = 0; i < BUFFER_NUM; i++) {
   27               inputBuffer[i] = resource.ubBuf.template GetBufferByByte<ElementAccumulator>(bufferOffset);
   28               bufferOffset += COMPUTE_LENGTH * sizeof(ElementAccumulator);
(msdebug) p this
(Catlass::Gemm::Kernel::ReduceAdd<Catlass::Arch::AtlasA2, float, __fp16, 32> *) $0 = 0x00000000001cf838
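
If you copy the printed type out of the debugger, the template arguments can also be pulled apart mechanically. The following is a minimal Python sketch, not part of msDebug: the helper name template_args is hypothetical, and the position of COMPUTE_LENGTH (fourth argument) is taken from the template declaration shown in the listing above.

```python
def template_args(type_str):
    """Split the outermost template argument list of a printed C++ type."""
    start = type_str.index("<")
    end = type_str.rindex(">")
    args, depth, current = [], 0, ""
    for ch in type_str[start + 1:end]:
        if ch == "<":
            depth += 1          # entering a nested template argument list
        elif ch == ">":
            depth -= 1          # leaving a nested template argument list
        if ch == "," and depth == 0:
            args.append(current.strip())
            current = ""
        else:
            current += ch
    args.append(current.strip())
    return args


# Type string copied from the `p this` output above.
printed = "Catlass::Gemm::Kernel::ReduceAdd<Catlass::Arch::AtlasA2, float, __fp16, 32> *"
args = template_args(printed)

# COMPUTE_LENGTH is the fourth template parameter in the declaration.
compute_length = int(args[3])
print(args, compute_length)
```

The helper only splits on top-level commas, so nested template arguments stay intact as a single entry.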

Printing GlobalTensor

A GlobalTensor holds data that resides in global memory (GM, external storage).

Run the following command to print a GlobalTensor, using cGlobal as an example. The address_ field gives the GM address of cGlobal, which is 0x000012c045400000 in this example.

(msdebug) p cGlobal
(AscendC::GlobalTensor<float>) $0 = {
  AscendC::BaseGlobalTensor<float> = {
    address_ = 0x000012c045400000
    oriAddress_ = 0x000012c045400000
  }
  bufferSize_ = 655360
  shapeInfo_ = {
    shapeDim = '\0'
    originalShapeDim = '\0'
    shape = ([0] = 0, [1] = 0, [2] = 0, [3] = 0, [4] = 0, [5] = 0, [6] = 0, [7] = 0)
    originalShape = ([0] = 0, [1] = 0, [2] = 0, [3] = 0, [4] = 0, [5] = 0, [6] = 0, [7] = 0)
    dataFormat = ND
  }
  cacheMode_ = CACHE_MODE_NORMAL
}

The actual values of a GlobalTensor are stored in the GM. Run the following command to print the values at 0x000012c045400000 in the GM. The options in this example print one line (-c 1) of 256 bytes (-s 256), interpreted as float32 values (-f float32[]).

(msdebug) x -m GM -f float32[] 0x000012c045400000 -s 256 -c 1
0x12c045400000: {4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096}
  • If you print a custom address, make sure that the address is valid. Otherwise, errors may occur while the operator is running.
  • To print memory starting elsewhere in the tensor, add an offset, in bytes, to the address_ value and pass the resulting GM address to the memory printing command.
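
The offset arithmetic described above can be sketched in Python. This is an illustration only, not part of msDebug: the helper name gm_dump_command is hypothetical, the element size of 4 bytes follows from float32, and treating bufferSize_ (655360) as an element count is an assumption used here only for a bounds check.

```python
FLOAT32_BYTES = 4  # size of one float32 element


def gm_dump_command(base_address, element_index, element_count,
                    size_bytes=256, count=1):
    """Build a memory printing command that starts at a given element."""
    # Assumption: element_count is the number of elements in the tensor.
    if not 0 <= element_index < element_count:
        raise ValueError("element index is outside the tensor")
    # The offset added to address_ is expressed in bytes.
    address = base_address + element_index * FLOAT32_BYTES
    return "x -m GM -f float32[] 0x{:016x} -s {} -c {}".format(
        address, size_bytes, count)


# address_ and bufferSize_ taken from the `p cGlobal` output above.
cmd = gm_dump_command(0x000012c045400000, 640, 655360)
print(cmd)  # x -m GM -f float32[] 0x000012c045400a00 -s 256 -c 1
```

Element 640 lies 640 × 4 = 2560 (0xa00) bytes past address_. If the tensor is laid out row by row with N = 640, which is an assumption about this sample, that address would be the start of the second row.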

Printing LocalTensor

A LocalTensor holds data that resides in the local memory (internal storage) of the AI Core.

Run the following command to print a LocalTensor, using reluOutLocal as an example. The memory address of reluOutLocal is the bufferAddr value inside the address_ field. In this example, bufferAddr is 0 and the data length (dataLen) is 131072.

(msdebug) p reluOutLocal
(AscendC::LocalTensor<float>) $2 = {
  AscendC::BaseLocalTensor<float> = {
    address_ = (dataLen = 131072, bufferAddr = 0, bufferHandle = "", logicPos = '\n')
  }
  shapeInfo_ = {
    shapeDim = '\0'
    originalShapeDim = '\0'
    shape = ([0] = 0, [1] = 1092616192, [2] = 4800, [3] = 1473680, [4] = 0, [5] = 1473888, [6] = 0, [7] = 1471968)
    originalShape = ([0] = 0, [1] = 3222199212, [2] = 4800, [3] = 1, [4] = 0, [5] = 1473376, [6] = 0, [7] = 1473376)
    dataFormat = ND
  }
}

The actual content of the tensor is stored in UB memory.

  • In this sample, the tensor content is stored in the UB. In general, however, a local tensor may be stored in the UB, L1, L0A, or L0B. Determine the storage location from the code, and select the matching memory type for the -m option of the printing command.
  • To print memory starting elsewhere in the tensor, add an offset, in bytes, to the bufferAddr value and pass the resulting local memory address to the memory printing command.

Run the following command to print the values at address 0 in the UB. The options in this example print one line (-c 1) of 256 bytes (-s 256), interpreted as float32 values (-f float32[]).
(msdebug) x -m UB -f float32[] 0 -s 256 -c 1
0x00000000: {4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096}
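
The relationship between the -s and -c options and the amount of data printed can be worked out with a little arithmetic. The following Python sketch is illustrative only: the helper name dump_layout is hypothetical, and treating dataLen (131072) as a size in bytes is an assumption that you should confirm for your own tensor.

```python
def dump_layout(total_bytes, line_bytes=256, element_bytes=4):
    """Return (values per printed line, lines needed to cover total_bytes)."""
    values_per_line = line_bytes // element_bytes
    lines = -(-total_bytes // line_bytes)  # ceiling division
    return values_per_line, lines


# Assumption: dataLen from the `p reluOutLocal` output is a byte count.
values_per_line, lines = dump_layout(131072)
print(values_per_line, lines)  # 64 512
```

With 256 bytes per line and 4 bytes per float32, each line holds 64 values, which matches the output above. Under the byte-count assumption, covering the whole buffer would take 512 lines, that is, -s 256 -c 512.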

Printing All Local Variables

Run the var command to print all local variables in the current scope:
(msdebug) var
(MatmulLeakyKernel<__fp16, __fp16, float, float> *__stack__) this = 0x0000000000167b60
(uint32_t) count = 0
(const uint32_t) roundM = 2
(const uint32_t) roundN = 5
(uint32_t) startOffset = 0
(AscendC::DataCopyParams) copyParam = (blockCount = 256, blockLen = 16, srcStride = 0, dstStride = 64)