Data in ascend_pytorch_profiler_{Rank_ID}.db

The ascend_pytorch_profiler_{Rank_ID}.db file is a database file; this section describes its table schemas. You are advised to view the file with MindStudio Insight or open it with a database development tool such as Navicat Premium. The file contains the following tables:

STRING_IDS

Mapping between IDs and character strings.

This table has no enable or disable switch. It records the mapping between IDs and the strings used in CANN. IDs are generally assigned incrementally, starting from 0.

Table 1 Format

| Field | Type | Index | Description |
| --- | --- | --- | --- |
| id | INTEGER | Primary key | ID corresponding to the string. |
| value | TEXT | - | String content. |
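Many INTEGER columns in the other tables store a STRING_IDS id instead of the text itself, so a common first step is to load this table into memory. The following is a minimal Python sketch using the standard sqlite3 module; the helper names load_string_ids() and resolve() are hypothetical, and only the id and value columns described above are assumed.

```python
import sqlite3


def load_string_ids(conn: sqlite3.Connection):
    """Load STRING_IDS into an {id: value} dictionary for fast ID-to-string lookups."""
    # Each row is an (id, value) pair, so dict() can consume the cursor directly.
    return dict(conn.execute("SELECT id, value FROM STRING_IDS"))


def resolve(strings, string_id):
    """Return the string for an ID column value; an empty (NULL) field stays None."""
    if string_id is None:
        return None
    return strings.get(string_id)
```

For example, after conn = sqlite3.connect("ascend_pytorch_profiler_0.db") (the rank ID 0 is only an example), resolve(load_string_ids(conn), some_id) returns the text behind an ID column.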

PYTORCH_API

API data on the framework side. Currently, only torch_npu API data is included.

It is controlled by torch_npu.profiler.ProfilerActivity.CPU of the Ascend PyTorch Profiler API.

Table 2 Format

| Field | Type | Description |
| --- | --- | --- |
| startNs | INTEGER | OP API start time, in ns. |
| endNs | INTEGER | OP API end time, in ns. |
| globalTid | INTEGER | Global TID of the API. The high-order 32 bits are the PID and the low-order 32 bits are the TID. |
| connectionId | INTEGER | ID used to look up association IDs in the CONNECTION_IDS table (matches CONNECTION_IDS.id). Left empty if the API has no association. |
| name | INTEGER | Name of the OP API, stored as an ID in STRING_IDS. |
| sequenceNumber | INTEGER | Sequence number of the operator. |
| fwdThreadId | INTEGER | ID of the operator's forward thread. |
| inputDtypes | INTEGER | Input data types, stored as an ID in STRING_IDS. |
| inputShapes | INTEGER | Input shapes, stored as an ID in STRING_IDS. |
| callchainId | INTEGER | ID used to look up call stack information in the PYTORCH_CALLCHAINS table. Left empty if no stack information is available. |
| type | INTEGER | API type: op, queue, mstx, or python_trace. The values are defined in the enumeration table ENUM_API_TYPE. |
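The globalTid layout and the name-to-STRING_IDS link can be combined in a single query. The sketch below, a hypothetical helper written with Python's sqlite3 module, lists the longest-running framework APIs and splits globalTid into PID (high-order 32 bits) and TID (low-order 32 bits). It assumes only the PYTORCH_API and STRING_IDS columns documented here.

```python
import sqlite3


def split_global_tid(global_tid: int):
    """Split globalTid into (PID, TID): PID = high-order 32 bits, TID = low-order 32 bits."""
    return global_tid >> 32, global_tid & 0xFFFFFFFF


def slowest_apis(conn: sqlite3.Connection, limit: int = 10):
    """Return (name, durationNs, pid, tid) tuples for the longest PYTORCH_API rows."""
    rows = conn.execute(
        """
        SELECT s.value, a.endNs - a.startNs AS durNs, a.globalTid
        FROM PYTORCH_API AS a
        JOIN STRING_IDS AS s ON s.id = a.name   -- name holds a STRING_IDS id
        ORDER BY durNs DESC
        LIMIT ?
        """,
        (limit,),
    )
    return [(name, dur, *split_global_tid(gtid)) for name, dur, gtid in rows]
```

Doing the bit arithmetic in Python keeps the SQL portable; SQLite also supports >> and &, so the split could be moved into the query if preferred.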

CONNECTION_IDS

Association between framework APIs or between framework APIs and CANN APIs.

It is controlled by torch_npu.profiler.ProfilerActivity.CPU of the Ascend PyTorch Profiler API.

Table 3 Format

| Field | Type | Description |
| --- | --- | --- |
| id | INTEGER | Corresponds to the connectionId field in the PYTORCH_API table. |
| connectionId | INTEGER | Association ID. Currently, the association types include task_queue, fwd_bwd, and torch-cann-task. |
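One id can map to several association IDs. The sketch below shows, under stated assumptions, how to list them for a PYTORCH_API row and how to find other framework APIs that share an association ID. It assumes that two PYTORCH_API rows sharing a CONNECTION_IDS.connectionId value are linked (for example, by task_queue or fwd_bwd); links to CANN APIs live in tables not covered here. Both helper names are hypothetical.

```python
import sqlite3


def association_ids(conn: sqlite3.Connection, api_connection_id):
    """Return all association IDs for a PYTORCH_API.connectionId value (empty field -> [])."""
    if api_connection_id is None:
        return []
    rows = conn.execute(
        "SELECT connectionId FROM CONNECTION_IDS WHERE id = ? ORDER BY connectionId",
        (api_connection_id,),
    )
    return [cid for (cid,) in rows]


def linked_apis(conn: sqlite3.Connection, api_connection_id):
    """Return names of other framework APIs sharing at least one association ID (assumption)."""
    rows = conn.execute(
        """
        SELECT DISTINCT s.value
        FROM CONNECTION_IDS AS c1
        JOIN CONNECTION_IDS AS c2
             ON c2.connectionId = c1.connectionId AND c2.id != c1.id
        JOIN PYTORCH_API AS a ON a.connectionId = c2.id
        JOIN STRING_IDS AS s ON s.id = a.name
        WHERE c1.id = ?
        ORDER BY s.value
        """,
        (api_connection_id,),
    )
    return [name for (name,) in rows]
```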

PYTORCH_CALLCHAINS

Stack information on the framework side.

It is controlled by the with_stack parameter of the Ascend PyTorch Profiler API.

Table 4 Format

| Field | Type | Description |
| --- | --- | --- |
| id | INTEGER | Corresponds to the callchainId field in the PYTORCH_API table. |
| stack | INTEGER | ID of the string content of the current stack in the STRING_IDS table. |
| stackDepth | INTEGER | Depth of the current stack. |
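To rebuild the call stack of one API, join PYTORCH_API.callchainId to PYTORCH_CALLCHAINS.id and resolve each stack ID through STRING_IDS. This Python sketch assumes one row per frame sharing the same id and orders frames by ascending stackDepth; whether depth 0 is the innermost or outermost frame is not stated above, so verify that on real data. The helper name is hypothetical.

```python
import sqlite3


def call_stack(conn: sqlite3.Connection, callchain_id):
    """Return the frames of one call chain as strings, ordered by stackDepth."""
    if callchain_id is None:  # empty callchainId: no stack was collected
        return []
    rows = conn.execute(
        """
        SELECT s.value
        FROM PYTORCH_CALLCHAINS AS c
        JOIN STRING_IDS AS s ON s.id = c.stack
        WHERE c.id = ?
        ORDER BY c.stackDepth
        """,
        (callchain_id,),
    )
    return [frame for (frame,) in rows]
```

Stacks are only present when the profiler was run with with_stack enabled, so an empty result is expected otherwise.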

MEMORY_RECORD

Device memory usage records on the framework side.

It is controlled by the profile_memory parameter of the Ascend PyTorch Profiler API.

Table 5 Format

| Field | Type | Description |
| --- | --- | --- |
| component | INTEGER | ID of the component name (GE, PTA, or PTA+GE) in the STRING_IDS table. |
| timestamp | INTEGER | Timestamp. |
| totalAllocated | INTEGER | Total allocated memory. |
| totalReserved | INTEGER | Total reserved memory. |
| totalActive | INTEGER | Total memory allocated to the PTA stream. |
| streamPtr | INTEGER | AscendCL stream address. |
| deviceId | INTEGER | Device ID. |
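A typical question for this table is the peak reserved memory per device and component. The following Python sketch is a hypothetical helper that groups MEMORY_RECORD by deviceId and resolved component name; values are returned in whatever unit the table stores, since the unit is not stated above.

```python
import sqlite3


def peak_reserved(conn: sqlite3.Connection):
    """Return (deviceId, component, maxTotalReserved) tuples from MEMORY_RECORD."""
    return conn.execute(
        """
        SELECT m.deviceId, s.value, MAX(m.totalReserved)
        FROM MEMORY_RECORD AS m
        JOIN STRING_IDS AS s ON s.id = m.component  -- component holds a STRING_IDS id
        GROUP BY m.deviceId, s.value
        ORDER BY m.deviceId, s.value
        """
    ).fetchall()
```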

OP_MEMORY

Per-operator memory usage on the framework side, aggregated from MEMORY_RECORD.

It is controlled by the profile_memory parameter of the Ascend PyTorch Profiler API.

Table 6 Format

| Field | Type | Description |
| --- | --- | --- |
| name | INTEGER | Name of the torch or GE operator, stored as an ID in STRING_IDS. |
| size | INTEGER | Size of the memory occupied by the operator, in bytes. |
| allocationTime | INTEGER | Time when the operator memory was allocated, in ns. |
| releaseTime | INTEGER | Time when the operator memory was released, in ns. |
| activeReleaseTime | INTEGER | Actual time when the memory was returned to the memory pool, in ns. |
| duration | INTEGER | Memory occupation time, in ns. |
| activeDuration | INTEGER | Actual memory occupation time, in ns. |
| allocationTotalAllocated | INTEGER | Total memory allocated to PTA and GE when the operator memory was allocated, in bytes. |
| allocationTotalReserved | INTEGER | Total memory reserved by PTA and GE when the operator memory was allocated, in bytes. |
| allocationTotalActive | INTEGER | Total memory allocated to the current stream when the operator memory was allocated, in bytes. |
| releaseTotalAllocated | INTEGER | Total memory allocated to PTA and GE when the operator memory was released, in bytes. |
| releaseTotalReserved | INTEGER | Total memory reserved by PTA and GE when the operator memory was released, in bytes. |
| releaseTotalActive | INTEGER | Total memory allocated to the current stream when the operator memory was released, in bytes. |
| streamPtr | INTEGER | AscendCL stream address. |
| deviceId | INTEGER | Device ID. |
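Because size is in bytes and duration in ns, ranking operators by allocation size and converting units is a common first pass. The Python sketch below is a hypothetical helper returning (name, size in MiB, duration in ms); it treats an empty duration as None rather than assuming a value.

```python
import sqlite3


def largest_allocations(conn: sqlite3.Connection, limit: int = 5):
    """Return (operatorName, sizeMiB, durationMs) for the largest OP_MEMORY rows."""
    rows = conn.execute(
        """
        SELECT s.value, o.size, o.duration
        FROM OP_MEMORY AS o
        JOIN STRING_IDS AS s ON s.id = o.name
        ORDER BY o.size DESC
        LIMIT ?
        """,
        (limit,),
    )
    result = []
    for name, size_bytes, duration_ns in rows:
        duration_ms = None if duration_ns is None else duration_ns / 1e6  # ns -> ms
        result.append((name, size_bytes / 2**20, duration_ms))  # bytes -> MiB
    return result
```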

RANK_DEVICE_MAP

Mapping between rankId and deviceId.

This table has no enable or disable switch. It is always generated when the ascend_pytorch_profiler_{Rank_ID}.db file is exported.

Table 7 Format

| Field | Type | Description |
| --- | --- | --- |
| rankId | INTEGER | Rank ID of the node in the cluster scenario. The value -1 indicates that the rank ID was not set. |
| deviceId | INTEGER | Device ID on the node. The value -1 indicates that the device ID was not collected. |
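Since -1 is a sentinel rather than a real ID, it is convenient to convert it to None when reading this table. The Python sketch below assumes each file holds a single RANK_DEVICE_MAP row, which matches one file per rank but is not stated above; the helper name is hypothetical.

```python
import sqlite3


def rank_and_device(conn: sqlite3.Connection):
    """Return (rankId, deviceId), mapping the -1 sentinel and a missing row to None."""
    row = conn.execute("SELECT rankId, deviceId FROM RANK_DEVICE_MAP").fetchone()
    if row is None:
        return None, None
    rank_id, device_id = row
    return (
        None if rank_id == -1 else rank_id,      # -1: rankId was not set
        None if device_id == -1 else device_id,  # -1: device ID was not collected
    )
```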

STEP_TIME

Start and end times of each profiled step.

It is controlled by the parameters of the torch_npu.profiler.schedule class of the Ascend PyTorch Profiler API.

Table 8 Format

| Field | Type | Description |
| --- | --- | --- |
| id | INTEGER | Step ID. |
| startNs | INTEGER | Step start time, in ns. |
| endNs | INTEGER | Step end time, in ns. |
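Step boundaries make it possible to bucket other records by step. The Python sketch below computes each step's duration and counts the PYTORCH_API rows whose startNs falls in the half-open interval [startNs, endNs) of the step. Attributing an API to the step in which it starts is an assumption chosen for simplicity; the helper name is hypothetical.

```python
import sqlite3


def api_count_per_step(conn: sqlite3.Connection):
    """Return (stepId, stepDurationNs, apiCount) for every row in STEP_TIME."""
    return conn.execute(
        """
        SELECT st.id, st.endNs - st.startNs, COUNT(a.startNs)
        FROM STEP_TIME AS st
        LEFT JOIN PYTORCH_API AS a          -- LEFT JOIN keeps steps with no APIs
               ON a.startNs >= st.startNs AND a.startNs < st.endNs
        GROUP BY st.id, st.startNs, st.endNs
        ORDER BY st.id
        """
    ).fetchall()
```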

GC_RECORD

Garbage collection (GC) events captured by the profiler.

It is controlled by the gc_detect_threshold parameter of the Ascend PyTorch Profiler API.

Table 9 Format

| Field | Type | Description |
| --- | --- | --- |
| startNs | INTEGER | Start time of the GC event, in ns. |
| endNs | INTEGER | End time of the GC event, in ns. |
| globalTid | INTEGER | Global TID of the GC event. |
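Combining GC_RECORD with STEP_TIME shows how much of each step was spent in garbage collection. This Python sketch clips every GC interval to each step's boundaries and sums the overlap in ns. It is a straightforward O(steps x events) loop under the assumption that both tables are small enough to load into memory; the helper name is hypothetical.

```python
import sqlite3


def gc_time_per_step(conn: sqlite3.Connection):
    """Return {stepId: GC time in ns overlapping that step}."""
    steps = conn.execute("SELECT id, startNs, endNs FROM STEP_TIME ORDER BY id").fetchall()
    gc_events = conn.execute("SELECT startNs, endNs FROM GC_RECORD").fetchall()
    result = {}
    for step_id, step_start, step_end in steps:
        overlap_ns = 0
        for gc_start, gc_end in gc_events:
            # Length of the intersection of [gc_start, gc_end] and [step_start, step_end].
            overlap_ns += max(0, min(gc_end, step_end) - max(gc_start, step_start))
        result[step_id] = overlap_ns
    return result
```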

ROCE

Stores the RoCE bandwidth data.

Control switch:

  • --sys-io-profiling and --sys-io-sampling-freq of the msprof command
  • sys_io of Ascend PyTorch Profiler
  • sys_io of MindSpore Profiler
Table 10 Format

| Field | Type | Description |
| --- | --- | --- |
| deviceId | INTEGER | Device ID. |
| timestampNs | INTEGER | Local time, in ns. |
| bandwidth | INTEGER | Bandwidth, in byte/s. |
| rxPacketRate | NUMERIC | Packet receive rate, in packet/s. |
| rxByteRate | NUMERIC | Byte receive rate, in byte/s. |
| rxPackets | INTEGER | Total number of received packets. |
| rxBytes | INTEGER | Total number of received bytes. |
| rxErrors | INTEGER | Total number of erroneous packets received. |
| rxDropped | INTEGER | Total number of received packets that were dropped. |
| txPacketRate | NUMERIC | Packet send rate, in packet/s. |
| txByteRate | NUMERIC | Byte send rate, in byte/s. |
| txPackets | INTEGER | Total number of sent packets. |
| txBytes | INTEGER | Total number of sent bytes. |
| txErrors | INTEGER | Total number of erroneous packets sent. |
| txDropped | INTEGER | Total number of packets dropped on send. |
| funcId | INTEGER | Port. |
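Because ROCE is a time series sampled per device and port, averaging the rate columns per (deviceId, funcId) gives a quick overview. The Python sketch below is a hypothetical helper; it uses a simple unweighted mean of the samples, which assumes a roughly constant sampling interval.

```python
import sqlite3


def roce_average_rates(conn: sqlite3.Connection):
    """Return (deviceId, funcId, avgRxByteRate, avgTxByteRate) in byte/s per port."""
    return conn.execute(
        """
        SELECT deviceId, funcId, AVG(rxByteRate), AVG(txByteRate)
        FROM ROCE
        GROUP BY deviceId, funcId
        ORDER BY deviceId, funcId
        """
    ).fetchall()
```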

NIC

Stores NIC information over time.

Control switch:

  • --sys-io-profiling and --sys-io-sampling-freq of the msprof command
  • sys_io of Ascend PyTorch Profiler
  • sys_io of MindSpore Profiler
Table 11 Format

| Field | Type | Description |
| --- | --- | --- |
| deviceId | INTEGER | Device ID. |
| timestampNs | INTEGER | Local time, in ns. |
| bandwidth | INTEGER | Bandwidth, in byte/s. |
| rxPacketRate | NUMERIC | Packet receive rate, in packet/s. |
| rxByteRate | NUMERIC | Byte receive rate, in byte/s. |
| rxPackets | INTEGER | Total number of received packets. |
| rxBytes | INTEGER | Total number of received bytes. |
| rxErrors | INTEGER | Total number of erroneous packets received. |
| rxDropped | INTEGER | Total number of received packets that were dropped. |
| txPacketRate | NUMERIC | Packet send rate, in packet/s. |
| txByteRate | NUMERIC | Byte send rate, in byte/s. |
| txPackets | INTEGER | Total number of sent packets. |
| txBytes | INTEGER | Total number of sent bytes. |
| txErrors | INTEGER | Total number of erroneous packets sent. |
| txDropped | INTEGER | Total number of packets dropped on send. |
| funcId | INTEGER | Port. |
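When investigating network problems, the samples with non-zero error or drop counters are usually the interesting ones. This Python sketch, a hypothetical helper, filters NIC for such samples. Checking only for non-zero values works whether the counters are cumulative or per-interval, which is not specified above.

```python
import sqlite3


def nic_error_samples(conn: sqlite3.Connection):
    """Return NIC samples where any error or drop counter is non-zero."""
    return conn.execute(
        """
        SELECT deviceId, funcId, timestampNs, rxErrors, rxDropped, txErrors, txDropped
        FROM NIC
        WHERE rxErrors > 0 OR rxDropped > 0 OR txErrors > 0 OR txDropped > 0
        ORDER BY deviceId, funcId, timestampNs
        """
    ).fetchall()
```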

HCCS

Stores the HCCS bandwidth data.

Control switch:

  • --sys-interconnection-profiling and --sys-interconnection-freq of the msprof command
  • sys_interconnection of Ascend PyTorch Profiler
  • sys_interconnection of MindSpore Profiler
Table 12 Format

| Field | Type | Description |
| --- | --- | --- |
| deviceId | INTEGER | Device ID. |
| timestampNs | INTEGER | Local time, in ns. |
| txThroughput | NUMERIC | TX bandwidth, in byte/s. |
| rxThroughput | NUMERIC | RX bandwidth, in byte/s. |
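HCCS throughput is stored in byte/s, which is easier to read in GB/s. The Python sketch below averages the TX and RX columns per device and converts them using 1 GB = 10^9 bytes (a decimal convention chosen here for illustration). The helper name is hypothetical, and the unweighted mean assumes a roughly constant sampling interval.

```python
import sqlite3


def hccs_average_gbps(conn: sqlite3.Connection):
    """Return (deviceId, avgTxGBps, avgRxGBps) per device, with 1 GB = 1e9 bytes."""
    rows = conn.execute(
        """
        SELECT deviceId, AVG(txThroughput), AVG(rxThroughput)
        FROM HCCS
        GROUP BY deviceId
        ORDER BY deviceId
        """
    )
    return [(device_id, tx / 1e9, rx / 1e9) for device_id, tx, rx in rows]
```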

PCIE

Stores the PCIe bandwidth data.

Control switch:

  • --sys-interconnection-profiling and --sys-interconnection-freq of the msprof command
  • sys_interconnection of Ascend PyTorch Profiler
  • sys_interconnection of MindSpore Profiler
Table 13 Format

| Field | Type | Description |
| --- | --- | --- |
| deviceId | INTEGER | Device ID. |
| timestampNs | INTEGER | Local time, in ns. |
| txPostMin | NUMERIC | Minimum PCIe posted-data transfer bandwidth at the TX side, in byte/s. |
| txPostMax | NUMERIC | Maximum PCIe posted-data transfer bandwidth at the TX side, in byte/s. |
| txPostAvg | NUMERIC | Average PCIe posted-data transfer bandwidth at the TX side, in byte/s. |
| txNonpostMin | NUMERIC | Minimum PCIe non-posted-data transfer bandwidth at the TX side, in byte/s. |
| txNonpostMax | NUMERIC | Maximum PCIe non-posted-data transfer bandwidth at the TX side, in byte/s. |
| txNonpostAvg | NUMERIC | Average PCIe non-posted-data transfer bandwidth at the TX side, in byte/s. |
| txCplMin | NUMERIC | Minimum throughput of completion packets for write requests at the TX side, in byte/s. |
| txCplMax | NUMERIC | Maximum throughput of completion packets for write requests at the TX side, in byte/s. |
| txCplAvg | NUMERIC | Average throughput of completion packets for write requests at the TX side, in byte/s. |
| txNonpostLatencyMin | NUMERIC | Minimum PCIe non-posted transfer latency at the TX side, in ns. |
| txNonpostLatencyMax | NUMERIC | Maximum PCIe non-posted transfer latency at the TX side, in ns. |
| txNonpostLatencyAvg | NUMERIC | Average PCIe non-posted transfer latency at the TX side, in ns. |
| rxPostMin | NUMERIC | Minimum PCIe posted-data transfer bandwidth at the RX side, in byte/s. |
| rxPostMax | NUMERIC | Maximum PCIe posted-data transfer bandwidth at the RX side, in byte/s. |
| rxPostAvg | NUMERIC | Average PCIe posted-data transfer bandwidth at the RX side, in byte/s. |
| rxNonpostMin | NUMERIC | Minimum PCIe non-posted-data transfer bandwidth at the RX side, in byte/s. |
| rxNonpostMax | NUMERIC | Maximum PCIe non-posted-data transfer bandwidth at the RX side, in byte/s. |
| rxNonpostAvg | NUMERIC | Average PCIe non-posted-data transfer bandwidth at the RX side, in byte/s. |
| rxCplMin | NUMERIC | Minimum throughput of completion packets for write requests at the RX side, in byte/s. |
| rxCplMax | NUMERIC | Maximum throughput of completion packets for write requests at the RX side, in byte/s. |
| rxCplAvg | NUMERIC | Average throughput of completion packets for write requests at the RX side, in byte/s. |
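Each PCIE row already holds per-sample min/max/avg statistics, so a per-device summary can average the *Avg columns and take the maximum of a *Max column. The Python sketch below is a hypothetical helper that does this for posted-data bandwidth and TX non-posted latency. Note that a mean of per-sample averages is unweighted and is only an approximation of the true average over the whole run.

```python
import sqlite3


def pcie_summary(conn: sqlite3.Connection):
    """Return (deviceId, meanTxPostAvg, meanRxPostAvg, worstTxNonpostLatencyNs) per device."""
    return conn.execute(
        """
        SELECT deviceId,
               AVG(txPostAvg),            -- byte/s, unweighted mean of samples
               AVG(rxPostAvg),            -- byte/s, unweighted mean of samples
               MAX(txNonpostLatencyMax)   -- ns, worst latency seen in any sample
        FROM PCIE
        GROUP BY deviceId
        ORDER BY deviceId
        """
    ).fetchall()
```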