Data in ascend_pytorch_profiler_{Rank_ID}.db
The ascend_pytorch_profiler_{Rank_ID}.db file stores profile data in database tables. You are advised to view it with MindStudio Insight or to open it with a database development tool such as Navicat Premium. The .db file summarizes the following profile data:
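The following is a minimal sketch for inspecting the file from a script. It assumes the .db file is an SQLite database, which is the format that tools such as Navicat Premium open it as. The sketch uses only Python's standard sqlite3 module and opens the file read-only, so the exported profile cannot be modified by accident. It then lists the file's tables. open_profile_db and list_tables are hypothetical helper names and are not part of the profiler API.

```python
import sqlite3
from pathlib import Path


def open_profile_db(db_path: str) -> sqlite3.Connection:
    """Open a profiler .db file read-only (assumes it is an SQLite database)."""
    # mode=ro raises an error for a missing file instead of creating an empty one.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Example (hypothetical path; substitute the real rank ID):
# conn = open_profile_db("ascend_pytorch_profiler_0.db")
# print(list_tables(conn))
```

Opening the file in read-only mode is a deliberate choice. It ensures that a mistyped path fails loudly rather than silently producing an empty database.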
STRING_IDS
Mapping between IDs and character strings.
This table has no enable or disable option. It records the mapping between IDs and the strings used on CANN. IDs generally start from 0 and increase incrementally.
| Field | Type | Index | Description |
|---|---|---|---|
| id | INTEGER | Primary key | ID corresponding to the string. |
| value | TEXT | - | String content. |
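Many other tables store strings as STRING_IDS IDs. A practical first step is therefore to load this table into a dictionary once and then resolve IDs in Python. The sketch below assumes an open sqlite3 connection to the .db file. load_string_ids and resolve are hypothetical helper names.

```python
import sqlite3
from typing import Optional


def load_string_ids(conn: sqlite3.Connection) -> dict[int, str]:
    """Map every STRING_IDS.id to its string value."""
    # Each row is an (id, value) pair, so dict() can consume the cursor directly.
    return dict(conn.execute("SELECT id, value FROM STRING_IDS"))


def resolve(strings: dict[int, str], string_id: Optional[int]) -> Optional[str]:
    """Translate an ID column (for example PYTORCH_API.name) into text; None stays None."""
    if string_id is None:
        return None
    return strings.get(string_id)
```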
PYTORCH_API
API data on the framework side. Currently, only torch_npu API data is included.
It is controlled by torch_npu.profiler.ProfilerActivity.CPU of the Ascend PyTorch Profiler API.
| Field | Type | Description |
|---|---|---|
| startNs | INTEGER | OP API start time, in ns. |
| endNs | INTEGER | OP API end time, in ns. |
| globalTid | INTEGER | Global TID of the API. High-order 32 bits: PID; low-order 32 bits: TID. |
| connectionId | INTEGER | Key for querying the CONNECTION_IDS table (matches CONNECTION_IDS.id). If there is no connection ID, this field is left empty. |
| name | INTEGER | Name of the OP API, stored as an ID in the STRING_IDS table. |
| sequenceNumber | INTEGER | Sequence number of the OP. |
| fwdThreadId | INTEGER | ID of the OP forward thread. |
| inputDtypes | INTEGER | Input data types, stored as an ID in the STRING_IDS table. |
| inputShapes | INTEGER | Input shapes, stored as an ID in the STRING_IDS table. |
| callchainId | INTEGER | Key for querying the call stack in the PYTORCH_CALLCHAINS table. If no stack information is available, this field is left empty. |
| type | INTEGER | Data type: op, queue, mstx, or python_trace. The value mapping is stored in the enumeration table ENUM_API_TYPE. |
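The globalTid packing described above can be decoded with a shift and a mask. The sketch below also joins name to STRING_IDS so that the longest framework API calls can be listed. It assumes an open sqlite3 connection. The column names come from this table, but slowest_apis and split_global_tid are hypothetical helper names. The extra 64-bit mask is a defensive assumption: SQLite integers are signed, so a PID with its top bit set would otherwise read back as a negative number.

```python
import sqlite3


def split_global_tid(global_tid: int) -> tuple[int, int]:
    """Split globalTid into (PID, TID): high-order 32 bits PID, low-order 32 bits TID."""
    # SQLite stores signed 64-bit integers; reinterpret as unsigned before shifting.
    unsigned = global_tid & 0xFFFFFFFFFFFFFFFF
    return unsigned >> 32, unsigned & 0xFFFFFFFF


def slowest_apis(conn: sqlite3.Connection, limit: int = 10) -> list[tuple[str, int, int, int]]:
    """Return (API name, duration in ns, PID, TID) for the longest PYTORCH_API records."""
    rows = conn.execute(
        """
        SELECT s.value, a.endNs - a.startNs AS durationNs, a.globalTid
        FROM PYTORCH_API AS a
        JOIN STRING_IDS AS s ON s.id = a.name
        ORDER BY durationNs DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    return [(name, duration, *split_global_tid(gtid)) for name, duration, gtid in rows]
```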
CONNECTION_IDS
Association between framework APIs or between framework APIs and CANN APIs.
It is controlled by torch_npu.profiler.ProfilerActivity.CPU of the Ascend PyTorch Profiler API.
| Field | Type | Description |
|---|---|---|
| id | INTEGER | Corresponds to the connectionId field in the PYTORCH_API table. |
| connectionId | INTEGER | Association ID. Currently supported association types include task_queue, fwd_bwd, and torch-cann-task. |
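To follow the association from an API record, join PYTORCH_API.connectionId to CONNECTION_IDS.id. The sketch below assumes that one id can appear in several CONNECTION_IDS rows, one per association, so it collects a list of association IDs for each API name. APIs whose connectionId is empty drop out of the inner join. api_connections is a hypothetical helper name.

```python
import sqlite3
from collections import defaultdict


def api_connections(conn: sqlite3.Connection) -> dict[str, list[int]]:
    """Map each PYTORCH_API name to its association IDs from CONNECTION_IDS."""
    rows = conn.execute(
        """
        SELECT s.value, c.connectionId
        FROM PYTORCH_API AS a
        JOIN CONNECTION_IDS AS c ON c.id = a.connectionId
        JOIN STRING_IDS AS s ON s.id = a.name
        ORDER BY a.startNs, c.connectionId
        """
    )
    result: dict[str, list[int]] = defaultdict(list)
    for name, connection_id in rows:
        result[name].append(connection_id)
    return dict(result)
```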
PYTORCH_CALLCHAINS
Stack information on the framework side.
It is controlled by the with_stack parameter of the Ascend PyTorch Profiler API.
| Field | Type | Description |
|---|---|---|
| id | INTEGER | Corresponds to the callchainId field in the PYTORCH_API table. |
| stack | INTEGER | ID of the string content of the current stack in the STRING_IDS table. |
| stackDepth | INTEGER | Depth of the current stack. |
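A readable stack is obtained in two hops: PYTORCH_API.callchainId leads to PYTORCH_CALLCHAINS.id, and stack then leads to STRING_IDS. The sketch below makes two assumptions. First, each frame of a call chain is stored as its own row sharing the same id. Second, sorting by stackDepth gives the frame order. This page does not say whether depth 0 is the innermost or outermost frame, so verify that against a known stack. api_call_stack is a hypothetical helper name.

```python
import sqlite3


def api_call_stack(conn: sqlite3.Connection, callchain_id: int) -> list[str]:
    """Return the stack frames for one call chain, ordered by stackDepth."""
    rows = conn.execute(
        """
        SELECT s.value
        FROM PYTORCH_CALLCHAINS AS c
        JOIN STRING_IDS AS s ON s.id = c.stack
        WHERE c.id = ?
        ORDER BY c.stackDepth
        """,
        (callchain_id,),
    )
    return [frame for (frame,) in rows]
```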
MEMORY_RECORD
Device memory usage records on the framework side.
It is controlled by the profile_memory parameter of the Ascend PyTorch Profiler API.
| Field | Type | Description |
|---|---|---|
| component | INTEGER | ID of the component name (GE, PTA, or PTA+GE) in the STRING_IDS table. |
| timestamp | INTEGER | Timestamp. |
| totalAllocated | INTEGER | Total allocated memory. |
| totalReserved | INTEGER | Total reserved memory. |
| totalActive | INTEGER | Total memory allocated to the PTA stream. |
| streamPtr | INTEGER | AscendCL stream address. |
| deviceId | INTEGER | Device ID. |
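A common question for this table is the peak memory each component reached on each device. The sketch below takes MAX(totalAllocated) per device and component name and assumes an open sqlite3 connection. The table does not state a unit for totalAllocated, so the sketch returns raw values without converting them. peak_total_allocated is a hypothetical helper name.

```python
import sqlite3


def peak_total_allocated(conn: sqlite3.Connection) -> dict[tuple[int, str], int]:
    """Return {(deviceId, component name): peak totalAllocated} from MEMORY_RECORD."""
    rows = conn.execute(
        """
        SELECT m.deviceId, s.value, MAX(m.totalAllocated)
        FROM MEMORY_RECORD AS m
        JOIN STRING_IDS AS s ON s.id = m.component
        GROUP BY m.deviceId, s.value
        """
    )
    return {(device, component): peak for device, component, peak in rows}
```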
OP_MEMORY
Operator-level memory usage on the framework side, aggregated from MEMORY_RECORD.
It is controlled by the profile_memory parameter of the Ascend PyTorch Profiler API.
| Field | Type | Description |
|---|---|---|
| name | INTEGER | Name of the torch or GE operator, stored as an ID in the STRING_IDS table. |
| size | INTEGER | Size of the memory occupied by the operator, in bytes. |
| allocationTime | INTEGER | Operator memory allocation time, in ns. |
| releaseTime | INTEGER | Operator memory release time, in ns. |
| activeReleaseTime | INTEGER | Time when the memory is actually returned to the memory pool, in ns. |
| duration | INTEGER | Memory occupation time, in ns. |
| activeDuration | INTEGER | Actual memory occupation time, in ns. |
| allocationTotalAllocated | INTEGER | Total memory allocated to PTA and GE at the time of operator memory allocation, in bytes. |
| allocationTotalReserved | INTEGER | Total memory reserved by PTA and GE at the time of operator memory allocation, in bytes. |
| allocationTotalActive | INTEGER | Total memory allocated for the current stream at the time of operator memory allocation, in bytes. |
| releaseTotalAllocated | INTEGER | Total memory allocated to PTA and GE at the time of operator memory release, in bytes. |
| releaseTotalReserved | INTEGER | Total memory reserved by PTA and GE at the time of operator memory release, in bytes. |
| releaseTotalActive | INTEGER | Total memory allocated for the current stream at the time of operator memory release, in bytes. |
| streamPtr | INTEGER | AscendCL stream address. |
| deviceId | INTEGER | Device ID. |
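Since size is documented in bytes and duration in ns, the largest operator allocations can be listed directly. The sketch below assumes an open sqlite3 connection and adds a MiB conversion (2**20 bytes) for readability. largest_op_allocations and bytes_to_mib are hypothetical helper names.

```python
import sqlite3


def largest_op_allocations(conn: sqlite3.Connection, limit: int = 5) -> list[tuple[str, int, int]]:
    """Return (operator name, size in bytes, duration in ns), largest allocation first."""
    return conn.execute(
        """
        SELECT s.value, o.size, o.duration
        FROM OP_MEMORY AS o
        JOIN STRING_IDS AS s ON s.id = o.name
        ORDER BY o.size DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


def bytes_to_mib(size_bytes: int) -> float:
    """Convert bytes to MiB (1 MiB = 2**20 bytes)."""
    return size_bytes / 2**20
```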
RANK_DEVICE_MAP
Mapping between rankId and deviceId.
There is no enable or disable option for this table. It is generated by default when the ascend_pytorch_profiler_{Rank_ID}.db file is exported.
| Field | Type | Description |
|---|---|---|
| rankId | INTEGER | Node ID in the cluster scenario. The value -1 indicates that rankId was not set. |
| deviceId | INTEGER | Device ID on the node. The value -1 indicates that the device ID was not collected. |
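Because -1 is a sentinel value here, it is safer to convert it to None before using the IDs, for example as dictionary keys across ranks. The sketch below assumes an open sqlite3 connection. rank_device_pairs is a hypothetical helper name.

```python
import sqlite3
from typing import Optional

UNSET = -1  # Sentinel documented for both rankId and deviceId.


def rank_device_pairs(conn: sqlite3.Connection) -> list[tuple[Optional[int], Optional[int]]]:
    """Return (rankId, deviceId) pairs with the -1 sentinel replaced by None."""
    rows = conn.execute("SELECT rankId, deviceId FROM RANK_DEVICE_MAP")
    return [
        (None if rank == UNSET else rank, None if device == UNSET else device)
        for rank, device in rows
    ]
```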
STEP_TIME
Start and end time of each profiled step.
It is controlled by the parameters of the torch_npu.profiler.schedule class of the Ascend PyTorch Profiler API.
| Field | Type | Description |
|---|---|---|
| id | INTEGER | Step ID. |
| startNs | INTEGER | Step start time, in ns. |
| endNs | INTEGER | Step end time, in ns. |
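Per-step wall time follows directly from the two columns as endNs - startNs. The sketch below converts the result from ns to ms and assumes an open sqlite3 connection. step_durations_ms is a hypothetical helper name.

```python
import sqlite3


def step_durations_ms(conn: sqlite3.Connection) -> dict[int, float]:
    """Return {step ID: step duration in ms} from STEP_TIME."""
    rows = conn.execute("SELECT id, endNs - startNs FROM STEP_TIME ORDER BY id")
    return {step_id: duration_ns / 1e6 for step_id, duration_ns in rows}
```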
GC_RECORD
Stores the garbage collection (GC) events collected by the profiler.
It is controlled by the gc_detect_threshold parameter of the Ascend PyTorch Profiler API.
| Field | Type | Description |
|---|---|---|
| startNs | INTEGER | Start time of the GC event, in ns. |
| endNs | INTEGER | End time of the GC event, in ns. |
| globalTid | INTEGER | Global TID of the GC event. |
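To judge whether GC pauses matter, a quick summary is the number of recorded GC events and their total duration. The sketch below assumes an open sqlite3 connection. COALESCE makes an empty table return 0 instead of None. gc_summary is a hypothetical helper name.

```python
import sqlite3


def gc_summary(conn: sqlite3.Connection) -> tuple[int, int]:
    """Return (number of GC events, total GC time in ns) from GC_RECORD."""
    count, total_ns = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(endNs - startNs), 0) FROM GC_RECORD"
    ).fetchone()
    return count, total_ns
```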
ROCE
Stores the RoCE bandwidth data.
Control switch:
- --sys-io-profiling and --sys-io-sampling-freq of the msprof command
- sys_io of Ascend PyTorch Profiler
- sys_io of MindSpore Profiler
| Field | Type | Description |
|---|---|---|
| deviceId | INTEGER | Device ID. |
| timestampNs | INTEGER | Local time, in ns. |
| bandwidth | INTEGER | Bandwidth, in byte/s. |
| rxPacketRate | NUMERIC | Packet receiving rate, in packet/s. |
| rxByteRate | NUMERIC | Byte receiving rate, in byte/s. |
| rxPackets | INTEGER | Total number of received packets. |
| rxBytes | INTEGER | Total number of received bytes. |
| rxErrors | INTEGER | Total number of received error packets. |
| rxDropped | INTEGER | Total number of dropped received packets. |
| txPacketRate | NUMERIC | Packet sending rate, in packet/s. |
| txByteRate | NUMERIC | Byte sending rate, in byte/s. |
| txPackets | INTEGER | Total number of sent packets. |
| txBytes | INTEGER | Total number of sent bytes. |
| txErrors | INTEGER | Total number of sent error packets. |
| txDropped | INTEGER | Total number of dropped sent packets. |
| funcId | INTEGER | Port. |
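Because each row is a time sample per device and port (funcId), averaging the rate columns per (deviceId, funcId) gives a compact view of RoCE traffic. The sketch below assumes an open sqlite3 connection and weights every sample equally, which is only exact if samples are evenly spaced. roce_average_rates is a hypothetical helper name.

```python
import sqlite3


def roce_average_rates(conn: sqlite3.Connection) -> dict[tuple[int, int], tuple[float, float]]:
    """Return {(deviceId, funcId): (avg rxByteRate, avg txByteRate)} in byte/s."""
    rows = conn.execute(
        """
        SELECT deviceId, funcId, AVG(rxByteRate), AVG(txByteRate)
        FROM ROCE
        GROUP BY deviceId, funcId
        """
    )
    return {(device, func): (rx, tx) for device, func, rx, tx in rows}
```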
NIC
Stores NIC information over time.
Control switch:
- --sys-io-profiling and --sys-io-sampling-freq of the msprof command
- sys_io of Ascend PyTorch Profiler
- sys_io of MindSpore Profiler
| Field | Type | Description |
|---|---|---|
| deviceId | INTEGER | Device ID. |
| timestampNs | INTEGER | Local time, in ns. |
| bandwidth | INTEGER | Bandwidth, in byte/s. |
| rxPacketRate | NUMERIC | Packet receiving rate, in packet/s. |
| rxByteRate | NUMERIC | Byte receiving rate, in byte/s. |
| rxPackets | INTEGER | Total number of received packets. |
| rxBytes | INTEGER | Total number of received bytes. |
| rxErrors | INTEGER | Total number of received error packets. |
| rxDropped | INTEGER | Total number of dropped received packets. |
| txPacketRate | NUMERIC | Packet sending rate, in packet/s. |
| txByteRate | NUMERIC | Byte sending rate, in byte/s. |
| txPackets | INTEGER | Total number of sent packets. |
| txBytes | INTEGER | Total number of sent bytes. |
| txErrors | INTEGER | Total number of sent error packets. |
| txDropped | INTEGER | Total number of dropped sent packets. |
| funcId | INTEGER | Port. |
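One quick health check on NIC data is to find the samples in which any error or drop counter is non-zero. This check holds whether the counters are cumulative or per-interval, which this page does not specify. The sketch below assumes an open sqlite3 connection. nic_error_samples is a hypothetical helper name.

```python
import sqlite3


def nic_error_samples(conn: sqlite3.Connection) -> list[tuple[int, int, int, int, int, int]]:
    """Return (deviceId, timestampNs, rxErrors, rxDropped, txErrors, txDropped) rows with any non-zero counter."""
    return conn.execute(
        """
        SELECT deviceId, timestampNs, rxErrors, rxDropped, txErrors, txDropped
        FROM NIC
        WHERE rxErrors > 0 OR rxDropped > 0 OR txErrors > 0 OR txDropped > 0
        ORDER BY deviceId, timestampNs
        """
    ).fetchall()
```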
HCCS
Stores the HCCS bandwidth data.
Control switch:
- --sys-interconnection-profiling and --sys-interconnection-freq of the msprof command
- sys_interconnection of Ascend PyTorch Profiler
- sys_interconnection of MindSpore Profiler
| Field | Type | Description |
|---|---|---|
| deviceId | INTEGER | Device ID. |
| timestampNs | INTEGER | Local time, in ns. |
| txThroughput | NUMERIC | TX bandwidth, in byte/s. |
| rxThroughput | NUMERIC | RX bandwidth, in byte/s. |
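The HCCS throughput columns are stored in byte/s. The sketch below averages them per device and converts the result to GB/s, using decimal units of 10**9 bytes. It assumes an open sqlite3 connection and equally weighted samples. hccs_average_throughput_gbps is a hypothetical helper name.

```python
import sqlite3
from typing import Optional


def _to_gb_per_s(value: Optional[float]) -> Optional[float]:
    # AVG() yields NULL (None) when every sample in the group is NULL.
    return None if value is None else value / 1e9


def hccs_average_throughput_gbps(conn: sqlite3.Connection) -> dict[int, tuple[Optional[float], Optional[float]]]:
    """Return {deviceId: (avg TX GB/s, avg RX GB/s)} from HCCS."""
    rows = conn.execute(
        "SELECT deviceId, AVG(txThroughput), AVG(rxThroughput) FROM HCCS GROUP BY deviceId"
    )
    return {device: (_to_gb_per_s(tx), _to_gb_per_s(rx)) for device, tx, rx in rows}
```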
PCIE
Stores the PCIe bandwidth data.
Control switch:
- --sys-interconnection-profiling and --sys-interconnection-freq of the msprof command
- sys_interconnection of Ascend PyTorch Profiler
- sys_interconnection of MindSpore Profiler
| Field | Type | Description |
|---|---|---|
| deviceId | INTEGER | Device ID. |
| timestampNs | INTEGER | Local time, in ns. |
| txPostMin | NUMERIC | Minimum bandwidth for sending PCIe posted data at the TX side, in byte/s. |
| txPostMax | NUMERIC | Maximum bandwidth for sending PCIe posted data at the TX side, in byte/s. |
| txPostAvg | NUMERIC | Average bandwidth for sending PCIe posted data at the TX side, in byte/s. |
| txNonpostMin | NUMERIC | Minimum bandwidth for sending PCIe non-posted data at the TX side, in byte/s. |
| txNonpostMax | NUMERIC | Maximum bandwidth for sending PCIe non-posted data at the TX side, in byte/s. |
| txNonpostAvg | NUMERIC | Average bandwidth for sending PCIe non-posted data at the TX side, in byte/s. |
| txCplMin | NUMERIC | Minimum bandwidth of completion packets for write requests at the TX side, in byte/s. |
| txCplMax | NUMERIC | Maximum bandwidth of completion packets for write requests at the TX side, in byte/s. |
| txCplAvg | NUMERIC | Average bandwidth of completion packets for write requests at the TX side, in byte/s. |
| txNonpostLatencyMin | NUMERIC | Minimum transmission latency in PCIe non-posted mode at the TX side, in ns. |
| txNonpostLatencyMax | NUMERIC | Maximum transmission latency in PCIe non-posted mode at the TX side, in ns. |
| txNonpostLatencyAvg | NUMERIC | Average transmission latency in PCIe non-posted mode at the TX side, in ns. |
| rxPostMin | NUMERIC | Minimum bandwidth for receiving PCIe posted data at the RX side, in byte/s. |
| rxPostMax | NUMERIC | Maximum bandwidth for receiving PCIe posted data at the RX side, in byte/s. |
| rxPostAvg | NUMERIC | Average bandwidth for receiving PCIe posted data at the RX side, in byte/s. |
| rxNonpostMin | NUMERIC | Minimum bandwidth for receiving PCIe non-posted data at the RX side, in byte/s. |
| rxNonpostMax | NUMERIC | Maximum bandwidth for receiving PCIe non-posted data at the RX side, in byte/s. |
| rxNonpostAvg | NUMERIC | Average bandwidth for receiving PCIe non-posted data at the RX side, in byte/s. |
| rxCplMin | NUMERIC | Minimum bandwidth of completion packets for write requests at the RX side, in byte/s. |
| rxCplMax | NUMERIC | Maximum bandwidth of completion packets for write requests at the RX side, in byte/s. |
| rxCplAvg | NUMERIC | Average bandwidth of completion packets for write requests at the RX side, in byte/s. |
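Each PCIE row already carries per-sample min, max, and average values. A per-device summary can therefore average the *Avg columns and take the maximum of the *Max columns. The sketch below does this for posted traffic and assumes an open sqlite3 connection. Averaging per-sample averages weights every sample equally, which is an assumption rather than a documented property of the data. pcie_posted_summary is a hypothetical helper name.

```python
import sqlite3


def pcie_posted_summary(conn: sqlite3.Connection) -> list[tuple[int, float, float, float, float]]:
    """Return (deviceId, avg txPostAvg, max txPostMax, avg rxPostAvg, max rxPostMax) in byte/s."""
    return conn.execute(
        """
        SELECT deviceId,
               AVG(txPostAvg), MAX(txPostMax),
               AVG(rxPostAvg), MAX(rxPostMax)
        FROM PCIE
        GROUP BY deviceId
        ORDER BY deviceId
        """
    ).fetchall()
```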