Performance Data Collection for TensorFlow and PyTorch

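TensorFlow has no API that can be called directly to collect performance data, so the msprof command must be used instead. Dynamic collection is generally preferred because it makes the amount of collected data easy to control. For details, see "Dynamic Collection of Performance Data" in the CANN Performance Tuning Tool User Guide.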

Sample code:

```
import npu_device
# The wildcard import is expected to supply RewriterConfig, which is used below.
from npu_device.compat.v1.npu_init import *
import numpy as np
import tensorflow as tf

# Use TF1-style graph mode with an explicit session.
tf.compat.v1.disable_eager_execution()

# Configure the session to run through the NPU optimizer.
session_config = tf.compat.v1.ConfigProto()
custom_op = session_config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["graph_max_parallel_model_num"].i = 1
custom_op.parameter_map["aicore_num"].s = tf.compat.as_bytes("7|10")
session_config.graph_options.rewrite_options.remapping = RewriterConfig.OFF

# Build a graph with a single broadcasting Equal op: [1, 8000] vs [800, 1].
left_shape = [1, 8000]
right_shape = [800, 1]
x = tf.compat.v1.placeholder(tf.int64, shape=left_shape)
y = tf.compat.v1.placeholder(tf.int64, shape=right_shape)
equal_ret = tf.math.equal(x, y)

# np.random.rand returns values in [0, 1), so the int64 cast yields all zeros.
# The actual values do not matter for profiling.
inputs_x = np.random.rand(*left_shape).astype(np.int64)
inputs_y = np.random.rand(*right_shape).astype(np.int64)

# Run inference in a long loop so the process stays alive while msprof attaches.
with tf.compat.v1.Session(config=session_config) as sess:
    for i in range(100000):
        result = sess.run(equal_ret, feed_dict={x: inputs_x, y: inputs_y})

print(result)
```
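Dynamic collection attaches to a process that is already running, so the PID of the inference script has to be known. A minimal sketch, assuming you can edit the script, is to have it report its own PID at startup. The helper name `announce_pid` is hypothetical and is not part of npu_device or msprof.

```python
import os
import sys


def announce_pid(stream=sys.stderr):
    """Print this process's PID so it can be passed to msprof via --pid.

    Hypothetical helper for illustration only.
    """
    pid = os.getpid()
    print(f"inference process PID: {pid}", file=stream, flush=True)
    return pid
```

Calling `announce_pid()` once before the session loop is enough. Tools such as `ps` or `pgrep` work equally well if the script cannot be changed.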

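Inference must run in a loop. Once the model has been running inference for a short while, find the PID of the running process yourself. For example, if the PID is 9527, run the following command to collect data dynamically: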
```
msprof --dynamic=on --pid=9527 --output=/home/projects/output --model-execution=on --runtime-api=on --aicpu=on
> start
    
...
> stop
   
...
> quit
```
The dynamic collection window opens after the start command and closes when the stop command is entered.
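The start/stop window can also be timed by a script instead of typed by hand. The following is a rough sketch, not a documented msprof workflow. It assumes that the tool reads the interactive `start`, `stop`, and `quit` commands from a piped stdin the same way it reads them from a terminal, which should be verified against your msprof version. The function name `run_collection_window` and its parameters are hypothetical.

```python
import subprocess
import time


def run_collection_window(cmd, window_s=5.0, warmup_s=0.0):
    """Launch `cmd`, then send start / stop / quit on its stdin.

    Assumption: the launched tool accepts these commands over a pipe.
    `window_s` is the length of the collection window in seconds.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, text=True)
    try:
        time.sleep(warmup_s)           # optional delay before collecting
        proc.stdin.write("start\n")
        proc.stdin.flush()
        time.sleep(window_s)           # the dynamic collection window
        proc.stdin.write("stop\n")
        proc.stdin.flush()
        proc.stdin.write("quit\n")
        proc.stdin.flush()
    finally:
        proc.stdin.close()
    return proc.wait()                 # exit code of the launched tool
```

With the command from this section, `cmd` would be `["msprof", "--dynamic=on", "--pid=9527", "--output=/home/projects/output", "--model-execution=on", "--runtime-api=on", "--aicpu=on"]`. A fixed window keeps the amount of collected data predictable between runs.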
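After the data has been collected, it must be parsed manually. Go to the directory produced by the collection step (usually a directory with a timestamp in its name) and run the following commands to parse the data: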
```
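# Parse the collected data and write the profiling output to the current directory
msprof --parse=on --output=./
# Export the results and save them in CSV format to the current directory
msprof --export=on --output=. --summary-format=csv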
```
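If the collection window is too long, parsing also takes a long time, so keep the collection time under control. Collecting for about 5 s is generally enough for data analysis.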
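As a first step of the analysis, it can help to see which CSV files the export produced and how large they are. The sketch below makes no assumption about msprof's file names or column layout: it walks a directory and reports the header and data-row count of every CSV it finds. The function name `summarize_csv_exports` is hypothetical.

```python
import csv
from pathlib import Path


def summarize_csv_exports(root):
    """Return (relative path, column names, data-row count) for each CSV under root."""
    results = []
    for path in sorted(Path(root).rglob("*.csv")):
        # utf-8-sig also reads plain UTF-8 and strips a BOM if one is present.
        with path.open(newline="", encoding="utf-8-sig") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])          # empty list for an empty file
            row_count = sum(1 for _ in reader)
        results.append((str(path.relative_to(root)), header, row_count))
    return results
```

Running `summarize_csv_exports(".")` from the timestamped directory gives a quick overview before opening individual files. An unexpectedly large row count is a hint that the collection window was longer than needed.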
