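Developers can call the Profiler class of TF Adapter 1.x through the compat.v1 module to collect performance data for part of a program. Profiling is enabled only for the commands that run inside the scope of the Profiler class.
For details about the Profiler class, see Profiler Constructor.
The following describes how to call the TF Adapter 1.x Profiler class through the compat.v1 module to collect performance data for part of a program.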

    import npu_device
    from npu_device.compat.v1.npu_init import *
    npu_device.compat.enable_v1()


    # tf, profiler, session_config and g are assumed to be provided by the preceding setup.
    a = tf.placeholder(tf.int32, (None, None))
    b = tf.constant([[1, 2], [2, 3]], dtype=tf.int32, shape=(2, 2))
    c = tf.placeholder(tf.int32, (None, None))
    add = tf.add(a, b)
    with tf.compat.v1.Session(config=session_config, graph=g) as sess:
        # Only the sess.run call inside the Profiler scope is profiled.
        with profiler.Profiler(level="L1", aic_metrics="ArithmeticUtilization", output_path="./"):
            result = sess.run(add, feed_dict={a: [[-20, 2], [1, 3]], c: [[1], [-21]]})

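Currently, developers can collect performance data for specific steps. To do so, define the operations for those steps inside the corresponding Profiler scope, as shown below: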

    # tf, profiler, session_config and g are assumed to be provided by the preceding setup.
    a = tf.placeholder(tf.int32, (None, None))
    b = tf.constant([[1, 2], [2, 3]], dtype=tf.int32, shape=(2, 2))
    c = tf.placeholder(tf.int32, (None, None))
    d = tf.constant([[1, 2], [2, 3]], dtype=tf.int32, shape=(2, 2))
    add1 = tf.add(a, b)
    add2 = tf.add(c, d)
    add3 = tf.add(add1, add2)
    with tf.compat.v1.Session(config=session_config, graph=g) as sess:
        # First scope: profile two steps of add1 with the PipeUtilization metrics.
        with profiler.Profiler(level="L1", aic_metrics="PipeUtilization", output_path="/home/test/profiling_data"):
            for i in range(2):
                result = sess.run(add1, feed_dict={a: [[-20], [1]]})
        # Second scope: profile four steps of add3 with the ArithmeticUtilization metrics.
        with profiler.Profiler(level="L1", aic_metrics="ArithmeticUtilization", output_path="/home/test/profiling_data"):
            for i in range(4):
                result = sess.run(add3, feed_dict={a: [[-20, 2], [1, 3]], c: [[1], [-21]]})

Note the following constraints when using the Profiler class:

    # Nested Profiler scopes
    with profiler.Profiler(level="L1", aic_metrics="ArithmeticUtilization", output_path="./"):
        with profiler.Profiler(level="L1", aic_metrics="ArithmeticUtilization", output_path="./"):
            sess.run(add)
