Implementation

Overview

This section describes how to verify an operator by constructing a single-operator network containing only that operator.

Verification Procedure

Figure 1 Verification procedure
  1. Develop an operator and deploy the related deliverables on the host, including the operator implementation file, the operator plugin (required only when the test frontend is a TensorFlow network), the operator prototype library, and the operator information library.
  2. Select test cases to run and specify shape and dtype.
  3. Use NumPy or other tools to generate random data as the operator input data.
  4. Run your operator on the generated random input in the TensorFlow environment or using a NumPy simulator to obtain the expected compute result.
  5. Use the TensorFlow frontend to build a single-operator network that contains only the developed TBE operator, and use the data generated in step 3 as the input to execute the single-operator network.
    TensorFlow calls GE and FE to execute the single-operator network on the Ascend AI Processor and obtain the actual operator execution result. The procedure is as follows:
    1. Adapt the operator using TF Adapter and load the plugin. For a TensorFlow frontend, the TF Adapter API is called to construct a TensorFlow graph, which is sent to GE and loaded during GE initialization. The operator plugin is invoked to parse the operator and map it to a graph supported by the Ascend AI Processor.
    2. GE calls infershape() and verify() in the operator prototype library to perform shape inference and attribute validation.
    3. FE loads the operator information registered with the operator information library, which specifies the valid input parameters (including the shape, type, and format) for running the operator on Ascend AI Processor.
    4. FE calls the operator in the TBE operator library to perform UB fusion and build the operator implementation file to generate the operator kernel.
    5. GE delivers the task info obtained from FE to Runtime, and then sends it to the device. Then, run the single-operator network on Ascend AI Processor to obtain the actual compute result of the operator.
    6. Compare the actual compute result obtained on the Ascend AI Processor with the expected compute result obtained in step 4 to check the functionality and accuracy of your operator.
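Steps 2 to 4 above (selecting a shape and dtype, generating random input, and computing the expected result) can be sketched with NumPy alone. The sketch below uses the Add operator as a hypothetical example; the shape and dtype values are illustrative, not prescribed by the procedure:

```python
import numpy as np

# Step 2: pick a shape and dtype for the test case (illustrative values).
shape = (2, 2, 2)
dtype = np.float16

# Step 3: generate random input data for the operator.
x_data = np.random.uniform(-2, 2, size=shape).astype(dtype)
y_data = np.random.uniform(-2, 2, size=shape).astype(dtype)

# Step 4: compute the expected (golden) result with NumPy.
expected = (x_data + y_data).astype(dtype)
```

The `expected` array is the reference against which the result produced on the Ascend AI Processor is later compared.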

Procedure (TensorFlow 1.15)

The following uses the Add operator as an example to describe how to construct and verify a single-operator network in TensorFlow 1.15. Go to the Ascend samples repository on Gitee or GitHub and download the sample package that matches the required version. For the version mapping, see "Release Notes" in the README file. Find the sample in the cplusplus/level1_single_api/4_op_dev/1_custom_op/tbe/testcases/tf1.15_test/add directory.

  1. Go to the tbe/testcases/tf1.15_test/opname directory and create a test case definition file (for example, tf_add.py).
  2. Import the Python libraries.
    import logging            # Import the logging module, a Python standard library.
    import tensorflow as tf   # Import the TensorFlow open-source library.
    from npu_bridge.estimator import npu_ops   # Import the npu_ops module from the TF Adapter (npu_bridge) package.
    import numpy as np        # Import the NumPy numerical library.
    
  3. Set the tolerance parameters of the np.allclose comparison function.
    1
    2
    3
    4
    # Absolute tolerance of the np.allclose comparison function
    atol = 0.001
    # Relative tolerance of the np.allclose comparison function
    rtol = 0.001
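Note that np.allclose treats two arrays as equal when |a - b| <= atol + rtol * |b| holds elementwise, so rtol scales with the magnitude of the reference values while atol is a fixed floor. A minimal illustration with these tolerances:

```python
import numpy as np

atol = 0.001
rtol = 0.001

a = np.array([1.0, 100.0])
b = np.array([1.0005, 100.05])

# Each |a - b| is within atol + rtol * |b|, so the arrays compare equal.
print(np.allclose(a, b, rtol=rtol, atol=atol))   # True

# A 0.5 absolute error at magnitude 100 exceeds 0.001 + 0.001 * 100.5.
c = np.array([1.0, 100.5])
print(np.allclose(a, c, rtol=rtol, atol=atol))   # False
```

Passing the tolerances by keyword avoids mixing up their positional order, since np.allclose takes rtol before atol.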
    
  4. Define the running parameters of the Ascend AI Processor and CPU by using config().

    If execute_type is set to ai_core, the single-operator network runs on the Ascend AI Processor and exercises the TBE operator.

    If execute_type is set to cpu, the single-operator network runs on the host CPU and exercises the corresponding TensorFlow operator.
    def config(execute_type):
        if execute_type == 'ai_core':
            session_config = tf.ConfigProto(
                allow_soft_placement=True,
                log_device_placement=False,)
            custom_op = session_config.graph_options.rewrite_options.custom_optimizers.add()
            custom_op.name = "NpuOptimizer"
            custom_op.parameter_map["enable_data_pre_proc"].b = True   # Enable data preprocessing on the device.
            custom_op.parameter_map["mix_compile_mode"].b = True    
            custom_op.parameter_map["use_off_line"].b = True     # True indicates that training is performed on Ascend AI Processor.
            
        elif execute_type == 'cpu':
            session_config = tf.ConfigProto(
                allow_soft_placement=True,
                log_device_placement=False)
    
        return session_config
    
  5. Define the main function of the single-operator network test case.
    def main(unused_argv):
        shape_params = (2, 2, 2)
        dtype_params = np.float16
    
        # Construct the input data of the Add operator: random numbers in the range [-2, 2) with shape shape_params.
        x_data = np.random.uniform(-2, 2, size=shape_params).astype(dtype_params)
        y_data = np.random.uniform(-2, 2, size=shape_params).astype(dtype_params)
        # Define placeholders for the input data of the Add operator.
        x = tf.compat.v1.placeholder(dtype_params, shape=shape_params)
        y = tf.compat.v1.placeholder(dtype_params, shape=shape_params)
        # Compute the output of the operator.
        out = tf.math.add(x, y)
        # Run the single-operator network on the host CPU to obtain the expected execution result.
        with tf.compat.v1.Session(config=config('cpu')) as session:
            result_cpu = session.run(out, feed_dict={x: x_data, y: y_data})
        # Run the single-operator network on the Ascend AI Processor to obtain the actual execution result.
        with tf.compat.v1.Session(config=config('ai_core')) as session:
            result_ai_core = session.run(out, feed_dict={x: x_data, y: y_data})
    
        result_ai_core = np.array(result_ai_core).astype(dtype_params)
        result_cpu = np.array(result_cpu).astype(dtype_params)
        print('====================================')
        # Compare the actual result from the Ascend AI Processor with the expected result from the CPU using np.allclose. atol and rtol are the absolute and relative tolerance parameters set in step 3.
        cmp_result = np.allclose(result_ai_core, result_cpu, rtol=rtol, atol=atol)
        print(cmp_result)
        print('====================================')
    
    • Construct the operator input based on the actual number and shapes of the operator's inputs.
    • Compute the operator output through TensorFlow API calls that match the operator logic.
  6. Run the single-operator network.
    if __name__ == "__main__":
        tf.app.run()
    

Procedure (TensorFlow 2.6)

The following uses the Add operator as an example to describe how to construct and verify a single-operator network in TensorFlow 2.6. Go to the Ascend samples repository on Gitee or GitHub and download the sample package that matches the required version. For the version mapping, see "Release Notes" in the README file. Find the sample in the cplusplus/level1_single_api/4_op_dev/1_custom_op/tbe/testcases/tf2.6_test/add directory.

  1. Go to the tbe/testcases/tf2.6_test/opname directory and create a test case definition file (for example, tf_add.py).
  2. Import the Python libraries.
    import logging            # Import the logging module, a Python standard library.
    import tensorflow as tf   # Import the TensorFlow open-source library.
    from npu_device.compat.v1 import *    # Import the NPU-related libraries.
    import numpy as np        # Import the NumPy numerical library.
    
  3. Disable TensorFlow 2.x behavior and set the tolerance parameter of the np.allclose comparison function.
    # Disable TensorFlow 2.x behavior.
    tf.compat.v1.disable_v2_behavior()
    # Absolute tolerance of the np.allclose comparison function
    atol = 0.001
    # Relative tolerance of the np.allclose comparison function
    rtol = 0.001
    
  4. Define the running parameters of the Ascend AI Processor and CPU by using config().

    If execute_type is set to ai_core, the single-operator network runs on the Ascend AI Processor and exercises the TBE operator.

    If execute_type is set to cpu, the single-operator network runs on the host CPU and exercises the corresponding TensorFlow operator.
    def config(execute_type):
        if execute_type == 'ai_core':
            session_config = tf.compat.v1.ConfigProto(
                allow_soft_placement=True,
                log_device_placement=False,)
            custom_op = session_config.graph_options.rewrite_options.custom_optimizers.add()
            custom_op.name = "NpuOptimizer"
            custom_op.parameter_map["enable_data_pre_proc"].b = True   # Enable data preprocessing on the device.
            custom_op.parameter_map["mix_compile_mode"].b = True    
            custom_op.parameter_map["use_off_line"].b = True     # True indicates that training is performed on Ascend AI Processor.
            custom_op.parameter_map["min_group_size"].b = 1
            
        elif execute_type == 'cpu':
            session_config = tf.compat.v1.ConfigProto(
                allow_soft_placement=True,
                log_device_placement=False)
    
        return session_config
    
  5. Define the main function of the single-operator network test case.
    def main(unused_argv):
        shape_params = (2, 2, 2)
        dtype_params = np.float16
    
        # Construct the input data of the Add operator: random numbers in the range [-2, 2) with shape shape_params.
        x_data = np.random.uniform(-2, 2, size=shape_params).astype(dtype_params)
        y_data = np.random.uniform(-2, 2, size=shape_params).astype(dtype_params)
        # Define placeholders for the input data of the Add operator.
        x = tf.compat.v1.placeholder(dtype_params, shape=shape_params)
        y = tf.compat.v1.placeholder(dtype_params, shape=shape_params)
        # Compute the output of the operator.
        out = tf.math.add(x, y)
        # Run the single-operator network on the host CPU to obtain the expected execution result.
        with tf.compat.v1.Session(config=config('cpu')) as session:
            result_cpu = session.run(out, feed_dict={x: x_data, y: y_data})
        # Run the single-operator network on the Ascend AI Processor to obtain the actual execution result.
        with tf.compat.v1.Session(config=config('ai_core')) as session:
            result_ai_core = session.run(out, feed_dict={x: x_data, y: y_data})
    
        result_ai_core = np.array(result_ai_core).astype(dtype_params)
        result_cpu = np.array(result_cpu).astype(dtype_params)
        print('====================================')
        # Compare the actual result from the Ascend AI Processor with the expected result from the CPU using np.allclose. atol and rtol are the absolute and relative tolerance parameters set in step 3.
        cmp_result = np.allclose(result_ai_core, result_cpu, rtol=rtol, atol=atol)
        print(cmp_result)
        print('====================================')
    
    • Construct the operator input based on the actual number and shapes of the operator's inputs.
    • Compute the operator output through TensorFlow API calls that match the operator logic.
  6. Run the single-operator network.
    if __name__ == "__main__":
        tf.compat.v1.app.run()