TensorFlow Framework

This section describes the operator adaptation process in the TensorFlow framework, which is used to map TensorFlow operators to CANN operators (custom operators developed based on the CANN framework). In this way, CANN operators can be called from the TensorFlow framework. An example of operator calling in the TensorFlow framework is also provided to help you understand the complete process.

The following figure shows the complete development process. The key steps are as follows: First, implement the operators and integrate them into a graph by referring to Project-based Operator Development and Integrating Operators into a GE Graph. Then, develop the TensorFlow adaptation plugin, which is the focus of this section. The plugin maps TensorFlow operators, both custom and native, to CANN operators. To map a TensorFlow custom operator to a CANN operator, you also need to develop the TensorFlow custom operator itself. Finally, write the code that calls the operators from the TensorFlow framework. For details about TensorFlow custom operators and TensorFlow operator calling, see the official TensorFlow documentation. The examples in this section are for reference only.

The procedure is as follows:

  1. Set up the environment.
    1. Install the CANN software. For details, see Environment Setup.
    2. Create an operator project. Use msOpGen to create an operator development project. In the TensorFlow operator adaptation scenario, you need to specify the framework by setting the framework parameter to tf or tensorflow. The tool automatically generates the framework adaptation code. Take the custom CANN operator AddCustom as an example. The command for creating an operator development project using the msOpGen tool is as follows:
      ${INSTALL_DIR}/python/site-packages/bin/msopgen gen -i $HOME/sample/add_custom.json -f tf -c ai_core-<soc_version> -lan cpp -out $HOME/sample/AddCustom
  2. Implement operators.
    • Define the operator prototype. The operator prototype describes the input, output, and attributes of the operator as well as its implementation information on the AI processor, and associates the operator with functions such as the tiling implementation.
    • Implement the operator on the kernel and implement tiling on the host. For details, see Operator Implementation. In project-based operator development, you can call tiling APIs to perform tiling development based on the programming framework provided by CANN, and call the corresponding APIs on the kernel to obtain the tiling parameters. For details, see Operator Implementation on the Kernel and Tiling Implementation on the Host. Additional restrictions are also described in those sections.
  3. Integrate operators into a GE graph. In this scenario, the implementation of adaptation functions such as shape inference needs to be provided.
  4. Develop the TensorFlow adaptation plugin. For details, see Developing an Adaptation Plugin.
  5. Build and deploy the operators. Use the project build script to build and deploy the operators.
  6. Call operators in the TensorFlow framework. For details, see Mapping a TensorFlow Native Operator to a CANN Operator and Developing a TensorFlow Custom Operator and Mapping It to a CANN Operator, which also provide complete examples.

Developing an Adaptation Plugin

After creating an operator project, the framework/tf_plugin directory is generated in the operator project path to store the implementation file of the TensorFlow adaptation plugin. The following uses the custom CANN operator AddCustom as an example. The operator project directory is as follows:

AddCustom
├── build.sh             // Build script
├── cmake 
├── CMakeLists.txt       // Build script of the operator project
├── CMakePresets.json    // Build configuration options
├── framework            // Directory for storing the implementation file of the framework adaptation plugin
│   ├── tf_plugin     // Directory for storing the implementation file of the TensorFlow adaptation plugin
│   │   ├── CMakeLists.txt    
│   │   ├── tensorflow_add_custom_plugin.cc  // Implementation file of the TensorFlow adaptation plugin
│   ├── CMakeLists.txt
├── op_host                      // Implementation file on the host
├── op_kernel                    // Implementation file on the kernel
└── scripts                      // Directory of scripts used for custom operator project packing
If the prototype definition of a TensorFlow operator is the same as that of a CANN operator, the implementation code of the TensorFlow adaptation plugin is as follows:
#include "register/register.h"
namespace domi {
REGISTER_CUSTOM_OP("AddCustom")
    .FrameworkType(TENSORFLOW) 
    .OriginOpType("AddCustom")   
    .ParseParamsByOperatorFn(AutoMappingByOpFn);
}

If the prototype definition of a TensorFlow operator is different from that of a CANN operator, the implementation code of the TensorFlow adaptation plugin is as follows:

#include "register/register.h"
namespace domi {
REGISTER_CUSTOM_OP("FlashAttentionScore")
    .FrameworkType(TENSORFLOW)
    .OriginOpType({"FlashAttentionScore"})
    .ParseParamsByOperatorFn(FlashAttentionScoreMapping)
    .ParseOpToGraphFn(AddOptionalPlaceholderForFA);
}  // namespace domi
  • Include the header file related to plugin implementation functions.

    register.h is stored in the include/register/ directory of the CANN component directory. After this header file is included, you can use the operator registration class to call related APIs.

  • Register a custom operator by using REGISTER_CUSTOM_OP. The OpType passed in must be the same as the OpType used in the operator prototype registration.
    • FrameworkType: specifies the framework type. TENSORFLOW indicates that the original framework is TensorFlow.
    • OriginOpType: indicates the type of the operator in the original framework. For a TensorFlow custom operator (which you also need to develop), the value is the operator name registered with REGISTER_OP. For a TensorFlow native operator, the value is the native operator name.
    • ParseParamsByOperatorFn: registers the callback function that parses operator parameters to implement the mapping. You need to implement a callback function of the ParseParamByOpFunc type. If the parameters of the original TensorFlow operator correspond one-to-one to those of the CANN operator, you can directly use the automatic mapping callback function AutoMappingByOpFn.
    • ParseOpToGraphFn: registers the callback function that adjusts the operator prototype mapping when the prototype definition of a TensorFlow operator is inconsistent with that of a CANN operator (for example, the CANN operator prototype has optional inputs, but the TensorFlow operator prototype does not support optional inputs).

Mapping a TensorFlow Native Operator to a CANN Operator

Take the custom operator AddCustom as an example. To map this operator to the TensorFlow built-in operator Add, modify the plugin code in the AddCustom operator directory framework/tf_plugin to complete operator name mapping.

#include "register/register.h"
namespace domi {
REGISTER_CUSTOM_OP("AddCustom")   // Name of the Ascend C custom operator
    .FrameworkType(TENSORFLOW)    // Third-party framework type TENSORFLOW
    .OriginOpType("Add")          // Map to the TensorFlow native operator Add
    .ParseParamsByOperatorFn(AutoMappingByOpFn);
}

After the operator project is built and deployed, construct a single-operator TensorFlow 1.15 test case for verification.

  1. Create the test case script tf_add.py.
  2. Import the Python libraries.
    import logging            # Import the logging module from the Python standard library.
    import tensorflow as tf   # Import the TensorFlow open-source library.
    from npu_bridge.estimator import npu_ops   # Import the npu_ops module from the NPU TensorFlow adapter package (npu_bridge).
    import numpy as np    # Import the NumPy scientific computing library.
    
  3. Define the parameter for execution on the Ascend AI Processor or CPU by using config().

    If execute_type is set to ai_core, an Ascend C operator is called to run the single-operator network on the Ascend AI Processor.

    If execute_type is set to cpu, a TensorFlow operator is called to run the single-operator network on the host CPU.
    def config(execute_type):
        if execute_type == 'ai_core':
            session_config = tf.ConfigProto(
                allow_soft_placement=True,
                log_device_placement=False,)
            custom_op = session_config.graph_options.rewrite_options.custom_optimizers.add()
            custom_op.name = "NpuOptimizer"
            custom_op.parameter_map["enable_data_pre_proc"].b = True   # Enable data preprocessing on the device.
            custom_op.parameter_map["mix_compile_mode"].b = True     # Enable the mixed computing mode.
            custom_op.parameter_map["use_off_line"].b = True     # True indicates that training is performed on the Ascend AI Processor.
    
        elif execute_type == 'cpu':
            session_config = tf.ConfigProto(
                allow_soft_placement=True,
                log_device_placement=False)
    
        return session_config
    
  4. Define the main function of the single-operator network test case.
    • Construct the operator input based on the actual input number and shape of the operator.
    • Compute the operator output by using TensorFlow API calls based on the operator logic.
    # Set the tolerance parameters of the np.allclose comparison function.
    # Absolute tolerance
    atol = 0.001
    # Relative tolerance
    rtol = 0.001
    
    def main(unused_argv):
        shape_params = (8, 2048)
        dtype_params = np.float16
    
        # Construct the input data of the Add operator: random numbers in the range [-2, 2] with shape shape_params.
        x_data = np.random.uniform(-2, 2, size=shape_params).astype(dtype_params)
        y_data = np.random.uniform(-2, 2, size=shape_params).astype(dtype_params)
        # Use placeholders for the two inputs of the Add operator.
        x = tf.compat.v1.placeholder(dtype_params, shape=shape_params)
        y = tf.compat.v1.placeholder(dtype_params, shape=shape_params)
        # Compute the output of the operator.
        out = tf.math.add(x, y)
        # Run the single-operator network on the host CPU to obtain the expected execution result.
        with tf.compat.v1.Session(config=config('cpu')) as session:
            result_cpu = session.run(out, feed_dict={x: x_data, y: y_data})
        # Run the single-operator network on the Ascend AI Processor to obtain the actual execution result.
        with tf.compat.v1.Session(config=config('ai_core')) as session:
            result_ai_core = session.run(out, feed_dict={x: x_data, y: y_data})
    
        result_ai_core = np.array(result_ai_core).astype(dtype_params)
        result_cpu = np.array(result_cpu).astype(dtype_params)
        print('====================================')
        # Use np.allclose to compare the actual result on the Ascend AI Processor with the
        # expected result on the CPU. rtol and atol are the relative and absolute tolerance
        # parameters of the np.allclose comparison function respectively.
        cmp_result = np.allclose(result_ai_core, result_cpu, rtol=rtol, atol=atol)
        print(cmp_result)
        print('====================================')
    
  5. Run the single-operator network.
    if __name__ == "__main__":
        tf.app.run()
    
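The tolerance check above relies on np.allclose, whose documented rule is that two values match when abs(a - b) <= atol + rtol * abs(b), and whose positional parameter order after the two arrays is rtol first, then atol. The following pure-Python sketch (no NumPy required) reproduces that rule so the effect of the two tolerances is easy to see:

```python
# Minimal, pure-Python sketch of the comparison rule used by np.allclose:
# two values match when abs(a - b) <= atol + rtol * abs(b).
def allclose(xs, ys, rtol=1e-05, atol=1e-08):
    """Elementwise closeness check over two equal-length sequences."""
    return all(abs(a - b) <= atol + rtol * abs(b) for a, b in zip(xs, ys))

golden = [1.0000, 2.0000, -0.5000]
actual = [1.0004, 1.9996, -0.5004]

# With the tolerances used in the test case (atol = rtol = 0.001), the
# small float16-level deviations above are accepted.
print(allclose(actual, golden, rtol=0.001, atol=0.001))   # True
print(allclose([1.0], [1.1], rtol=0.001, atol=0.001))     # False
```

Passing the tolerances by keyword, as in the corrected test case, avoids depending on the positional order.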

Developing a TensorFlow Custom Operator and Mapping It to a CANN Operator

  1. Develop the adaptation plugin code. Take the custom operator AddCustom as an example. To map this operator to the TensorFlow custom operator AddCustom, modify the plugin code in the CANN AddCustom operator directory framework/tf_plugin to complete operator name mapping.
    REGISTER_CUSTOM_OP("AddCustom")
      .FrameworkType(TENSORFLOW)      
      .OriginOpType("AddCustom") 
      .ParseParamsByOperatorFn(AutoMappingByOpFn);
    
  2. Develop the TensorFlow custom operator. The following is an example for reference only; for details, see the official TensorFlow documentation.

    Create the TensorFlow prototype registration file custom_assign_add_custom.cc. The content is as follows:

    #include "tensorflow/core/framework/op.h"
    #include "tensorflow/core/framework/shape_inference.h"
    #include "tensorflow/core/framework/op_kernel.h"
    #include "tensorflow/core/framework/common_shape_fns.h"
    using namespace tensorflow;
    
    // Register the operator prototype by using the REGISTER_OP API provided by TensorFlow.
    REGISTER_OP("AddCustom")        // TensorFlow registered operator name
        .Input("x: T")              // Operator prototype. The input parameter is x and the type is T.
        .Input("y: T")              // Operator prototype. The input parameter is y and the type is T.
        .Output("z: T")             // Operator prototype. The output parameter is z and the type is T.
        .Attr("T: {half}")          // Supported range of type T
        .SetShapeFn(shape_inference::BroadcastBinaryOpShapeFn);  // Shape inference of the operator. BroadcastBinaryOpShapeFn is a built-in TensorFlow function that infers the output shape as the broadcast of the two input shapes.
    
    // Implement a kernel function of the CPU version. During construction of a TensorFlow computational graph, the system checks whether all operators have kernel functions on any device (the NPU kernel cannot be detected). If no kernel function is found, an error is reported. Here, the CPU kernel function always returns an error.
    class AddCustomOp : public OpKernel {
     public:
      explicit AddCustomOp(OpKernelConstruction* context) : OpKernel(context) {}
    
      void Compute(OpKernelContext* context) override {
        OP_REQUIRES_OK(context, errors::Unimplemented("AddCustomOp is not supported on CPU")); 
      }
    };
    
    REGISTER_KERNEL_BUILDER(Name("AddCustom").Device(DEVICE_CPU), AddCustomOp);          // Register the CPU kernel of the AddCustom operator. The kernel only returns an error indicating that the operator is not supported on the CPU.
    
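Because SetShapeFn above uses BroadcastBinaryOpShapeFn, the inferred output shape is the broadcast of the two input shapes rather than necessarily a copy of either one. As an illustration (not TensorFlow's actual implementation), the standard binary broadcasting rule can be sketched in pure Python:

```python
# Pure-Python sketch of binary broadcast shape inference, the rule applied by
# shape functions such as BroadcastBinaryOpShapeFn: align the shapes from the
# trailing dimension; two dimensions are compatible when they are equal or
# when one of them is 1.
def broadcast_shape(a, b):
    result = []
    # Walk both shapes right-to-left, padding the shorter one with 1s.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"incompatible shapes {a} and {b}")
        result.append(max(da, db))
    return tuple(reversed(result))

print(broadcast_shape((8, 2048), (8, 2048)))  # (8, 2048) -- equal shapes pass through
print(broadcast_shape((8, 2048), (1, 2048)))  # (8, 2048) -- the 1 is broadcast
```

In the AddCustom test case both inputs have the same shape, so the inferred output shape equals the input shape.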

    Run the following commands to compile the preceding code. The generated file is libcustom_ops.so. In the subsequent operator calling script, the load_op_library API can be used to load the .so file as a Python module to call the custom operator.

    TF_CFLAGS=( $(python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )     # Obtain the TensorFlow compile flags.
    TF_LFLAGS=( $(python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )        # Obtain the TensorFlow link flags.
    SOURCE_FILES=custom_assign_add_custom.cc                                                                     # The .cc file containing the TensorFlow operator registration and CPU kernel implementation.
    g++ -std=c++14 -shared $SOURCE_FILES -o ${Path}/libcustom_ops.so -fPIC ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2   # Build command. The output is libcustom_ops.so, which can be loaded as a Python module with load_op_library.
  3. Load the dynamic library built in the previous step into the test script to call the custom operator.
    • TensorFlow 1.15.0 calling example
      import os
      import tensorflow as tf
      import numpy as np
      from npu_bridge.npu_init import *
      tf.enable_resource_variables()
      # Absolute tolerance of the np.allclose comparison function
      atol = 0.001
      # Relative tolerance of the np.allclose comparison function
      rtol = 0.001
      def main(unused_argv):
          custom_op_lib = tf.load_op_library('./outputs/libcustom_ops.so')     # Load the .so file as a Python module.
          shape_params = (8, 2048)
          dtype_params = np.float16
          x_data = np.random.uniform(-2, 2, size=shape_params).astype(dtype_params)
          y_data = np.random.uniform(-2, 2, size=shape_params).astype(dtype_params)
          x = tf.compat.v1.placeholder(dtype_params, shape=shape_params)
          y = tf.compat.v1.placeholder(dtype_params, shape=shape_params)
          tf_z = tf.math.add(x, y)                                           # Call the TensorFlow native operator.
          ac_z = custom_op_lib.add_custom(x, y)                              # Call the AscendC AddCustom custom operator.
          config = tf.ConfigProto()
          custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
          custom_op.name = "NpuOptimizer"   # Run the single-operator on the Ascend AI Processor.
          config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
          config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF
      
          with tf.Session(config=config) as sess:
              sess.run(tf.global_variables_initializer())
              tf_golden = sess.run(tf_z, feed_dict={x: x_data, y: y_data})
          with tf.Session(config=config) as sess:
              sess.run(tf.global_variables_initializer())
              ascend_out = sess.run(ac_z, feed_dict={x: x_data, y: y_data})
          tf_golden = np.array(tf_golden).astype(dtype_params)
          ascend_out = np.array(ascend_out).astype(dtype_params)
          print('====================================')
          # Use np.allclose to compare the actual result on the Ascend AI Processor with the
          # expected result of the TensorFlow native operator. rtol and atol are the relative
          # and absolute tolerance parameters respectively.
          cmp_result = np.allclose(tf_golden, ascend_out, rtol=rtol, atol=atol)
          print(cmp_result)
          print('====================================')
      if __name__ == "__main__":
          tf.app.run()
      
    • TensorFlow 2.6.5 calling example
      import os
      import tensorflow as tf
      import numpy as np
      import npu_device
      from npu_device.compat.v1.npu_init import *
      npu_device.compat.enable_v1()
      tf.compat.v1.enable_resource_variables()
      # Absolute tolerance of the np.allclose comparison function
      atol = 0.001
      # Relative tolerance of the np.allclose comparison function
      rtol = 0.001
      def main(unused_argv):
          custom_op_lib = tf.load_op_library('./outputs/libcustom_ops.so')     # Load the .so file as a Python module.
      
          shape_params = (8, 2048)
          dtype_params = np.float16
          x_data = np.random.uniform(-2, 2, size=shape_params).astype(dtype_params)
          y_data = np.random.uniform(-2, 2, size=shape_params).astype(dtype_params)
          x = tf.compat.v1.placeholder(dtype_params, shape=shape_params)
          y = tf.compat.v1.placeholder(dtype_params, shape=shape_params)
          tf_z = tf.math.add(x, y)                                           # Call the TensorFlow native operator.
          ac_z = custom_op_lib.add_custom(x, y)                              # Call the AscendC AddCustom custom operator.
      
          config = tf.compat.v1.ConfigProto()
          custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
          custom_op.name = "NpuOptimizer"
          config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
          config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF
      
          with tf.compat.v1.Session(config=config) as sess:
              sess.run(tf.compat.v1.global_variables_initializer())
              tf_golden = sess.run(tf_z, feed_dict={x: x_data, y: y_data})
          with tf.compat.v1.Session(config=config) as sess:
              sess.run(tf.compat.v1.global_variables_initializer())
              ascend_out = sess.run(ac_z, feed_dict={x: x_data, y: y_data})
          tf_golden = np.array(tf_golden).astype(dtype_params)
          ascend_out = np.array(ascend_out).astype(dtype_params)
          print('====================================')
          # Use np.allclose to compare the actual result on the Ascend AI Processor with the
          # expected result of the TensorFlow native operator. rtol and atol are the relative
          # and absolute tolerance parameters respectively.
          cmp_result = np.allclose(tf_golden, ascend_out, rtol=rtol, atol=atol)
          print(cmp_result)
          print('====================================')
      if __name__ == "__main__":
          tf.compat.v1.app.run()
      

Developing Mapping Relationships for Operators with Optional Inputs

The TensorFlow prototype definition does not support optional inputs. For operators with optional inputs, the mapping from TensorFlow to CANN does not meet the simple one-to-one mapping requirement. You need to convert the inputs to optional inputs in the plugin adaptation code to adjust the prototype mapping. The following uses the FlashAttentionScore operator in the CANN operator library as an example to describe how to develop a framework adaptation plugin for this operator.

  1. Develop an adaptation plugin.
    Different from the simple one-to-one mapping described above, ParseOpToGraphFn needs to be called to register an additional callback function that adjusts the operator prototype mapping. In this case:
    • Register a callback function by using ParseParamsByOperatorFn. In the callback function, map the TensorFlow operator to an intermediate operator whose IR is consistent with the TensorFlow prototype (call AutoMappingByOpFn to complete the attribute mapping).
    • Register a callback function by using ParseOpToGraphFn to adjust the operator prototype mapping and map the intermediate operator to the operator in the CANN operator library. Here, "ToGraph" refers to the single-operator graph consisting of only that operator.
    Note that the ParseParamsByOperatorFn callback function must set the TensorFlow operator name in the original_type attribute of the intermediate operator, which triggers the subsequent ParseOpToGraphFn callback. A code example is as follows:
    #include <string>
    #include <vector>
    #include "register/register.h"
    #include "graph/operator.h"
    #include "graph/graph.h"
    #include "graph/operator_factory.h"
    
    namespace domi {
    using namespace ge;
    
    static Status AddOptionalPlaceholderForFA(const ge::Operator &tf_op, ge::Graph &graph) {
      // 1. Create a FlashAttentionScore operator npu_fa_op.
      ge::AscendString op_name;
      tf_op.GetName(op_name);
      auto npu_fa_op = OperatorFactory::CreateOperator(op_name.GetString(), "FlashAttentionScore");
      // 2. Map the attributes of the TensorFlow operator to the npu_fa_op operator.
      float scale_value = 1.0;
      (void)tf_op.GetAttr("scale_value", scale_value);
      (void)npu_fa_op.SetAttr("scale_value", scale_value);
    
      float keep_prob = 1.0;
      (void)tf_op.GetAttr("keep_prob", keep_prob);
      (void)npu_fa_op.SetAttr("keep_prob", keep_prob);
    
      int32_t pre_tockens = 2147483647;
      (void)tf_op.GetAttr("pre_tockens", pre_tockens);
      (void)npu_fa_op.SetAttr("pre_tockens", pre_tockens);
    
      int32_t next_tockens = 2147483647;
      (void)tf_op.GetAttr("next_tockens", next_tockens);
      (void)npu_fa_op.SetAttr("next_tockens", next_tockens);
    
      int32_t head_num = 0;
      (void)tf_op.GetAttr("head_num", head_num);
      (void)npu_fa_op.SetAttr("head_num", head_num);
    
      std::string input_layout;
      (void)tf_op.GetAttr("input_layout", input_layout);
      (void)npu_fa_op.SetAttr("input_layout", input_layout);
    
      int32_t inner_precise = 0;
      (void)tf_op.GetAttr("inner_precise", inner_precise);
      (void)npu_fa_op.SetAttr("inner_precise", inner_precise);
    
      int32_t sparse_mode = 0;
      (void)tf_op.GetAttr("sparse_mode", sparse_mode);
      (void)npu_fa_op.SetAttr("sparse_mode", sparse_mode);
    
      int32_t pse_type = 1;
      (void)tf_op.GetAttr("pse_type", pse_type);
      (void)npu_fa_op.SetAttr("pse_type", pse_type);
    
      // 3. Create input data.
      std::vector<Operator> inputs;
      for (size_t i = 0UL; i < tf_op.GetInputsSize(); i++) {
        const std::string data_name = "Data_" + std::to_string(i);
        Operator data_op = OperatorFactory::CreateOperator(data_name.c_str(), "Data");
        (void)data_op.SetAttr("index", static_cast<int32_t>(i));
        inputs.emplace_back(data_op);
      }
    
      size_t index = 0UL;
      // 4. For required inputs, directly set data to the operator inputs.
      (void)npu_fa_op.SetInput("query", inputs[index++]);
      (void)npu_fa_op.SetInput("key", inputs[index++]);
      (void)npu_fa_op.SetInput("value", inputs[index++]);
    
      // 5. For each optional input, check whether its type-list attribute is empty. A non-empty list means the optional input is connected.
      std::vector<DataType> real_shift_type;
      (void)tf_op.GetAttr("real_shift_type", real_shift_type);
      if (!real_shift_type.empty()) {
        (void)npu_fa_op.SetInput("real_shift", inputs[index++]);
      }
    
      std::vector<DataType> drop_mask_type;
      (void)tf_op.GetAttr("drop_mask_type", drop_mask_type);
      if (!drop_mask_type.empty()) {
        (void)npu_fa_op.SetInput("drop_mask", inputs[index++]);
      }
    
      std::vector<DataType> padding_mask_type;
      (void)tf_op.GetAttr("padding_mask_type", padding_mask_type);
      if (!padding_mask_type.empty()) {
        (void)npu_fa_op.SetInput("padding_mask", inputs[index++]);
      }
      std::vector<DataType> atten_mask_type;
      (void)tf_op.GetAttr("atten_mask_type", atten_mask_type);
      if (!atten_mask_type.empty()) {
        (void)npu_fa_op.SetInput("atten_mask", inputs[index++]);
      }
      std::vector<DataType> prefix_type;
      (void)tf_op.GetAttr("prefix_type", prefix_type);
      if (!prefix_type.empty()) {
        (void)npu_fa_op.SetInput("prefix", inputs[index++]);
      }
      std::vector<DataType> actual_seq_qlen_type;
      (void)tf_op.GetAttr("actual_seq_qlen_type", actual_seq_qlen_type);
      if (!actual_seq_qlen_type.empty()) {
        (void)npu_fa_op.SetInput("actual_seq_qlen", inputs[index++]);
      }
      std::vector<DataType> actual_seq_kvlen_type;
      (void)tf_op.GetAttr("actual_seq_kvlen_type", actual_seq_kvlen_type);
      if (!actual_seq_kvlen_type.empty()) {
        (void)npu_fa_op.SetInput("actual_seq_kvlen", inputs[index++]);
      }
    
      std::vector<DataType> q_start_idx_type;
      (void)tf_op.GetAttr("q_start_idx_type", q_start_idx_type);
      if (!q_start_idx_type.empty()) {
        (void)npu_fa_op.SetInput("q_start_idx", inputs[index++]);
      }
    
      std::vector<DataType> kv_start_idx_type;
      (void)tf_op.GetAttr("kv_start_idx_type", kv_start_idx_type);
      if (!kv_start_idx_type.empty()) {
        (void)npu_fa_op.SetInput("kv_start_idx", inputs[index++]);
      }
    
      // 6. Use the output of the npu_fa_op operator to construct the graph output.
      std::vector<std::pair<Operator, std::vector<size_t>>> output_indexs;
      std::vector<size_t> node_output_index;
      for (size_t i = 0UL; i < npu_fa_op.GetOutputsSize(); i++) {
        node_output_index.emplace_back(i);
      }
      (void)output_indexs.emplace_back(std::make_pair(npu_fa_op, node_output_index));
      (void)graph.SetInputs(inputs).SetOutputs(output_indexs);
      return SUCCESS;
    }
    
    static Status FlashAttentionScoreMapping(const ge::Operator& op_src, ge::Operator& op_dst) {
      // 1. Call the default mapping function.
      if (AutoMappingByOpFn(op_src, op_dst) != ge::GRAPH_SUCCESS) {
        return FAILED;
      }
      // 2. Set the TensorFlow operator name to the original_type attribute of op_dst to trigger the ParseOpToGraphFn callback function.
      op_dst.SetAttr("original_type", "FlashAttentionScore");
      return SUCCESS;
    }
    
    REGISTER_CUSTOM_OP("FlashAttentionScore")
        .FrameworkType(TENSORFLOW)
        .OriginOpType({"FlashAttentionScore"})
        .ParseParamsByOperatorFn(FlashAttentionScoreMapping) // Register this function to implement the mapping of operator attributes.
        .ParseOpToGraphFn(AddOptionalPlaceholderForFA); // Register this function to convert the inputs in the TensorFlow into optional inputs and change the edge connection relationships.
    }  // namespace domi
    
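The index++ pattern in AddOptionalPlaceholderForFA assigns contiguous input slots: the required inputs take the leading indices, and each optional input consumes the next index only when its type-list attribute is non-empty. A hedged pure-Python sketch of that bookkeeping (the names and type strings here are illustrative, not part of the CANN API):

```python
# Sketch of the input-slot assignment used in AddOptionalPlaceholderForFA:
# required inputs take the leading indices; each optional input consumes the
# next index only when its "<name>_type" list attribute is non-empty.
def assign_input_slots(required, optional_type_attrs):
    slots = {}
    index = 0
    for name in required:
        slots[name] = index
        index += 1
    for name, type_list in optional_type_attrs:
        if type_list:                 # non-empty type list => input is connected
            slots[name] = index
            index += 1
    return slots

slots = assign_input_slots(
    required=["query", "key", "value"],
    optional_type_attrs=[
        ("real_shift", []),           # absent: empty type list, no slot assigned
        ("drop_mask", ["float16"]),   # present: takes the next slot
        ("atten_mask", ["bool"]),
    ],
)
print(slots)  # {'query': 0, 'key': 1, 'value': 2, 'drop_mask': 3, 'atten_mask': 4}
```

This is why the optional inputs must come after the required inputs in the prototype: an absent optional input simply does not shift the indices of the inputs that follow it.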
  2. Register the prototype definition of the FlashAttentionScore operator in the TensorFlow open-source framework. Because TensorFlow does not support optional inputs, each optional input must be represented as a dynamic input in the TensorFlow prototype, with its presence and count marked by a type-list attribute, and the optional inputs must be placed at the end of the prototype definition. The sample code (FlashAttentionScore.cc) is as follows:
    #include <algorithm>
    #include <atomic>
    #include <iostream>
    #include <map>
    #include "tensorflow/core/framework/common_shape_fns.h"
    #include "tensorflow/core/framework/op.h"
    #include "tensorflow/core/framework/op_kernel.h"
    using namespace tensorflow;
    using shape_inference::InferenceContext;
    using shape_inference::ShapeHandle;
    using namespace std;
    using OpKernelConstructionPtr = OpKernelConstruction*;
    using OpKernelContextPtr = OpKernelContext*;
    using InferenceContextPtr = ::tensorflow::shape_inference::InferenceContext*;
    namespace {
    // CPU placeholder kernel. It only reports that the custom operator is not installed on the CPU.
    class CustOps : public OpKernel {
    public:
        explicit CustOps(OpKernelConstructionPtr context) : OpKernel(context) {}
        void Compute(OpKernelContextPtr context) override
        {
            std::cout << "Cust Ops not installed!!" << std::endl;
        }
        ~CustOps() override = default;
    };
    }  // namespace
    namespace tensorflow {
    REGISTER_OP("FlashAttentionScore")
        .Input("query: T")
        .Input("key: T")
        .Input("value: T")
        .Input("real_shift: real_shift_type")  // Register optional input as dynamic input in the TensorFlow prototype.
        .Input("drop_mask: drop_mask_type")
        .Input("padding_mask: padding_mask_type")
        .Input("atten_mask: atten_mask_type")
        .Input("prefix: prefix_type")
        .Input("actual_seq_qlen: actual_seq_qlen_type")
        .Input("actual_seq_kvlen: actual_seq_kvlen_type")
        .Input("q_start_idx: q_start_idx_type")
        .Input("kv_start_idx: kv_start_idx_type")
        .Output("softmax_max: float32")
        .Output("softmax_sum: float32")
        .Output("softmax_out: T")
        .Output("attention_out: T")
        .Attr("scale_value: float = 1.0")
        .Attr("keep_prob: float = 1.0")
        .Attr("pre_tockens: int = 2147483647")
        .Attr("next_tockens: int = 2147483647")
        .Attr("head_num: int")
        .Attr("input_layout: string")
        .Attr("inner_precise: int = 0")
        .Attr("sparse_mode: int = 0")
        .Attr("pse_type: int = 1")
        .Attr("T: {float16, float32, bfloat16} = DT_FLOAT")
        .Attr("real_shift_type: list({float16, float32, bfloat16}) >= 0") // Mark the number of dynamic inputs through the attribute.
        .Attr("drop_mask_type: list({uint8}) >= 0")
        .Attr("padding_mask_type: list({float16, float32, bfloat16}) >= 0")
        .Attr("atten_mask_type: list({bool, uint8}) >= 0")
        .Attr("prefix_type: list({int64}) >= 0")
        .Attr("actual_seq_qlen_type: list({int64}) >= 0")
        .Attr("actual_seq_kvlen_type: list({int64}) >= 0")
        .Attr("q_start_idx_type: list({int64}) >= 0")
        .Attr("kv_start_idx_type: list({int64}) >= 0")
        .SetShapeFn([](InferenceContext *c) {
          return Status::OK();
        });
    REGISTER_KERNEL_BUILDER(Name("FlashAttentionScore").Device(DEVICE_CPU), CustOps)}
    
    Run the following commands to compile the preceding code. The generated file is libcustom_ops.so. In the subsequent operator calling script, the load_op_library API can be used to load the .so file as a Python module so that the custom operator can be called.
    TF_CFLAGS=( $(python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )     # Obtain the TensorFlow compile flags.
    TF_LFLAGS=( $(python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )        # Obtain the TensorFlow link flags.
    SOURCE_FILES=FlashAttentionScore.cc                                                                          # The .cc file containing the TensorFlow operator registration and the CPU kernel implementation.
    g++ -std=c++14 -shared $SOURCE_FILES -o ${Path}/libcustom_ops.so -fPIC ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2   # Build libcustom_ops.so from the source file.
  3. Encapsulate a TensorFlow operator API that handles the optional inputs. The dynamic library compiled in the previous step must be loaded in the script.
    from tensorflow.python.framework import ops
    import tensorflow as tf
    tfOpLib = tf.load_op_library("../build/tf_ops/libflashattention.so")

    # If an optional input is not supplied by the caller, pass an empty list
    # to the underlying op; otherwise, wrap the tensor in a single-element list.
    def create_optional_input_list(input):
        input_list = []
        if input is not None:
            input_list.append(input)
        return input_list
    
    # flash_attention_score encapsulation function
    def npu_flash_attention(query, key, value, head_num, input_layout, real_shift=None, drop_mask=None, padding_mask=None,
                            atten_mask=None, prefix=None, actual_seq_qlen=None, actual_seq_kvlen=None,
                            q_start_idx=None, kv_start_idx=None, scale_value=1.0, keep_prob=1.0,
                            pre_tockens=2147483647, next_tockens=2147483647, inner_precise=0, sparse_mode=0,
                            pse_type=1):
        output = tfOpLib.flash_attention_score(query=query, key=key, value=value,
            real_shift=create_optional_input_list(real_shift), drop_mask=create_optional_input_list(drop_mask),
            padding_mask=create_optional_input_list(padding_mask), atten_mask=create_optional_input_list(atten_mask),
            prefix=create_optional_input_list(prefix), actual_seq_qlen=create_optional_input_list(actual_seq_qlen),
            actual_seq_kvlen=create_optional_input_list(actual_seq_kvlen), q_start_idx=create_optional_input_list(q_start_idx),
            kv_start_idx=create_optional_input_list(kv_start_idx), scale_value=scale_value, keep_prob=keep_prob,
            pre_tockens=pre_tockens, next_tockens=next_tockens, head_num=head_num, input_layout=input_layout,
            inner_precise=inner_precise, sparse_mode=sparse_mode, pse_type=pse_type)
        return output
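    The optional-input convention used by the wrapper can be exercised in isolation. The following sketch reproduces the create_optional_input_list helper as plain Python (no TensorFlow required) to show how an absent optional input becomes an empty list and a present one becomes a single-element list:

    ```python
    # Standalone reproduction of the helper used by the wrapper above.
    def create_optional_input_list(input):
        input_list = []
        if input is not None:
            input_list.append(input)
        return input_list

    print(create_optional_input_list(None))          # []
    print(create_optional_input_list("atten_mask"))  # ['atten_mask']
    ```

    An empty list tells the TensorFlow runtime that the corresponding dynamic input carries zero tensors, which the plugin's ParseOpToGraphFn callback then maps to an unset optional input on the CANN side.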
    
  4. Implement the call to the custom operator in a test script. The calling code for TensorFlow 2.6.5 is as follows:
    import sys
    from ops import npu_flash_attention
    
    import tensorflow as tf
    import numpy as np
    tf.compat.v1.disable_eager_execution()
    
    import npu_device
    from npu_device.compat.v1.npu_init import *
    npu_device.compat.enable_v1()
    
    def sess_config():
        config = tf.compat.v1.ConfigProto()
        custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
        custom_op.name = "NpuOptimizer"
        config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
        config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF
        return config
    
    shape = [1, 32, 32]
    query_np = np.random.randn(*shape).astype(np.float16)
    key_np = np.random.randn(*shape).astype(np.float16)
    value_np = np.random.randn(*shape).astype(np.float16)
    
    query = tf.Variable(query_np, dtype=tf.float16)
    key = tf.Variable(key_np, dtype=tf.float16)
    value = tf.Variable(value_np, dtype=tf.float16)
    
    mask = tf.zeros(shape=(shape[0], 1, shape[1], shape[1]), dtype=tf.uint8)
    
    head_num = 1
    input_layout = "BSH"
    flash_result_t = npu_flash_attention(query, key, value, head_num, input_layout, atten_mask=mask)
    
    with tf.compat.v1.Session(config=sess_config()) as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        flash_result = sess.run(flash_result_t)
        print(flash_result)
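For small shapes such as the one above, attention_out can be sanity-checked against a plain NumPy reference of scaled-dot-product attention. The sketch below is a simplified reference, not part of the sample: it assumes a single head, BSH layout, no masks or dropout, and the helper name attention_reference is hypothetical. It does not reproduce FlashAttentionScore's tiling or its softmax_max/softmax_sum intermediate outputs.

```python
import numpy as np

def attention_reference(query, key, value, scale_value=1.0):
    """Plain scaled-dot-product attention (single head, BSH layout)."""
    # Attention scores: [B, S, S]
    scores = np.matmul(query, np.swapaxes(key, -1, -2)) * scale_value
    # Numerically stable softmax over the last axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of values: [B, S, H]
    return np.matmul(weights, value)

shape = [1, 32, 32]
rng = np.random.default_rng(0)
q = rng.standard_normal(shape).astype(np.float32)
k = rng.standard_normal(shape).astype(np.float32)
v = rng.standard_normal(shape).astype(np.float32)
out = attention_reference(q, k, v, scale_value=1.0)
print(out.shape)  # (1, 32, 32)
```

Comparing this reference against the NPU result (with float16 inputs, expect differences on the order of the float16 rounding error) is a quick way to confirm that the end-to-end adaptation maps inputs and attributes correctly.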
    

Developing Mapping Relationships for Operators with Dynamic Inputs

For operators with dynamic inputs or outputs, call AutoMappingByOpFnDynamic in the callback function registered through ParseParamsByOperatorFn to map the TensorFlow operator to the CANN operator. Describe each dynamic input/output with a DynamicInputOutputInfo structure, which binds the dynamic input/output name to the attribute that records its tensor count, and pass the resulting list to AutoMappingByOpFnDynamic for automatic mapping.

Take the ParseSingleExample operator as an example. The plugin adaptation code is as follows:
#include "register/register.h"
namespace domi {
Status ParseSingleExampleMapping(const ge::Operator& op_src, ge::Operator& op) {
  std::vector<DynamicInputOutputInfo> value;
  // Dynamic input dense_defaults; its tensor count is recorded by attribute Tdense.
  const std::string dynamic_input_name_dense_defaults = "dense_defaults";
  const std::string dynamic_input_attr_name_dense_defaults = "Tdense";
  DynamicInputOutputInfo input(kInput, dynamic_input_name_dense_defaults.c_str(),
      dynamic_input_name_dense_defaults.size(), dynamic_input_attr_name_dense_defaults.c_str(),
      dynamic_input_attr_name_dense_defaults.size());
  value.push_back(input);
  // Dynamic output sparse_indices; its tensor count is recorded by attribute num_sparse.
  const std::string dynamic_output_name_sparse_indices = "sparse_indices";
  const std::string dynamic_output_attr_name_sparse_indices = "num_sparse";
  DynamicInputOutputInfo output(kOutput,
      dynamic_output_name_sparse_indices.c_str(),
      dynamic_output_name_sparse_indices.size(), dynamic_output_attr_name_sparse_indices.c_str(),
      dynamic_output_attr_name_sparse_indices.size());
  value.push_back(output);
  // Dynamic output sparse_values; its tensor count is recorded by attribute sparse_types.
  const std::string dynamic_output_name_sparse_values = "sparse_values";
  const std::string dynamic_output_attr_name_sparse_values = "sparse_types";
  DynamicInputOutputInfo output1(kOutput,
      dynamic_output_name_sparse_values.c_str(),
      dynamic_output_name_sparse_values.size(), dynamic_output_attr_name_sparse_values.c_str(),
      dynamic_output_attr_name_sparse_values.size());
  value.push_back(output1);
  // Dynamic output sparse_shapes; its tensor count is recorded by attribute sparse_types.
  const std::string dynamic_output_name_sparse_shapes = "sparse_shapes";
  const std::string dynamic_output_attr_name_sparse_shapes = "sparse_types";
  DynamicInputOutputInfo output2(kOutput,
      dynamic_output_name_sparse_shapes.c_str(),
      dynamic_output_name_sparse_shapes.size(), dynamic_output_attr_name_sparse_shapes.c_str(),
      dynamic_output_attr_name_sparse_shapes.size());
  value.push_back(output2);
  // Dynamic output dense_values; its tensor count is recorded by attribute Tdense.
  const std::string dynamic_output_name_dense_values = "dense_values";
  const std::string dynamic_output_attr_name_dense_values = "Tdense";
  DynamicInputOutputInfo output3(kOutput,
      dynamic_output_name_dense_values.c_str(),
      dynamic_output_name_dense_values.size(), dynamic_output_attr_name_dense_values.c_str(),
      dynamic_output_attr_name_dense_values.size());
  value.push_back(output3);
  if (AutoMappingByOpFnDynamic(op_src, op, value) != ge::GRAPH_SUCCESS) {
    return FAILED;
  }
  return SUCCESS;
}

// register ParseSingleExample op to GE
REGISTER_CUSTOM_OP("ParseSingleExample")
    .FrameworkType(TENSORFLOW)
    .OriginOpType("ParseSingleExample")
    .ParseParamsByOperatorFn(ParseSingleExampleMapping);
}  // namespace domi

Mapping of operators with both optional and dynamic inputs is not supported.