TensorFlow Framework

This section describes the operator adaptation process in the TensorFlow framework, which is used to map TensorFlow operators to CANN operators (custom operators developed based on the CANN framework). In this way, CANN operators can be called from the TensorFlow framework. An example of operator calling in the TensorFlow framework is also provided to help you understand the complete process.

The following figure shows the complete development process. The key steps are as follows: First, implement the operators and integrate them into a graph by referring to Project-based Operator Development and Integrating Operators into a GE Graph. Then, develop the TensorFlow adaptation plugin, which is the focus of this section. The plugin maps TensorFlow operators, both custom and native, to CANN operators. To map a TensorFlow custom operator to a CANN operator, you also need to develop the TensorFlow custom operator itself. Finally, write the code that calls the operators from the TensorFlow framework. For details about TensorFlow custom operators and TensorFlow operator calling, see the official TensorFlow documentation. The examples in this section are for reference only.

The procedure is as follows:

  1. Set up the environment.
    1. Install the CANN software. For details, see Environment Setup.
    2. Create an operator project. Use msOpGen to create an operator development project. In the TensorFlow operator adaptation scenario, you need to specify the framework by setting the framework parameter to tf or tensorflow. The tool automatically generates the framework adaptation code. Take the custom CANN operator AddCustom as an example. The command for creating an operator development project using the msOpGen tool is as follows:
      ${INSTALL_DIR}/python/site-packages/bin/msopgen gen -i $HOME/sample/add_custom.json -f tf -c ai_core-<soc_version> -lan cpp -out $HOME/sample/AddCustom
  2. Implement operators.
    • Define the operator prototype. The operator prototype describes the input, output, and attributes of the operator as well as its implementation information on the AI processor, and associates the operator with functions such as the tiling implementation.
    • Implement the operator on the kernel and implement tiling on the host. For details, see Operator Implementation. In project-based operator development, you can call tiling APIs to perform tiling development based on the programming framework provided by CANN, and call the corresponding APIs on the kernel to obtain the tiling parameters. For details, see Operator Implementation on the Kernel and Tiling Implementation on the Host. Additional restrictions are also described in those sections.
  3. Integrate operators into a GE graph. In this scenario, the implementation of adaptation functions such as shape inference needs to be provided.
  4. Develop the TensorFlow adaptation plugin. For details, see Developing an Adaptation Plugin.
  5. Build and deploy the operators. Use the project build script to build and deploy the operators.
  6. Call operators in the TensorFlow framework. For details, see Mapping a TensorFlow Native Operator to a CANN Operator and Developing a TensorFlow Custom Operator and Mapping It to a CANN Operator, which also provide complete examples.

Developing an Adaptation Plugin

After creating an operator project, the framework/tf_plugin directory is generated in the operator project path to store the implementation file of the TensorFlow adaptation plugin. The following uses the custom CANN operator AddCustom as an example. The operator project directory is as follows:

AddCustom
├── build.sh             // Build script
├── cmake 
├── CMakeLists.txt       // Build script of the operator project
├── CMakePresets.json    // Build configuration options
├── framework            // Directory for storing the implementation file of the framework adaptation plugin
│   ├── tf_plugin     // Directory for storing the implementation file of the TensorFlow adaptation plugin
│   │   ├── CMakeLists.txt    
│   │   ├── tensorflow_add_custom_plugin.cc  // Implementation file of the TensorFlow adaptation plugin
│   ├── CMakeLists.txt
├── op_host                      // Implementation file on the host
├── op_kernel                    // Implementation file on the kernel
└── scripts                      // Directory of scripts used for custom operator project packing
If the prototype definition of a TensorFlow operator is the same as that of a CANN operator, the implementation code of the TensorFlow adaptation plugin is as follows:
#include "register/register.h"
namespace domi {
REGISTER_CUSTOM_OP("AddCustom")
    .FrameworkType(TENSORFLOW) 
    .OriginOpType("AddCustom")   
    .ParseParamsByOperatorFn(AutoMappingByOpFn);
}

If the prototype definition of a TensorFlow operator is different from that of a CANN operator, the implementation code of the TensorFlow adaptation plugin is as follows:

#include "register/register.h"
namespace domi {
REGISTER_CUSTOM_OP("FlashAttentionScore")
    .FrameworkType(TENSORFLOW)
    .OriginOpType({"FlashAttentionScore"})
    .ParseParamsByOperatorFn(FlashAttentionScoreMapping)
    .ParseOpToGraphFn(AddOptionalPlaceholderForFA);
}  // namespace domi
  • Include the header file related to plugin implementation functions.

    register.h is stored in the include/register/ directory of the CANN component directory. After this header file is included, you can use the operator registration class to call related APIs.

  • Register a custom operator by using REGISTER_CUSTOM_OP. The OpType passed in must be the same as the OpType used in the operator prototype registration.
    • FrameworkType: specifies the framework type. TENSORFLOW indicates that the original framework is TensorFlow.
    • OriginOpType: indicates the type of the operator in the original framework. For a TensorFlow custom operator (which you also need to develop), the value is the operator name registered with REGISTER_OP. For a TensorFlow native operator, the value is the native operator name.
    • ParseParamsByOperatorFn: registers the callback function that parses operator parameters to implement the mapping. You need to implement a callback function of the ParseParamByOpFunc type. If the parameters of the original TensorFlow operator correspond one-to-one to those of the CANN operator, you can directly use the automatic mapping callback function AutoMappingByOpFn.
    • ParseOpToGraphFn: registers the callback function that adjusts the operator prototype mapping when the prototype definition of a TensorFlow operator is inconsistent with that of a CANN operator (for example, the CANN operator prototype has optional inputs, but the TensorFlow operator prototype does not support optional inputs).

Mapping a TensorFlow Native Operator to a CANN Operator

Take the custom operator AddCustom as an example. To map this operator to the TensorFlow built-in operator Add, modify the plugin code in the AddCustom operator directory framework/tf_plugin to complete operator name mapping.

#include "register/register.h"
namespace domi {
REGISTER_CUSTOM_OP("AddCustom")   // Name of the Ascend C custom operator
    .FrameworkType(TENSORFLOW)    // Third-party framework type TENSORFLOW
    .OriginOpType("Add")          // Map to the TensorFlow native operator Add
    .ParseParamsByOperatorFn(AutoMappingByOpFn);
}

After the operator project is built and deployed, construct a single-operator TensorFlow 1.15 test case for verification.

  1. Create the test case script tf_add.py.
  2. Import the Python libraries.
    import logging            # Import the logging module from the Python standard library.
    import tensorflow as tf   # Import the TensorFlow open-source library.
    from npu_bridge.estimator import npu_ops   # Import the npu_ops module from the NPU TensorFlow adapter package (npu_bridge).
    import numpy as np    # Import the NumPy scientific computing library.
    
  3. Define the parameter for execution on the Ascend AI Processor or CPU by using config().

    If execute_type is set to ai_core, an Ascend C operator is called to run the single-operator network on the Ascend AI Processor.

    If execute_type is set to cpu, a TensorFlow operator is called to run the single-operator network on the host CPU.
    def config(execute_type):
        if execute_type == 'ai_core':
            session_config = tf.ConfigProto(
                allow_soft_placement=True,
                log_device_placement=False,)
            custom_op = session_config.graph_options.rewrite_options.custom_optimizers.add()
            custom_op.name = "NpuOptimizer"
            custom_op.parameter_map["enable_data_pre_proc"].b = True   # Enable data preprocessing on the device.
            custom_op.parameter_map["mix_compile_mode"].b = True     # Enable the mixed computing mode.
            custom_op.parameter_map["use_off_line"].b = True     # True indicates that training is performed on the Ascend AI Processor.
    
        elif execute_type == 'cpu':
            session_config = tf.ConfigProto(
                allow_soft_placement=True,
                log_device_placement=False)
    
        return session_config
    
  4. Define the main function of the single-operator network test case.
    • Construct the operator input based on the actual input number and shape of the operator.
    • Compute the operator output by using TensorFlow API calls based on the operator logic.
    # Set the tolerance parameters of the np.allclose comparison function.
    # Absolute tolerance
    atol = 0.001
    # Relative tolerance
    rtol = 0.001
    
    def main(unused_argv):
        shape_params = (8, 2048)
        dtype_params = np.float16
    
        # Construct the input data of the Add operator: random numbers in the range [-2, 2] with shape shape_params.
        x_data = np.random.uniform(-2, 2, size=shape_params).astype(dtype_params)
        y_data = np.random.uniform(-2, 2, size=shape_params).astype(dtype_params)
        # Use placeholders for the two inputs of the Add operator.
        x = tf.compat.v1.placeholder(dtype_params, shape=shape_params)
        y = tf.compat.v1.placeholder(dtype_params, shape=shape_params)
        # Compute the output of the operator.
        out = tf.math.add(x, y)
        # Run the single-operator network on the host CPU to obtain the expected execution result.
        with tf.compat.v1.Session(config=config('cpu')) as session:
            result_cpu = session.run(out, feed_dict={x: x_data, y: y_data})
        # Run the single-operator network on the Ascend AI Processor to obtain the actual execution result.
        with tf.compat.v1.Session(config=config('ai_core')) as session:
            result_ai_core = session.run(out, feed_dict={x: x_data, y: y_data})
    
        result_ai_core = np.array(result_ai_core).astype(dtype_params)
        result_cpu = np.array(result_cpu).astype(dtype_params)
        print('====================================')
        # Use np.allclose to compare the actual result on the Ascend AI Processor with the
        # expected result on the CPU. rtol and atol are the relative and absolute tolerance
        # parameters of the np.allclose comparison function respectively.
        cmp_result = np.allclose(result_ai_core, result_cpu, rtol=rtol, atol=atol)
        print(cmp_result)
        print('====================================')
    
  5. Run the single-operator network.
    if __name__ == "__main__":
        tf.app.run()
    
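The tolerance check above relies on np.allclose, whose documented rule is that two values match when abs(a - b) <= atol + rtol * abs(b), and whose positional parameter order after the two arrays is rtol first, then atol. The following pure-Python sketch (no NumPy required) reproduces that rule so the effect of the two tolerances is easy to see:

```python
# Minimal, pure-Python sketch of the comparison rule used by np.allclose:
# two values match when abs(a - b) <= atol + rtol * abs(b).
def allclose(xs, ys, rtol=1e-05, atol=1e-08):
    """Elementwise closeness check over two equal-length sequences."""
    return all(abs(a - b) <= atol + rtol * abs(b) for a, b in zip(xs, ys))

golden = [1.0000, 2.0000, -0.5000]
actual = [1.0004, 1.9996, -0.5004]

# With the tolerances used in the test case (atol = rtol = 0.001), the
# small float16-level deviations above are accepted.
print(allclose(actual, golden, rtol=0.001, atol=0.001))   # True
print(allclose([1.0], [1.1], rtol=0.001, atol=0.001))     # False
```

Passing the tolerances by keyword, as in the corrected test case, avoids depending on the positional order.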

Developing a TensorFlow Custom Operator and Mapping It to a CANN Operator

  1. Develop the adaptation plugin code. Take the custom operator AddCustom as an example. To map this operator to the TensorFlow custom operator AddCustom, modify the plugin code in the CANN AddCustom operator directory framework/tf_plugin to complete operator name mapping.
    REGISTER_CUSTOM_OP("AddCustom")
      .FrameworkType(TENSORFLOW)      
      .OriginOpType("AddCustom") 
      .ParseParamsByOperatorFn(AutoMappingByOpFn);
    
  2. Develop the TensorFlow custom operator. The following is an example for reference only; for details, see the official TensorFlow documentation.

    Create the TensorFlow prototype registration file custom_assign_add_custom.cc. The content is as follows:

    #include "tensorflow/core/framework/op.h"
    #include "tensorflow/core/framework/shape_inference.h"
    #include "tensorflow/core/framework/op_kernel.h"
    #include "tensorflow/core/framework/common_shape_fns.h"
    using namespace tensorflow;
    
    // Register the operator prototype by using the REGISTER_OP API provided by TensorFlow.
    REGISTER_OP("AddCustom")        // TensorFlow registered operator name
        .Input("x: T")              // Operator prototype. The input parameter is x and the type is T.
        .Input("y: T")              // Operator prototype. The input parameter is y and the type is T.
        .Output("z: T")             // Operator prototype. The output parameter is z and the type is T.
        .Attr("T: {half}")          // Supported range of type T
        .SetShapeFn(shape_inference::BroadcastBinaryOpShapeFn);  // Shape inference of the operator. BroadcastBinaryOpShapeFn is a built-in TensorFlow function that infers the output shape as the broadcast of the two input shapes.
    
    // Implement a kernel function of the CPU version. During construction of a TensorFlow computational graph, the system checks whether all operators have kernel functions on any device (the NPU kernel cannot be detected). If no kernel function is found, an error is reported. Here, the CPU kernel function always returns an error.
    class AddCustomOp : public OpKernel {
     public:
      explicit AddCustomOp(OpKernelConstruction* context) : OpKernel(context) {}
    
      void Compute(OpKernelContext* context) override {
        OP_REQUIRES_OK(context, errors::Unimplemented("AddCustomOp is not supported on CPU")); 
      }
    };
    
    REGISTER_KERNEL_BUILDER(Name("AddCustom").Device(DEVICE_CPU), AddCustomOp);          // Register the CPU kernel of the AddCustom operator. The kernel only returns an error indicating that the operator is not supported on the CPU.
    
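Because SetShapeFn above uses BroadcastBinaryOpShapeFn, the inferred output shape is the broadcast of the two input shapes rather than necessarily a copy of either one. As an illustration (not TensorFlow's actual implementation), the standard binary broadcasting rule can be sketched in pure Python:

```python
# Pure-Python sketch of binary broadcast shape inference, the rule applied by
# shape functions such as BroadcastBinaryOpShapeFn: align the shapes from the
# trailing dimension; two dimensions are compatible when they are equal or
# when one of them is 1.
def broadcast_shape(a, b):
    result = []
    # Walk both shapes right-to-left, padding the shorter one with 1s.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"incompatible shapes {a} and {b}")
        result.append(max(da, db))
    return tuple(reversed(result))

print(broadcast_shape((8, 2048), (8, 2048)))  # (8, 2048) -- equal shapes pass through
print(broadcast_shape((8, 2048), (1, 2048)))  # (8, 2048) -- the 1 is broadcast
```

In the AddCustom test case both inputs have the same shape, so the inferred output shape equals the input shape.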

    Run the following commands to compile the preceding code. The generated file is libcustom_ops.so. In the subsequent operator calling script, the load_op_library API can be used to load the .so file as a Python module to call the custom operator.

    TF_CFLAGS=( $(python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )     # Obtain the TensorFlow compile flags.
    TF_LFLAGS=( $(python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )        # Obtain the TensorFlow link flags.
    SOURCE_FILES=custom_assign_add_custom.cc                                                                     # The .cc file containing the TensorFlow operator registration and CPU kernel implementation.
    g++ -std=c++14 -shared $SOURCE_FILES -o ${Path}/libcustom_ops.so -fPIC ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2   # Build command. The output is libcustom_ops.so, which can be loaded as a Python module with load_op_library.
  3. Load the dynamic library built in the previous step into the test script to call the custom operator.
    • TensorFlow 1.15.0 calling example
      import os
      import tensorflow as tf
      import numpy as np
      from npu_bridge.npu_init import *
      tf.enable_resource_variables()
      # Absolute tolerance of the np.allclose comparison function
      atol = 0.001
      # Relative tolerance of the np.allclose comparison function
      rtol = 0.001
      def main(unused_argv):
          custom_op_lib = tf.load_op_library('./outputs/libcustom_ops.so')     # Load the .so file as a Python module.
          shape_params = (8, 2048)
          dtype_params = np.float16
          x_data = np.random.uniform(-2, 2, size=shape_params).astype(dtype_params)
          y_data = np.random.uniform(-2, 2, size=shape_params).astype(dtype_params)
          x = tf.compat.v1.placeholder(dtype_params, shape=shape_params)
          y = tf.compat.v1.placeholder(dtype_params, shape=shape_params)
          tf_z = tf.math.add(x, y)                                           # Call the TensorFlow native operator.
          ac_z = custom_op_lib.add_custom(x, y)                              # Call the AscendC AddCustom custom operator.
          config = tf.ConfigProto()
          custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
          custom_op.name = "NpuOptimizer"   # Run the single-operator on the Ascend AI Processor.
          config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
          config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF
      
          with tf.Session(config=config) as sess:
              sess.run(tf.global_variables_initializer())
              tf_golden = sess.run(tf_z, feed_dict={x: x_data, y: y_data})
          with tf.Session(config=config) as sess:
              sess.run(tf.global_variables_initializer())
              ascend_out = sess.run(ac_z, feed_dict={x: x_data, y: y_data})
          tf_golden = np.array(tf_golden).astype(dtype_params)
          ascend_out = np.array(ascend_out).astype(dtype_params)
          print('====================================')
          # Use np.allclose to compare the actual result on the Ascend AI Processor with the
          # expected result of the TensorFlow native operator. rtol and atol are the relative
          # and absolute tolerance parameters respectively.
          cmp_result = np.allclose(tf_golden, ascend_out, rtol=rtol, atol=atol)
          print(cmp_result)
          print('====================================')
      if __name__ == "__main__":
          tf.app.run()
      
    • TensorFlow 2.6.5 calling example
      import os
      import tensorflow as tf
      import numpy as np
      import npu_device
      from npu_device.compat.v1.npu_init import *
      npu_device.compat.enable_v1()
      tf.compat.v1.enable_resource_variables()
      # Absolute tolerance of the np.allclose comparison function
      atol = 0.001
      # Relative tolerance of the np.allclose comparison function
      rtol = 0.001
      def main(unused_argv):
          custom_op_lib = tf.load_op_library('./outputs/libcustom_ops.so')     # Load the .so file as a Python module.
      
          shape_params = (8, 2048)
          dtype_params = np.float16
          x_data = np.random.uniform(-2, 2, size=shape_params).astype(dtype_params)
          y_data = np.random.uniform(-2, 2, size=shape_params).astype(dtype_params)
          x = tf.compat.v1.placeholder(dtype_params, shape=shape_params)
          y = tf.compat.v1.placeholder(dtype_params, shape=shape_params)
          tf_z = tf.math.add(x, y)                                           # Call the TensorFlow native operator.
          ac_z = custom_op_lib.add_custom(x, y)                              # Call the AscendC AddCustom custom operator.
      
          config = tf.compat.v1.ConfigProto()
          custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
          custom_op.name = "NpuOptimizer"
          config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
          config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF
      
          with tf.compat.v1.Session(config=config) as sess:
              sess.run(tf.compat.v1.global_variables_initializer())
              tf_golden = sess.run(tf_z, feed_dict={x: x_data, y: y_data})
          with tf.compat.v1.Session(config=config) as sess:
              sess.run(tf.compat.v1.global_variables_initializer())
              ascend_out = sess.run(ac_z, feed_dict={x: x_data, y: y_data})
          tf_golden = np.array(tf_golden).astype(dtype_params)
          ascend_out = np.array(ascend_out).astype(dtype_params)
          print('====================================')
          # Use np.allclose to compare the actual result on the Ascend AI Processor with the
          # expected result of the TensorFlow native operator. rtol and atol are the relative
          # and absolute tolerance parameters respectively.
          cmp_result = np.allclose(tf_golden, ascend_out, rtol=rtol, atol=atol)
          print(cmp_result)
          print('====================================')
      if __name__ == "__main__":
          tf.compat.v1.app.run()
      

Developing Mapping Relationships for Operators with Optional Inputs

The TensorFlow prototype definition does not support optional inputs. For operators with optional inputs, the mapping from TensorFlow to CANN does not meet the simple one-to-one mapping requirement. You need to convert the inputs to optional inputs in the plugin adaptation code to adjust the prototype mapping. The following uses the FlashAttentionScore operator in the CANN operator library as an example to describe how to develop a framework adaptation plugin for this operator.

  1. Develop an adaptation plugin.
    Different from the simple one-to-one mapping described above, ParseOpToGraphFn needs to be called to register an additional callback function that adjusts the operator prototype mapping. In this case:
    • Register a callback function by using ParseParamsByOperatorFn. In the callback function, map the TensorFlow operator to an intermediate operator whose IR is consistent with the TensorFlow prototype (call AutoMappingByOpFn to complete the attribute mapping).
    • Register a callback function by using ParseOpToGraphFn to adjust the operator prototype mapping and map the intermediate operator to the operator in the CANN operator library. Here, "ToGraph" refers to the single-operator graph consisting of only that operator.
    Note that the ParseParamsByOperatorFn callback function must set the TensorFlow operator name in the original_type attribute of the intermediate operator, which triggers the subsequent ParseOpToGraphFn callback. A code example is as follows:
    #include <string>
    #include <vector>
    #include "register/register.h"
    #include "graph/operator.h"
    #include "graph/graph.h"
    #include "graph/operator_factory.h"
    
    namespace domi {
    using namespace ge;
    
    static Status AddOptionalPlaceholderForFA(const ge::Operator &tf_op, ge::Graph &graph) {
      // 1. Create a FlashAttentionScore operator npu_fa_op.
      ge::AscendString op_name;
      tf_op.GetName(op_name);
      auto npu_fa_op = OperatorFactory::CreateOperator(op_name.GetString(), "FlashAttentionScore");
      // 2. Map the attributes of the TensorFlow operator to the npu_fa_op operator.
      float scale_value = 1.0;
      (void)tf_op.GetAttr("scale_value", scale_value);
      (void)npu_fa_op.SetAttr("scale_value", scale_value);
    
      float keep_prob = 1.0;
      (void)tf_op.GetAttr("keep_prob", keep_prob);
      (void)npu_fa_op.SetAttr("keep_prob", keep_prob);
    
      int32_t pre_tockens = 2147483647;
      (void)tf_op.GetAttr("pre_tockens", pre_tockens);
      (void)npu_fa_op.SetAttr("pre_tockens", pre_tockens);
    
      int32_t next_tockens = 2147483647;
      (void)tf_op.GetAttr("next_tockens", next_tockens);
      (void)npu_fa_op.SetAttr("next_tockens", next_tockens);
    
      int32_t head_num = 0;
      (void)tf_op.GetAttr("head_num", head_num);
      (void)npu_fa_op.SetAttr("head_num", head_num);
    
      std::string input_layout;
      (void)tf_op.GetAttr("input_layout", input_layout);
      (void)npu_fa_op.SetAttr("input_layout", input_layout);
    
      int32_t inner_precise = 0;
      (void)tf_op.GetAttr("inner_precise", inner_precise);
      (void)npu_fa_op.SetAttr("inner_precise", inner_precise);
    
      int32_t sparse_mode = 0;
      (void)tf_op.GetAttr("sparse_mode", sparse_mode);
      (void)npu_fa_op.SetAttr("sparse_mode", sparse_mode);
    
      int32_t pse_type = 1;
      (void)tf_op.GetAttr("pse_type", pse_type);
      (void)npu_fa_op.SetAttr("pse_type", pse_type);
    
      // 3. Create input data.
      std::vector<Operator> inputs;
      for (size_t i = 0UL; i < tf_op.GetInputsSize(); i++) {
        const std::string data_name = "Data_" + std::to_string(i);
        Operator data_op = OperatorFactory::CreateOperator(data_name.c_str(), "Data");
        (void)data_op.SetAttr("index", static_cast<int32_t>(i));
        inputs.emplace_back(data_op);
      }
    
      size_t index = 0UL;
      // 4. For required inputs, directly set data to the operator inputs.
      (void)npu_fa_op.SetInput("query", inputs[index++]);
      (void)npu_fa_op.SetInput("key", inputs[index++]);
      (void)npu_fa_op.SetInput("value", inputs[index++]);
    
      // 5. For each optional input, check whether its type-list attribute is empty. A non-empty list means the optional input is connected.
      std::vector<DataType> real_shift_type;
      (void)tf_op.GetAttr("real_shift_type", real_shift_type);
      if (!real_shift_type.empty()) {
        (void)npu_fa_op.SetInput("real_shift", inputs[index++]);
      }
    
      std::vector<DataType> drop_mask_type;
      (void)tf_op.GetAttr("drop_mask_type", drop_mask_type);
      if (!drop_mask_type.empty()) {
        (void)npu_fa_op.SetInput("drop_mask", inputs[index++]);
      }
    
      std::vector<DataType> padding_mask_type;
      (void)tf_op.GetAttr("padding_mask_type", padding_mask_type);
      if (!padding_mask_type.empty()) {
        (void)npu_fa_op.SetInput("padding_mask", inputs[index++]);
      }
      std::vector<DataType> atten_mask_type;
      (void)tf_op.GetAttr("atten_mask_type", atten_mask_type);
      if (!atten_mask_type.empty()) {
        (void)npu_fa_op.SetInput("atten_mask", inputs[index++]);
      }
      std::vector<DataType> prefix_type;
      (void)tf_op.GetAttr("prefix_type", prefix_type);
      if (!prefix_type.empty()) {
        (void)npu_fa_op.SetInput("prefix", inputs[index++]);
      }
      std::vector<DataType> actual_seq_qlen_type;
      (void)tf_op.GetAttr("actual_seq_qlen_type", actual_seq_qlen_type);
      if (!actual_seq_qlen_type.empty()) {
        (void)npu_fa_op.SetInput("actual_seq_qlen", inputs[index++]);
      }
      std::vector<DataType> actual_seq_kvlen_type;
      (void)tf_op.GetAttr("actual_seq_kvlen_type", actual_seq_kvlen_type);
      if (!actual_seq_kvlen_type.empty()) {
        (void)npu_fa_op.SetInput("actual_seq_kvlen", inputs[index++]);
      }
    
      std::vector<DataType> q_start_idx_type;
      (void)tf_op.GetAttr("q_start_idx_type", q_start_idx_type);
      if (!q_start_idx_type.empty()) {
        (void)npu_fa_op.SetInput("q_start_idx", inputs[index++]);
      }
    
      std::vector<DataType> kv_start_idx_type;
      (void)tf_op.GetAttr("kv_start_idx_type", kv_start_idx_type);
      if (!kv_start_idx_type.empty()) {
        (void)npu_fa_op.SetInput("kv_start_idx", inputs[index++]);
      }
    
      // 6. Use the output of the npu_fa_op operator to construct the graph output.
      std::vector<std::pair<Operator, std::vector<size_t>>> output_indexs;
      std::vector<size_t> node_output_index;
      for (size_t i = 0UL; i < npu_fa_op.GetOutputsSize(); i++) {
        node_output_index.emplace_back(i);
      }
      (void)output_indexs.emplace_back(std::make_pair(npu_fa_op, node_output_index));
      (void)graph.SetInputs(inputs).SetOutputs(output_indexs);
      return SUCCESS;
    }
    
    static Status FlashAttentionScoreMapping(const ge::Operator& op_src, ge::Operator& op_dst) {
      // 1. Call the default mapping function.
      if (AutoMappingByOpFn(op_src, op_dst) != ge::GRAPH_SUCCESS) {
        return FAILED;
      }
      // 2. Set the TensorFlow operator name to the original_type attribute of op_dst to trigger the ParseOpToGraphFn callback function.
      op_dst.SetAttr("original_type", "FlashAttentionScore");
      return SUCCESS;
    }
    
    REGISTER_CUSTOM_OP("FlashAttentionScore")
        .FrameworkType(TENSORFLOW)
        .OriginOpType({"FlashAttentionScore"})
        .ParseParamsByOperatorFn(FlashAttentionScoreMapping) // Register this function to implement the mapping of operator attributes.
        .ParseOpToGraphFn(AddOptionalPlaceholderForFA); // Register this function to convert the inputs in the TensorFlow into optional inputs and change the edge connection relationships.
    }  // namespace domi
    
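The index++ pattern in AddOptionalPlaceholderForFA assigns contiguous input slots: the required inputs take the leading indices, and each optional input consumes the next index only when its type-list attribute is non-empty. A hedged pure-Python sketch of that bookkeeping (the names and type strings here are illustrative, not part of the CANN API):

```python
# Sketch of the input-slot assignment used in AddOptionalPlaceholderForFA:
# required inputs take the leading indices; each optional input consumes the
# next index only when its "<name>_type" list attribute is non-empty.
def assign_input_slots(required, optional_type_attrs):
    slots = {}
    index = 0
    for name in required:
        slots[name] = index
        index += 1
    for name, type_list in optional_type_attrs:
        if type_list:                 # non-empty type list => input is connected
            slots[name] = index
            index += 1
    return slots

slots = assign_input_slots(
    required=["query", "key", "value"],
    optional_type_attrs=[
        ("real_shift", []),           # absent: empty type list, no slot assigned
        ("drop_mask", ["float16"]),   # present: takes the next slot
        ("atten_mask", ["bool"]),
    ],
)
print(slots)  # {'query': 0, 'key': 1, 'value': 2, 'drop_mask': 3, 'atten_mask': 4}
```

This is why the optional inputs must come after the required inputs in the prototype: an absent optional input simply does not shift the indices of the inputs that follow it.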
  2. Register the prototype definition of the FlashAttentionScore operator in the TensorFlow open-source framework. Because TensorFlow does not support optional inputs, each optional input must be represented as a dynamic input in the TensorFlow prototype, with its presence and count marked by a type-list attribute, and the optional inputs must be placed at the end of the prototype definition. The sample code (FlashAttentionScore.cc) is as follows:
    #include <algorithm>
    #include <atomic>
    #include <iostream>
    #include <map>
    #include "tensorflow/core/framework/common_shape_fns.h"
    #include "tensorflow/core/framework/op.h"
    #include "tensorflow/core/framework/op_kernel.h"
    using namespace tensorflow;
    using shape_inference::InferenceContext;
    using shape_inference::ShapeHandle;
    using namespace std;
    using OpKernelConstructionPtr = OpKernelConstruction*;
    using OpKernelContextPtr = OpKernelContext*;
    using InferenceContextPtr = ::tensorflow::shape_inference::InferenceContext*;
    namespace {
    // CPU placeholder kernel. It only reports that the custom operator is not installed on the CPU.
    class CustOps : public OpKernel {
    public:
        explicit CustOps(OpKernelConstructionPtr context) : OpKernel(context) {}
        void Compute(OpKernelContextPtr context) override
        {
            std::cout << "Cust Ops not installed!!" << std::endl;
        }
        ~CustOps() override = default;
    };
    }  // namespace
    namespace tensorflow {
    REGISTER_OP("FlashAttentionScore")
        .Input("query: T")
        .Input("key: T")
        .Input("value: T")
        .Input("real_shift: real_shift_type")  // Register optional input as dynamic input in the TensorFlow prototype.
        .Input("drop_mask: drop_mask_type")
        .Input("padding_mask: padding_mask_type")
        .Input("atten_mask: atten_mask_type")
        .Input("prefix: prefix_type")
        .Input("actual_seq_qlen: actual_seq_qlen_type")
        .Input("actual_seq_kvlen: actual_seq_kvlen_type")
        .Input("q_start_idx: q_start_idx_type")
        .Input("kv_start_idx: kv_start_idx_type")
        .Output("softmax_max: float32")
        .Output("softmax_sum: float32")
        .Output("softmax_out: T")
        .Output("attention_out: T")
        .Attr("scale_value: float = 1.0")
        .Attr("keep_prob: float = 1.0")
        .Attr("pre_tockens: int = 2147483647")
        .Attr("next_tockens: int = 2147483647")
        .Attr("head_num: int")
        .Attr("input_layout: string")
        .Attr("inner_precise: int = 0")
        .Attr("sparse_mode: int = 0")
        .Attr("pse_type: int = 1")
        .Attr("T: {float16, float32, bfloat16} = DT_FLOAT")
        .Attr("real_shift_type: list({float16, float32, bfloat16}) >= 0") // Mark the number of dynamic inputs through the attribute.
        .Attr("drop_mask_type: list({uint8}) >= 0")
        .Attr("padding_mask_type: list({float16, float32, bfloat16}) >= 0")
        .Attr("atten_mask_type: list({bool, uint8}) >= 0")
        .Attr("prefix_type: list({int64}) >= 0")
        .Attr("actual_seq_qlen_type: list({int64}) >= 0")
        .Attr("actual_seq_kvlen_type: list({int64}) >= 0")
        .Attr("q_start_idx_type: list({int64}) >= 0")
        .Attr("kv_start_idx_type: list({int64}) >= 0")
        .SetShapeFn([](InferenceContext *c) {
          return Status::OK();
        });
    REGISTER_KERNEL_BUILDER(Name("FlashAttentionScore").Device(DEVICE_CPU), CustOps)}
    
    Run the following commands to compile the preceding code. The generated file is libcustom_ops.so. In the subsequent operator calling script, the load_op_library API can be used to load the .so file as a Python module so that the custom operator can be called.
    TF_CFLAGS=( $(python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )     # Obtain the TensorFlow compile flags.
    TF_LFLAGS=( $(python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )        # Obtain the TensorFlow link flags.
    SOURCE_FILES=FlashAttentionScore.cc                                                                          # The .cc file containing the TensorFlow operator registration and the CPU kernel implementation.
    g++ -std=c++14 -shared $SOURCE_FILES -o ${Path}/libcustom_ops.so -fPIC ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2   # Build libcustom_ops.so from the source file.
  3. Encapsulate a TensorFlow operator API that handles the optional inputs. The dynamic library compiled in the previous step must be loaded in the script.
    from tensorflow.python.framework import ops
    import tensorflow as tf
    tfOpLib = tf.load_op_library("../build/tf_ops/libflashattention.so")

    # If an optional input is not supplied by the caller, pass an empty list
    # to the underlying op; otherwise, wrap the tensor in a single-element list.
    def create_optional_input_list(input):
        input_list = []
        if input is not None:
            input_list.append(input)
        return input_list
    
    # flash_attention_score encapsulation function
    def npu_flash_attention(query, key, value, head_num, input_layout, real_shift=None, drop_mask=None, padding_mask=None,
                            atten_mask=None, prefix=None, actual_seq_qlen=None, actual_seq_kvlen=None,
                            q_start_idx=None, kv_start_idx=None, scale_value=1.0, keep_prob=1.0,
                            pre_tockens=2147483647, next_tockens=2147483647, inner_precise=0, sparse_mode=0,
                            pse_type=1):
        output = tfOpLib.flash_attention_score(query=query, key=key, value=value,
            real_shift=create_optional_input_list(real_shift), drop_mask=create_optional_input_list(drop_mask),
            padding_mask=create_optional_input_list(padding_mask), atten_mask=create_optional_input_list(atten_mask),
            prefix=create_optional_input_list(prefix), actual_seq_qlen=create_optional_input_list(actual_seq_qlen),
            actual_seq_kvlen=create_optional_input_list(actual_seq_kvlen), q_start_idx=create_optional_input_list(q_start_idx),
            kv_start_idx=create_optional_input_list(kv_start_idx), scale_value=scale_value, keep_prob=keep_prob,
            pre_tockens=pre_tockens, next_tockens=next_tockens, head_num=head_num, input_layout=input_layout,
            inner_precise=inner_precise, sparse_mode=sparse_mode, pse_type=pse_type)
        return output
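    The optional-input convention used by the wrapper can be exercised in isolation. The following sketch reproduces the create_optional_input_list helper as plain Python (no TensorFlow required) to show how an absent optional input becomes an empty list and a present one becomes a single-element list:

    ```python
    # Standalone reproduction of the helper used by the wrapper above.
    def create_optional_input_list(input):
        input_list = []
        if input is not None:
            input_list.append(input)
        return input_list

    print(create_optional_input_list(None))          # []
    print(create_optional_input_list("atten_mask"))  # ['atten_mask']
    ```

    An empty list tells the TensorFlow runtime that the corresponding dynamic input carries zero tensors, which the plugin's ParseOpToGraphFn callback then maps to an unset optional input on the CANN side.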
    
  4. Implement the call to the custom operator in a test script. The calling code for TensorFlow 2.6.5 is as follows:
    import sys
    from ops import npu_flash_attention
    
    import tensorflow as tf
    import numpy as np
    tf.compat.v1.disable_eager_execution()
    
    import npu_device
    from npu_device.compat.v1.npu_init import *
    npu_device.compat.enable_v1()
    
    def sess_config():
        config = tf.compat.v1.ConfigProto()
        custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
        custom_op.name = "NpuOptimizer"
        config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
        config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF
        return config
    
    shape = [1, 32, 32]
    query_np = np.random.randn(*shape).astype(np.float16)
    key_np = np.random.randn(*shape).astype(np.float16)
    value_np = np.random.randn(*shape).astype(np.float16)
    
    query = tf.Variable(query_np, dtype=tf.float16)
    key = tf.Variable(key_np, dtype=tf.float16)
    value = tf.Variable(value_np, dtype=tf.float16)
    
    mask = tf.zeros(shape=(shape[0], 1, shape[1], shape[1]), dtype=tf.uint8)
    
    head_num = 1
    input_layout = "BSH"
    flash_result_t = npu_flash_attention(query, key, value, head_num, input_layout, atten_mask=mask)
    
    with tf.compat.v1.Session(config=sess_config()) as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        flash_result = sess.run(flash_result_t)
        print(flash_result)
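For small shapes such as the one above, attention_out can be sanity-checked against a plain NumPy reference of scaled-dot-product attention. The sketch below is a simplified reference, not part of the sample: it assumes a single head, BSH layout, no masks or dropout, and the helper name attention_reference is hypothetical. It does not reproduce FlashAttentionScore's tiling or its softmax_max/softmax_sum intermediate outputs.

```python
import numpy as np

def attention_reference(query, key, value, scale_value=1.0):
    """Plain scaled-dot-product attention (single head, BSH layout)."""
    # Attention scores: [B, S, S]
    scores = np.matmul(query, np.swapaxes(key, -1, -2)) * scale_value
    # Numerically stable softmax over the last axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of values: [B, S, H]
    return np.matmul(weights, value)

shape = [1, 32, 32]
rng = np.random.default_rng(0)
q = rng.standard_normal(shape).astype(np.float32)
k = rng.standard_normal(shape).astype(np.float32)
v = rng.standard_normal(shape).astype(np.float32)
out = attention_reference(q, k, v, scale_value=1.0)
print(out.shape)  # (1, 32, 32)
```

Comparing this reference against the NPU result (with float16 inputs, expect differences on the order of the float16 rounding error) is a quick way to confirm that the end-to-end adaptation maps inputs and attributes correctly.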
    

Developing Mapping Relationships for Operators with Dynamic Inputs

For operators with dynamic inputs or outputs, call AutoMappingByOpFnDynamic in the callback function registered through ParseParamsByOperatorFn to map the TensorFlow operator to the CANN operator. Describe each dynamic input/output with a DynamicInputOutputInfo structure, which binds the dynamic input/output name to the attribute that records its tensor count, and pass the resulting list to AutoMappingByOpFnDynamic for automatic mapping.

Take the ParseSingleExample operator as an example. The plugin adaptation code is as follows:
#include "register/register.h"
namespace domi {
Status ParseSingleExampleMapping(const ge::Operator& op_src, ge::Operator& op) {
  std::vector<DynamicInputOutputInfo> value;
  // Dynamic input dense_defaults; its tensor count is recorded by attribute Tdense.
  const std::string dynamic_input_name_dense_defaults = "dense_defaults";
  const std::string dynamic_input_attr_name_dense_defaults = "Tdense";
  DynamicInputOutputInfo input(kInput, dynamic_input_name_dense_defaults.c_str(),
      dynamic_input_name_dense_defaults.size(), dynamic_input_attr_name_dense_defaults.c_str(),
      dynamic_input_attr_name_dense_defaults.size());
  value.push_back(input);
  // Dynamic output sparse_indices; its tensor count is recorded by attribute num_sparse.
  const std::string dynamic_output_name_sparse_indices = "sparse_indices";
  const std::string dynamic_output_attr_name_sparse_indices = "num_sparse";
  DynamicInputOutputInfo output(kOutput,
      dynamic_output_name_sparse_indices.c_str(),
      dynamic_output_name_sparse_indices.size(), dynamic_output_attr_name_sparse_indices.c_str(),
      dynamic_output_attr_name_sparse_indices.size());
  value.push_back(output);
  // Dynamic output sparse_values; its tensor count is recorded by attribute sparse_types.
  const std::string dynamic_output_name_sparse_values = "sparse_values";
  const std::string dynamic_output_attr_name_sparse_values = "sparse_types";
  DynamicInputOutputInfo output1(kOutput,
      dynamic_output_name_sparse_values.c_str(),
      dynamic_output_name_sparse_values.size(), dynamic_output_attr_name_sparse_values.c_str(),
      dynamic_output_attr_name_sparse_values.size());
  value.push_back(output1);
  // Dynamic output sparse_shapes; its tensor count is recorded by attribute sparse_types.
  const std::string dynamic_output_name_sparse_shapes = "sparse_shapes";
  const std::string dynamic_output_attr_name_sparse_shapes = "sparse_types";
  DynamicInputOutputInfo output2(kOutput,
      dynamic_output_name_sparse_shapes.c_str(),
      dynamic_output_name_sparse_shapes.size(), dynamic_output_attr_name_sparse_shapes.c_str(),
      dynamic_output_attr_name_sparse_shapes.size());
  value.push_back(output2);
  // Dynamic output dense_values; its tensor count is recorded by attribute Tdense.
  const std::string dynamic_output_name_dense_values = "dense_values";
  const std::string dynamic_output_attr_name_dense_values = "Tdense";
  DynamicInputOutputInfo output3(kOutput,
      dynamic_output_name_dense_values.c_str(),
      dynamic_output_name_dense_values.size(), dynamic_output_attr_name_dense_values.c_str(),
      dynamic_output_attr_name_dense_values.size());
  value.push_back(output3);
  if (AutoMappingByOpFnDynamic(op_src, op, value) != ge::GRAPH_SUCCESS) {
    return FAILED;
  }
  return SUCCESS;
}

// register ParseSingleExample op to GE
REGISTER_CUSTOM_OP("ParseSingleExample")
    .FrameworkType(TENSORFLOW)
    .OriginOpType("ParseSingleExample")
    .ParseParamsByOperatorFn(ParseSingleExampleMapping);
}  // namespace domi

Mapping of operators with both optional and dynamic inputs is not supported.