API List
TF Adapter provides APIs for users to develop training or online inference scripts based on the deep learning framework TensorFlow 1.15.

API path: ${TFPLUGIN_INSTALL_PATH}/python/site-packages/npu_bridge
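
As a minimal sketch of how the API path above might be resolved from a script, the following helper builds the `site-packages` directory from an installation root and checks whether `npu_bridge` can be imported. It assumes that `TFPLUGIN_INSTALL_PATH` is exposed as an environment variable, which this page does not state. The helper names `npu_bridge_site_packages` and `npu_bridge_available` are hypothetical and are not part of TF Adapter.

```python
import importlib.util
import os
import sys


def npu_bridge_site_packages(install_path):
    """Return the site-packages directory expected to contain npu_bridge."""
    return os.path.join(install_path, "python", "site-packages")


def npu_bridge_available(install_path=None):
    """Report whether the npu_bridge package can be found by the importer.

    Assumption: when install_path is omitted, the installation root is read
    from an environment variable named TFPLUGIN_INSTALL_PATH.
    """
    install_path = install_path or os.environ.get("TFPLUGIN_INSTALL_PATH")
    if install_path:
        site_packages = npu_bridge_site_packages(install_path)
        if site_packages not in sys.path:
            sys.path.append(site_packages)
    # find_spec returns None for a top-level package that cannot be located.
    return importlib.util.find_spec("npu_bridge") is not None


if __name__ == "__main__":
    print("npu_bridge importable:", npu_bridge_available())
```

If the check returns False, the installation root or the Python environment is probably not set up as the script expects.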

| Category | API | Description |
|---|---|---|
| Session configuration | | TF Adapter provides a series of session configurations for function debugging, performance improvement, and precision improvement. Developers can use these session configurations when performing model training or online inference on the Ascend AI Processor. |
| npu.npu_config | | When performing model training or online inference in Estimator mode on the Ascend AI Processor, you can use the constructor of the NPURunConfig class to specify the running configuration of the Estimator. |
| npu.npu_config | | Configures the profiling function. |
| npu.npu_config | | Configures the system memory usage mode. |
| npu.npu_config | | Configures the dump function. |
| npu.npu_config | | Configures extended parameters for debugging. This API may change in later versions and is not supported for use in commercial products. |
| npu.npu_estimator | | Constructor of the NPUEstimator class. The NPUEstimator class inherits the Estimator class of TensorFlow and can call the native APIs of the base class for the training, evaluation, and inference of TensorFlow models. |
| npu.npu_estimator | | Constructor of the NPUEstimatorSpec class. The NPUEstimatorSpec class inherits the EstimatorSpec class of TensorFlow and can call the native APIs of the base class to define specific model objects. |
| npu_strategy | | Constructs an object of class NPUStrategy. NPUStrategy inherits the tf.distribute.Strategy class and can call the native APIs of the base class to implement distributed training in the NPU environment. |
| npu_hook | | Constructs an object of class NPUCheckpointSaverHook, which is used to save the checkpoint file. The NPUCheckpointSaverHook class inherits the CheckpointSaverHook class and can call the native APIs of the base class to record model information during training. |
| npu_hook | | Constructs an object of class NPUOutputTensorHook. NPUOutputTensorHook is a hook for training, evaluation, and prediction with NPUEstimator. It can call the user-defined output_fn every N steps or at the end to print the output tensors. The NPUOutputTensorHook class inherits the LoggingTensorHook class and can call the native APIs of the base class. |
| npu_hook | | Constructs an object of class TellMeStepOrLossHook, which is used to notify the bottom-layer software of either the current step number and the total number of steps, or the current loss and the target loss. |
| npu_optimizer | | Constructs an object of class NPUDistributedOptimizer, which wraps a single-server training optimizer into an NPU distributed training optimizer. |
| npu_optimizer | | Constructs an object of class NPUOptimizer, which combines the NPUDistributedOptimizer and NPULossScaleOptimizer optimizers. |
| npu_optimizer | | Constructs an object of class KerasDistributeOptimizer, which wraps a single-server training optimizer built with tf.keras into an NPU distributed training optimizer. |
| npu_optimizer | | Adds the NPU AllReduce operation to the gradient computation function of the input optimizer, merges the two into a single function, and returns the optimizer. This API is used only in distributed scenarios. |
| npu_optimizer | | Performs AllReduce and update operations on gradients after gradient computation is complete. |
| npu_callbacks | | Broadcasts variables in Keras scenarios to ensure that the initial values of variables on each device are the same in distributed scenarios. |
| npu_bridge.estimator.npu.npu_loss_scale_optimizer | | Constructs an object of class NPULossScaleOptimizer, which is used to enable loss scaling in mixed precision training when the overflow/underflow mode of floating-point computation is saturation mode. Loss scaling solves the underflow problem caused by the limited representable range of float16. |
| npu.npu_loss_scale_manager | | Constructs an object of class FixedLossScaleManager, which is used to define the static LossScale parameter during training when the overflow/underflow mode of floating-point computation is saturation mode. |
| npu.npu_loss_scale_manager | | Constructs an object of class ExponentialUpdateLossScaleManager, which is used to define the dynamic LossScale parameter during training when the overflow/underflow mode of floating-point computation is saturation mode. The value of LossScale is obtained and updated dynamically through the loss_scale variable. |
| npu_ops | | Has the same functionality as tf.nn.dropout. Elements of the input tensor are randomly set to zero with a probability of 1 - keep_prob. The remaining elements are scaled by a factor of 1/keep_prob so that the expected sum is unchanged. The output tensor has the same shape as the input tensor. |
| npu_ops | | Scales gradients layer by layer based on the weight norm and the gradient norm, so that different layers use different learning rates. It improves training precision in large batch size scenarios and is used in large-scale cluster training to reduce the training time. |
| npu_ops | | Excludes the GE initialization time from the training time statistics. This API is generally not required for training, but it must be called before any collective communication API is used, to initialize collective communication. |
| npu_ops | | Disables all devices. This API is used in conjunction with initialize_system. |
| npu_ops | | Loads an ONNX model as an operator and executes it on the Ascend AI Processor through the TensorFlow framework. |
| npu_rnn | | Creates a high-performance neural network specified by RNNCell. |
| npu_dynamic_rnn | | Used for RNN training and inference with TensorFlow. |
| npu_dynamic_rnn | | Used for RNN training and inference with TensorFlow. |
| npu_scope | | Configures operators built on the host in mixed computing scenarios. |
| npu_scope | | Specifies the operators that preserve the original precision. If the operator precision in the original network model is not supported by the Ascend AI Processor, the system automatically uses a higher precision supported by the operator for computation. |
| npu_scope | | Identifies the operators whose weight data will be prefetched into a buffer pool and specifies the ID and size of the buffer pool. |
| npu_scope | | Specifies the scope of operators to which subgraph-wide dynamic shape profiles apply in the online inference scenario. |
| util | | Sets the number of iterations per training loop in sess.run mode, that is, the number of training iterations executed on the device side in each sess.run() call. This avoids unnecessary interactions between the host and device and reduces the training time. |
| util | | Used in conjunction with load_iteration_per_loop_var to set the number of training iterations executed on the device side in each sess.run() call. This API modifies the graph; the number of iterations per loop is then set using load_iteration_per_loop_var. |
| util | | Used in conjunction with create_iteration_per_loop_var to set the number of training iterations executed on the device side in each sess.run() call. |
| util | | Sets the compilation and execution options for a computational graph. After this API is called, the configured attributes are added to the fetch node. |
| util | | Specifies the operators that preserve the original precision. |
| util | | Applies to subgraph-wide dynamic shape profiles. Specifies the input shape of the operator and the shape profiles. |
| keras_to_npu | | Converts a model constructed by using Keras to an NPUEstimator object. |
| npu_plugin | | Sets the process-level overflow/underflow mode for floating-point computation. |
| scoped_graph_manager | | Unloads the variable initialization graph in a single operation and releases the memory held by constant nodes in the graph. |
| profiler | | Constructs an object of the Profiler class, which is used to enable the profiling function locally. For example, you can collect the profile data of a local subgraph on the TensorFlow network or of a specified step. |
| hccl.hccl_ops | | Performs the reduction operation on the input data of all ranks in a group and sends the result to the output buffer of all ranks. The reduction operation type is specified by the reduction parameter. This API provides the AllReduce collective communication operator. |
| hccl.hccl_ops | | Re-sorts the inputs of all ranks in the communicator by rank ID, combines the inputs, and sends the results to the outputs of all ranks. |
| hccl.hccl_ops | | Broadcasts the data of the root rank to other ranks in the communicator. |
| hccl.hccl_ops | | Performs the sum operation (or other reduction operations) on the inputs of all ranks, and then distributes the result evenly to the output buffers of the ranks according to the rank IDs. Each rank receives a 1/rank_size portion of the data from the other ranks for reduction. |
| hccl.hccl_ops | | Performs the sum operation (or other reduction operations) on the data of all ranks and sends the result to the specified position on the root rank. |
| hccl.hccl_ops | | Sends data to a rank within a collective communication group. |
| hccl.hccl_ops | | Receives data from a rank within a collective communication group. |
| hccl.hccl_ops | | Sends data (with a customizable data size) to all ranks in the collective communicator and receives data from all ranks. |
| hccl.hccl_ops | | Sends data (with a customizable data size) to all ranks in the collective communicator and receives data from all ranks. alltoallvc passes the send and receive parameters of all ranks through the send_count_matrix argument and outperforms alltoallv. |
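
The collective communication semantics described for hccl.hccl_ops can be illustrated with a small pure-Python simulation. This is only a sketch: each rank's buffer is modeled as a plain list, nothing runs on a device, and the function names and signatures below are hypothetical illustrations, not the hccl_ops API. The reduce-scatter sketch additionally assumes that the data length is divisible by the number of ranks.

```python
from functools import reduce as fold
import operator


def allreduce(rank_inputs, op=operator.add):
    """Every rank receives the element-wise reduction of all rank inputs."""
    reduced = [fold(op, column) for column in zip(*rank_inputs)]
    return [list(reduced) for _ in rank_inputs]


def allgather(rank_inputs):
    """Every rank receives all inputs concatenated in rank-ID order."""
    gathered = [value for rank_input in rank_inputs for value in rank_input]
    return [list(gathered) for _ in rank_inputs]


def broadcast(rank_inputs, root_rank=0):
    """Every rank receives a copy of the root rank's data."""
    return [list(rank_inputs[root_rank]) for _ in rank_inputs]


def reduce_scatter(rank_inputs, op=operator.add):
    """Reduce element-wise, then give each rank a 1/rank_size slice."""
    rank_size = len(rank_inputs)
    reduced = [fold(op, column) for column in zip(*rank_inputs)]
    chunk = len(reduced) // rank_size  # assumes an even split across ranks
    return [reduced[r * chunk:(r + 1) * chunk] for r in range(rank_size)]


if __name__ == "__main__":
    # Two simulated ranks, each holding four values.
    inputs = [[1, 2, 3, 4], [10, 20, 30, 40]]
    print(allreduce(inputs))       # both ranks hold the element-wise sums
    print(allgather(inputs))       # both ranks hold rank 0 data, then rank 1 data
    print(broadcast(inputs, 1))    # both ranks hold rank 1's data
    print(reduce_scatter(inputs))  # each rank holds one half of the sums
```

In the simulation the outer list index stands in for the rank ID, which makes it easy to see which operations leave every rank with identical data (allreduce, allgather, broadcast) and which leave each rank with a distinct slice (reduce_scatter).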