API List

TF Adapter provides APIs for users to develop training or online inference scripts based on the deep learning framework TensorFlow 1.15.

Figure 1 TF Adapter

API path: ${install_path}/python/site-packages/npu_bridge

Table 1 TF Adapter APIs

APIs are grouped by module; each entry lists the API name followed by its description.

Session Configuration
  • Session Configuration Options: TF Adapter provides a series of session configurations for function debugging, performance improvement, and precision improvement. Developers can use these session configurations when performing model training or online inference on the Ascend AI Processor.

npu.npu_config
  • NPURunConfig Constructor: When performing model training or online inference in Estimator mode on the Ascend AI Processor, use the constructor of the NPURunConfig class to specify the running configuration of the Estimator.
  • ProfilingConfig Constructor: Configures the profiling function.
  • MemoryConfig Constructor: Configures the system memory usage mode.
  • DumpConfig Constructor: Configures the dump function.
  • ExperimentalConfig Constructor: Extended parameters for debugging, which may change in later versions. They cannot be used in commercial products.

npu.npu_estimator
  • NPUEstimator Constructor: Constructor of the NPUEstimator class. NPUEstimator inherits the TensorFlow Estimator class and can call the native APIs of the base class to train and evaluate TensorFlow models.
  • NPUEstimatorSpec Constructor: Constructor of the NPUEstimatorSpec class. NPUEstimatorSpec inherits the TensorFlow EstimatorSpec class and can call the native APIs of the base class to define specific model objects.
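As a concrete illustration of how the npu_config and npu_estimator classes fit together, the following configuration sketch shows an Estimator-mode setup. It is a non-authoritative sketch that assumes an Ascend environment with npu_bridge installed; the parameter values are illustrative, model_fn and input_fn stand for user-defined functions, and the exact constructor signatures should be checked against the NPURunConfig and NPUEstimator reference pages.

```python
# Illustrative configuration sketch only: assumes TensorFlow 1.15 and an
# Ascend environment with npu_bridge installed. Parameter values are
# example assumptions, not recommended settings.
from npu_bridge.estimator.npu.npu_config import NPURunConfig
from npu_bridge.estimator.npu.npu_estimator import NPUEstimator

run_config = NPURunConfig(
    model_dir="/tmp/model",      # checkpoint/summary directory
    save_checkpoints_steps=100,  # how often checkpoints are saved
    iterations_per_loop=10,      # device-side iterations per loop
)

# model_fn is a user-defined function returning an NPUEstimatorSpec.
estimator = NPUEstimator(model_fn=model_fn, config=run_config)
# estimator.train(input_fn=input_fn, max_steps=1000)
```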

npu_strategy
  • NPUStrategy Constructor: Constructs an object of the NPUStrategy class. NPUStrategy inherits the tf.distribute.Strategy class and can call the native APIs of the base class to implement distributed training in the NPU environment.

npu_hook
  • NPUCheckpointSaverHook Constructor: Constructs an object of the NPUCheckpointSaverHook class, which saves checkpoint files. NPUCheckpointSaverHook inherits the CheckpointSaverHook class and can call the native APIs of the base class to record model information during training.
  • NPUOutputTensorHook Constructor: Constructs an object of the NPUOutputTensorHook class, a hook for training, evaluation, and prediction with NPUEstimator. It calls the user-defined output_fn every N steps, or at the end of the run, to print the output tensors. NPUOutputTensorHook inherits the LoggingTensorHook class and can call the native APIs of the base class.
  • TellMeStepOrLossHook Constructor: Constructs an object of the TellMeStepOrLossHook class, which notifies the bottom-layer software of the current step number and the total number of steps, or of the current loss and the target loss.

npu_optimizer
  • NPUDistributedOptimizer Constructor: Constructs an object of the NPUDistributedOptimizer class, which wraps a single-server training optimizer into an NPU distributed training optimizer.
  • NPUOptimizer Constructor: Constructs an object of the NPUOptimizer class, which combines the NPUDistributedOptimizer and NPULossScaleOptimizer optimizers. It provides the following functions:
      - Loss scaling: can be enabled during mixed-precision training to solve the underflow problem caused by the small float16 representation range.
      - Distributed training: with the single-server training optimizer wrapped into an NPU distributed training optimizer, computed gradients can be aggregated in single-server single-device, single-server multi-device, and multi-server multi-device networking modes.
      - Communication-tail optimization: by changing a computation dependency, a computation operation that does not depend on the last AR (gradient aggregation fragment) is scheduled in parallel with the last AR.
  • KerasDistributeOptimizer Constructor: Constructs an object of the KerasDistributeOptimizer class, which wraps a single-server training optimizer constructed with tf.keras into an NPU distributed training optimizer.

npu_distributed_optimizer_wrapper
  • Adds the NPU AllReduce operation to the gradient function of the input optimizer, combines them into one function, and returns the optimizer. This API is used only in distributed scenarios.

npu_allreduce
  • Performs AllReduce and update operations on gradients after gradient computation is complete.
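npu_allreduce itself runs as a collective operation across Ascend devices, but its effect on gradients is easy to picture. The sketch below is a plain-Python illustration of what an averaging AllReduce does to per-worker gradients; the function name and the choice of averaging are illustrative assumptions, not the npu_bridge implementation.

```python
# Plain-Python illustration of an averaging AllReduce over per-worker
# gradients. This models the semantics only; the real npu_allreduce is a
# collective operation executed across devices.
def allreduce_mean(worker_grads):
    """worker_grads: list of per-worker gradient lists of equal length.
    Returns the element-wise mean, which every worker would receive."""
    n_workers = len(worker_grads)
    return [sum(col) / n_workers for col in zip(*worker_grads)]

# Two workers computed different gradients for the same three parameters.
grads_w0 = [0.25, -0.5, 1.0]
grads_w1 = [0.75, -0.25, 0.0]
reduced = allreduce_mean([grads_w0, grads_w1])
print(reduced)  # [0.5, -0.375, 0.5] — identical on every worker afterwards
```

After the reduction, every worker applies the same aggregated gradient, which keeps model replicas synchronized across devices.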

npu_callbacks
  • NPUBroadcastGlobalVariablesCallback Constructor: Broadcasts variables in Keras scenarios to ensure that the initial values of the variables on each device are the same in distributed scenarios.

npu_bridge.estimator.npu.npu_loss_scale_optimizer
  • NPULossScaleOptimizer Constructor: Constructor of the NPULossScaleOptimizer class, used to enable loss scaling in mixed-precision training when the overflow/underflow mode of floating-point computation is saturation mode. Loss scaling solves the underflow problem caused by the small float16 representation range.

npu.npu_loss_scale_manager
  • FixedLossScaleManager Constructor: Constructor of the FixedLossScaleManager class, used to define a static LossScale parameter during training when the overflow/underflow mode of floating-point computation is saturation mode.
  • ExponentialUpdateLossScaleManager Constructor: Constructor of the ExponentialUpdateLossScaleManager class, used to define a dynamic LossScale parameter during training. It dynamically obtains and updates the value of LossScale through the loss_scale variable when the overflow/underflow mode of floating-point computation is saturation mode.
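The underflow problem that loss scaling addresses can be demonstrated without an NPU at all. The sketch below uses Python's half-precision struct format to round values to float16, showing a tiny gradient that vanishes when stored unscaled but survives once the loss (and hence every gradient) is multiplied by a fixed scale, in the spirit of what FixedLossScaleManager configures; the scale value 1024 and the helper name are illustrative assumptions.

```python
import struct

def to_float16(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack('e', struct.pack('e', x))[0]

grad = 1e-8      # a tiny gradient from mixed-precision training
scale = 1024.0   # illustrative fixed loss scale

# Unscaled, the gradient underflows to zero in float16 ...
assert to_float16(grad) == 0.0
# ... but scaling the loss scales every gradient by the same factor,
# keeping it representable; dividing afterwards recovers the value.
scaled = to_float16(grad * scale)
assert scaled > 0.0
recovered = scaled / scale
print(recovered)  # approximately 1e-8 again
```

A dynamic manager such as ExponentialUpdateLossScaleManager adjusts this scale during training instead of fixing it up front, raising it while no overflow occurs and lowering it when one does.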

npu_ops
  • dropout: Functionally the same as tf.nn.dropout. With probability keep_prob, each element of the input tensor is kept and scaled by 1/keep_prob; otherwise it is set to 0. The output tensor has the same shape as the input tensor.
  • LARSV2: Scales gradients based on the norm of the weights and the norm of the gradients, using different learning rates at different levels. It improves training accuracy in large-batch-size scenarios and is used in large-scale cluster training to reduce training time.
  • initialize_system: Initializes the system so that GE initialization time is excluded from training time statistics. Generally, this API is not required for training; however, before using the collective communication APIs, call this API to initialize collective communication.
  • shutdown_system: Shuts down all devices. This API is used in conjunction with initialize_system.
  • npu_onnx_graph_op: Loads an ONNX model as an operator and executes it on the Ascend AI Processor through the TensorFlow framework.
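The dropout semantics described above (the same as tf.nn.dropout) can be sketched in plain Python: each element is either zeroed or scaled by 1/keep_prob, so the expected value of the output equals the input. The helper name and the use of the random module are illustrative assumptions, not the npu_ops kernel.

```python
import random

def dropout(xs, keep_prob, rng=random.random):
    """Plain-Python model of the tf.nn.dropout / npu_ops.dropout
    semantics: keep each element with probability keep_prob and scale it
    by 1/keep_prob; set the rest to 0. Output has the input's shape."""
    return [x / keep_prob if rng() < keep_prob else 0.0 for x in xs]

xs = [1.0, 2.0, 3.0, 4.0]
out = dropout(xs, keep_prob=0.5)
# Every surviving element is doubled (1/0.5); dropped ones are 0.
assert all(o == 0.0 or o == x / 0.5 for o, x in zip(out, xs))
```

The 1/keep_prob rescaling is what lets inference simply skip dropout: the training-time output already has the same expectation as the input.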

npu_rnn
  • npu_dynamic_rnn: Creates a high-performance recurrent neural network specified by RNNCell.

npu_dynamic_rnn
  • DynamicRNN Constructor: Used for RNN training and inference with TensorFlow.
  • DynamicGRUV2 Constructor: Used for RNN training and inference with TensorFlow.

npu_scope
  • without_npu_compile_scope: Configures operators to be built on the host in mixed computing scenarios.
  • keep_dtype_scope: Specifies the operators that keep their original precision. If an operator's precision in the original network model is not supported by the Ascend AI Processor, the system automatically computes with the highest precision the operator supports.
  • npu_weight_prefetch_scope: Identifies the operators whose weight data will be prefetched into a buffer pool, and specifies the ID and size of the buffer pool.
  • subgraph_multi_dims_scope: Specifies the scope of the operators to which subgraph-wide dynamic-shape profiles are applied in the online inference scenario.

util
  • set_iteration_per_loop: Sets the number of iterations per training loop in sess.run mode, that is, the number of training iterations executed on the device side per sess.run() call. This saves unnecessary host-device interactions and reduces training time.
  • create_iteration_per_loop_var: Used in conjunction with load_iteration_per_loop_var to set the number of iterations executed on the device side per sess.run() call. This API modifies the graph and creates the iterations-per-loop variable, whose value is then set by load_iteration_per_loop_var.
  • load_iteration_per_loop_var: Used in conjunction with create_iteration_per_loop_var to set the number of iterations executed on the device side per sess.run() call.
  • set_graph_exec_config: Sets the compilation and execution options of a computational graph. After this API is called, the configured attributes are added to the fetch node.
  • keep_tensors_dtypes: Specifies the operators that keep their original precision.
  • set_op_input_tensor_multi_dims: Applies to subgraph-wide dynamic dimension sizes; specifies the input shape of an operator and its dimension-size profiles in the online inference scenario.
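The benefit of set_iteration_per_loop is easy to quantify with a toy cost model: each sess.run() call pays a fixed host-device launch overhead, so running N iterations per loop divides the number of interactions by N. The function and the overhead figure below are illustrative assumptions, not measurements of a real device.

```python
import math

def host_device_interactions(total_steps, iterations_per_loop):
    """Number of sess.run() calls (host-device interactions) needed to
    run total_steps training iterations."""
    return math.ceil(total_steps / iterations_per_loop)

# 1000 training steps: one sess.run() per step vs. 100 steps per loop.
assert host_device_interactions(1000, 1) == 1000
assert host_device_interactions(1000, 100) == 10
# With an illustrative 5 ms launch overhead per interaction, the launch
# overhead drops from 5000 ms to 50 ms; device-side compute is unchanged.
overhead_ms = 5
print(host_device_interactions(1000, 1) * overhead_ms)    # 5000
print(host_device_interactions(1000, 100) * overhead_ms)  # 50
```

The trade-off is that results (for example, the loss) are only returned to the host at loop boundaries, which is why the loop size is configurable rather than simply maximal.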

keras_to_npu
  • model_to_npu_estimator: Converts a model constructed with Keras to an NPUEstimator object.

npu_plugin
  • set_device_sat_mode: Sets the process-level overflow mode for floating-point computation.

profiler
  • Profiler Constructor: Constructor of the Profiler class, used to enable the profiling function locally, for example, to collect profile data of a local subgraph of a TensorFlow network or of a specified step.