initialize_system
Description
Initializes collective communication. Call this API before using any collective communication API. Running it in a separate session also excludes the GE initialization time from the training time statistics. Generally, this API is not required for training.
Prototype
def initialize_system(name=None)
Options
| Option | Input/Output | Description |
|---|---|---|
| name | Input | Operator name. |
Returns
An operator for the user to initialize GE by using sess.run(op).
Restrictions
If the initialize_system API is called and any of the following features is required during training, configure the corresponding options when starting the session that runs initialize_system.
| Configuration Option | Description |
|---|---|
| profiling_mode | Whether to enable profiling. |
| profiling_options | Profiling item (or items) to trace. You can collect multiple items by separating them with colons (:), for example, training_trace:task_trace. |
| fp_point | Required if training_trace is selected. Start point of the forward propagation in iteration traces, used to record the start timestamp of forward propagation. Set the value to the name of the top operator in forward propagation. You can obtain this name by saving the graph as a .pbtxt file with tf.io.write_graph in the training script. |
| bp_point | Required if training_trace is selected. End point of the backward propagation in iteration traces, used to record the end timestamp of backward propagation. fp_point and bp_point are used to compute the time spent in forward and backward propagation. Set the value to the name of the bottom operator in backward propagation. You can obtain this name by saving the graph as a .pbtxt file with tf.io.write_graph in the training script. |
| enable_dump | Whether to enable data dump. |
| dump_path | Dump path. Required when enable_dump or enable_dump_debug is set to True. Create the specified path in advance in the environment (either in a container or on the host) where training is performed. The running user configured during installation must have read and write permissions on this path. The path can be absolute, or relative to the directory where the training script is executed. |
| dump_step | Iterations to dump. Defaults to None, meaning all iterations are dumped. Separate multiple iterations with vertical bars, for example, 0\|5\|10. You can also use hyphens (-) to specify an iteration range, for example, 0\|3-5\|10. |
| dump_mode | Dump mode. |
| enable_dump_debug | Whether to enable overflow/underflow detection. |
| dump_debug_mode | Overflow/underflow detection mode. |
| precision_mode | A string specifying the operator precision mode. |
| graph_run_mode | Graph run mode. |
| op_debug_level | Whether to enable operator debugging. |
| enable_exception_dump | Whether to dump the inputs and outputs of abnormal operators. The dump information is generated in the current script execution directory. |
| op_select_implmode | Operator implementation mode. Some operators built into the Ascend AI Processor can be implemented in either high-precision or high-performance mode. |
| optypelist_for_implmode | List of operator types that use the mode specified by op_select_implmode. Currently, only the Pooling operator is supported. Used together with op_select_implmode, for example: set op_select_implmode to high_precision and optypelist_for_implmode to Pooling. |
Example
If you use an HCCL API such as get_local_rank_id, get_rank_size, or get_rank_id before sess.run() or estimator.train(), you need to start another session and execute initialize_system to initialize collective communication. After the training is complete, execute shutdown_system and close the session.
```python
import tensorflow as tf
from npu_bridge.npu_init import *

npu_init = npu_ops.initialize_system()
npu_shutdown = npu_ops.shutdown_system()

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["use_off_line"].b = True
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF

init_sess = tf.Session(config=config)
init_sess.run(npu_init)

# Call an HCCL API...
# Perform training...

init_sess.run(npu_shutdown)
init_sess.close()
```
Or:
```python
import tensorflow as tf
from npu_bridge.npu_init import *

npu_init = npu_ops.initialize_system()
npu_shutdown = npu_ops.shutdown_system()

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["use_off_line"].b = True
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF

with tf.Session(config=config) as sess:
    sess.run(npu_init)
    # Call an HCCL API...
    # Perform training...
    sess.run(npu_shutdown)
```