Command-Line Options
This section describes the configuration options passed to GEInitialize, the Session constructor, and AddGraph, which take effect globally, per session, and per graph, respectively.
Table 1 lists only the configuration options supported by the current version. If an option is not listed in the table, it is reserved or applicable to other Ascend AI Processor versions.
|
Key |
Value |
Required |
Global/Session/Graph |
|---|---|---|---|
|
ge.graphRunMode |
Graph run mode.
|
Optional |
Global/Session |
|
ge.exec.deviceId |
Logical ID of the device on which the GE instance runs. The value range is [0, N-1].
N indicates the number of available Ascend AI Processors on the server. |
Optional |
Global |
|
ge.socVersion |
Target model of the Ascend AI Processor for model build and optimization.
|
No |
All
|
ge.inputShape |
Shape of model input. Arguments:
Configuration example:
NOTE:
In those scenarios, ge.inputShape is optional. If this option is not set, the shape of the corresponding data nodes is used by default. If it is set, the passed shape is used and written back to the corresponding data nodes. |
No |
Session/Graph |
|
ge.dynamicDims |
Dynamic dimension profiles in ND format. Applies to inference scenarios in which arbitrary dimensions may change from one run to the next. This option must be used together with ge.inputShape. Argument: formatted as "dim1,dim2,dim3;dim4,dim5,dim6;dim7,dim8,dim9". Format: Enclose the whole argument in double quotation marks (""), separate profiles with semicolons (;), and separate the dimension sizes within a profile with commas (,). The dimension sizes in each profile fill the -1 placeholders in ge.inputShape in order, so each profile must contain exactly as many dimension sizes as there are -1 placeholders. Set at least two profiles. Restrictions: The number of profiles must be in the range (1, 100]. You are advised to set 3 or 4 dimensions. Examples:
|
No |
Session/Graph |
|
ge.dynamicNodeType |
Sets the type of a dynamic input node.
Only one type of dynamic input is allowed: dataset or placeholder. |
No |
Session/Graph |
|
ge.exec.precision_mode |
A string for the operator precision mode. This option cannot be used together with ge.exec.precision_mode_v2. You are advised to use ge.exec.precision_mode_v2.
Default: In the online inference scenario, the default value is force_fp16. |
No |
All |
|
ge.exec.precision_mode_v2 |
A string for the operator precision mode. This option cannot be used together with ge.exec.precision_mode. You are advised to use ge.exec.precision_mode_v2.
Default: In training scenarios, this option has no default value on some products. In online inference scenarios, the default value is "fp16". |
No |
All |
|
ge.exec.modify_mixlist |
When mixed precision is enabled, this option specifies the path and file name of a JSON configuration file that modifies the blocklist, trustlist, and graylist, that is, which operators allow precision reduction and which do not. To find out which list an operator is on by default, check the flag value of its precision_reduce option in the built-in tuning policy file ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/<soc_version>/aic-<soc_version>-ops-info.json.
Example
{"ge.exec.modify_mixlist", "/home/test/ops_info.json"};
You can specify one or more operator types in ops_info.json as follows. The // comments are explanatory only. Standard JSON does not allow comments, so omit them from the actual file.
{
"black-list": { // Blocklist
"to-remove": [ // Move an operator from the blocklist to the graylist.
"Xlog1py"
],
"to-add": [ // Move an operator from the trustlist or graylist to the blocklist.
"Matmul",
"Cast"
]
},
"white-list": { // Trustlist
"to-remove": [ // Move an operator from the trustlist to the graylist.
"Conv2D"
],
"to-add": [ // Move an operator from the blocklist or graylist to the trustlist.
"Bias"
]
}
}
The operators in the preceding example file are for reference only. Configure the lists based on the actual hardware environment and the built-in tuning policies of the operators. To find out which list an operator is on, look up its precision_reduce flag in the built-in tuning policy file, for example:
"Conv2D":{
"precision_reduce":{
"flag":"true"
},
A flag value of true means trustlist, false means blocklist, and no flag means graylist. |
No |
All |
|
ge.exec.profilingMode |
Profiling enable.
|
No |
Global |
|
ge.exec.profilingOptions |
Profiling options.
Example: std::map<ge::AscendString, ge::AscendString> ge_options = {{"ge.exec.deviceId", "0"},
{"ge.graphRunMode", "1"},
{"ge.exec.profilingMode", "1"},
{"ge.exec.profilingOptions", R"({"output":"/tmp/profiling","training_trace":"on","fp_point":"resnet_model/conv2d/Conv2Dresnet_model/batch_normalization/FusedBatchNormV3_Reduce","bp_point":"gradients/AddN_70"})"}}; |
No |
Global |
|
ge.exec.enableDump |
Dump enable.
NOTE:
|
No |
Global/Session |
|
ge.exec.dumpPath |
Dump path. Required when dump or overflow/underflow detection is enabled. Create the specified path in advance in the environment (container or host) where training is performed. The running user configured during installation must have read and write permissions on this path. The path can be absolute or relative to the directory where the training script is executed.
The dump data file is generated in the path specified by dump_path, that is, the {dump_path}/{time}/{deviceid}/{model_name}/{model_id}/{data_index} directory. For example, if dump_path is set to /home/HwHiAiUser/output, the dump data file is stored in the /home/HwHiAiUser/output/20200808163566/0/ge_default_20200808163719_121/11/0 path. |
No |
Global/Session |
|
ge.exec.dumpStep |
Iterations to dump. Defaults to None, indicating that all iterations are dumped. Separate multiple iterations using vertical bars (|), for example, 0|5|10. You can also use hyphens (-) to specify the iteration range, for example, 0|3-5|10. |
No |
Global/Session |
|
ge.exec.dumpMode |
Dump mode. The values are as follows:
Configuration example: {"ge.exec.dumpMode", "input"};
Restrictions: If this parameter is set to all, the input data of some operators, such as the collective communication operators HcomAllGather and HcomAllReduce, is modified during execution. Therefore, the system dumps the operator input before execution and the operator output after execution. As a result, the input and output data of the same operator is written to disk separately, producing multiple dump files. After parsing the dump files, you can tell from the file content whether the data is an input or an output. |
No |
Global/Session |
|
ge.exec.dumpData |
Type of operator content to dump.
|
No |
Global/Session |
|
ge.exec.dumpLayer |
Names of the operators to dump. Separate multiple operator names with spaces. If an input of a specified operator comes from a data operator, the data operator information is also dumped.
Configuration example:
{"ge.exec.dumpLayer", "layer1 layer2 layer3"};
|
No |
Global/Session |
|
ge.exec.enableDumpDebug |
Overflow/Underflow detection enable.
NOTE:
|
No |
Global/Session |
|
ge.exec.dumpDebugMode |
Overflow/Underflow detection mode.
|
No |
Global/Session |
|
ge.exec.enable_exception_dump |
Whether to dump data of the exception operator.
NOTE:
If the environment variable NPU_COLLECT_PATH is configured, only L1 exception dump information, including the input and output data of the exception operator, is collected regardless of the value of option enable_exception_dump, and the dump data is stored in the path specified by NPU_COLLECT_PATH. For details about the environment variable, see Environment Variables.
Configuration example:
std::map<ge::AscendString, ge::AscendString> ge_options = {{"ge.exec.enable_exception_dump", "0"}};
|
Optional |
Global |
|
ge.exec.disableReuseMemory |
Memory reuse enable.
|
No |
All |
|
ge.graphMemoryMaxSize |
This option will be deprecated in later versions; do not use it. Network static memory size and maximum dynamic memory size, which vary with the network size. The unit is byte and the value range is [0, 256 x 1024 x 1024 x 1024], that is, [0, 274877906944]. The SoC hardware requires that the sum of ge.graphMemoryMaxSize and ge.variableMemoryMaxSize not exceed 31 GB. Defaults to 26 GB. |
No |
All |
|
ge.variableMemoryMaxSize |
This option will be deprecated in later versions; do not use it. Variable memory size, which varies with the network size. The unit is byte and the value range is [0, 256 x 1024 x 1024 x 1024], that is, [0, 274877906944]. The SoC hardware requires that the sum of ge.graphMemoryMaxSize and ge.variableMemoryMaxSize not exceed 31 GB. Defaults to 5 GB. |
No |
All |
|
ge.exec.variable_acc |
Variable format optimization enable.
To improve training efficiency, variables are converted to a format better suited to the Ascend AI Processor when the network initializes them. However, this function should be disabled in special scenarios. |
No |
All |
|
ge.exec.rankTableFile |
Information about the cluster participating in collective communication, including the organization information about the server, device, and container. Set this option to the ranktable file path, including the file name. |
No |
All |
|
ge.exec.rankId |
Rank ID, the ID of a process in a group. The value ranges from 0 to (rank size – 1). For a custom group, the rank starts from 0 in the group. For an HCCL world group, the rank ID is the same as the world rank ID.
|
No |
All |
|
ge.opDebugLevel |
Operator debug enable.
NOTICE:
|
No |
All |
|
op_debug_config |
Enables global memory checking. The value is the path of a .cfg configuration file. Separate multiple options in the configuration file with commas (,).
Configuration example: {"op_debug_config", "/root/test0.cfg"};
The content of test0.cfg is as follows:
op_debug_config = ccec_g,oom
Restrictions: During operator compilation, if you want to compile only some AI Core operators rather than all of them, add the op_debug_list field to the test0.cfg configuration file. Only the operators in the list are then compiled with the options configured in op_debug_config. The op_debug_list field has the following requirements:
Configuration example: Add the following information to the configuration file (for example, test0.cfg) specified by op_debug_config:
op_debug_config= ccec_g,oom
op_debug_list=GatherV2,opType::ReduceSum
During model compilation, the GatherV2 and ReduceSum operators are compiled with the ccec_g and oom options.
NOTE:
|
No |
Global |
|
ge.op_compiler_cache_mode |
Disk cache mode for operator build. Arguments:
Default: enable Restrictions:
|
No |
All |
|
ge.op_compiler_cache_dir |
Disk cache directory for operator build. Format: The directory can contain letters, digits, underscores (_), hyphens (-), and periods (.). Defaults to $HOME/atc_data.
|
No |
All |
|
ge.debugDir |
Directory of the debug-related process files generated during operator build, including the .o (operator binary file), .json (operator description file), and .cce files. Defaults to the training script execution directory. Restrictions:
|
No |
All |
|
ge.bufferOptimize |
Buffer optimization enable. Arguments:
Suggestions: You are advised to enable buffer optimization because it can improve compute efficiency and performance. However, your model may contain an operator that the current implementation does not yet cover. If inference accuracy degrades and the degradation disappears after buffer optimization is disabled, locate the problematic operator and submit it to Huawei technical support so that buffer optimization support can be added for it. Configuration example: {"ge.bufferOptimize", "l2_optimize"}; |
Optional |
Session/Graph |
|
ge.mdl_bank_path |
Sets the directory of the custom repository generated after subgraph tuning. This option must be used together with ge.bufferOptimize and takes effect only when buffer optimization is enabled. Buffer optimization improves performance by temporarily storing data in the buffer. Argument: path of the custom repository after model tuning. Format: The value can contain letters, digits, underscores (_), hyphens (-), and periods (.). Default: $HOME/Ascend/latest/data/aoe/custom/graph/<soc_version> Restrictions: Priority ranked from high to low: the directory specified by ge.mdl_bank_path > the directory specified by TUNE_BANK_PATH > the default directory.
|
No |
All |
|
ge.op_bank_path |
Path of the custom repository generated after operator tuning. Format: The path can contain letters, digits, underscores (_), hyphens (-), and periods (.). Default: ${HOME}/Ascend/latest/data/aoe/custom/op Restrictions: Priority ranked from high to low: the directory specified by TUNE_BANK_PATH > the directory specified by OP_BANK_PATH > the default directory of the custom repository generated after operator tuning.
|
No |
All |
|
ge.exec.dynamicGraphExecuteMode |
This option is deprecated. Avoid using it. Execution mode, applicable to the dynamic input scenario. The value is dynamic_execute. |
No |
Graph |
|
ge.exec.dataInputsShapeRange |
This option is deprecated. Avoid using it. Shape range of dynamic input. If a graph has two data inputs, the configuration example is as follows:
std::map<ge::AscendString, ge::AscendString> ge_options = {{"ge.exec.deviceId", "0"},
{"ge.graphRunMode", "1"},
{"ge.exec.dynamicGraphExecuteMode", "dynamic_execute"},
{"ge.exec.dataInputsShapeRange", "[128 ,3~5, 2~128, -1],[ 128 ,3~5, 2~128, -1]"}};
NOTE:
|
No |
Graph |
|
ge.exec.op_precision_mode |
Precision mode of one or more specified operators during internal processing. This option passes the path of a custom precision mode configuration file, op_precision.ini, so that different operators can use different precision modes. Each row of the .ini file sets a precision mode by operator type (low priority) or by node name (high priority). The following precision modes can be set in the configuration file:
You can view the precision or performance modes supported by an operator in the opp/built-in/op_impl/ai_core/tbe/impl_mode/all_ops_impl_mode.ini file under the CANN software installation path. Example:
[ByOpType]
optype1=high_precision
optype2=high_performance
optype4=support_out_of_bound_index
[ByNodeName]
nodename1=high_precision
nodename2=high_performance
nodename4=support_out_of_bound_index |
No |
Global |
|
ge.opSelectImplmode |
This option is no longer being enhanced and will be deprecated in later versions. Use ge.exec.op_precision_mode instead. Operator implementation mode selection. Certain operators built into the Ascend AI Processor can be implemented in either high-precision or high-performance mode at model build time. In high-precision mode, Taylor's theorem or Newton's method is used to improve operator accuracy with float16 input. In high-performance mode, optimal performance is achieved without affecting network precision (float16). Arguments:
The preceding implementation modes are distinguished based on the dtype of the operator. Replace ${INSTALL_DIR} with the actual CANN component directory. If the Ascend-CANN-Toolkit package is installed as the root user, the CANN component directory is /usr/local/Ascend/ascend-toolkit/latest. Default: high_performance |
No |
Global |
|
ge.optypelistForImplmode |
List of operator types. The operators in the list use the mode specified by the ge.opSelectImplmode option. Restrictions:
|
No |
Global |
|
ge.shape_generalized_build_mode |
Do not use this option because it will be deprecated in later versions. |
No |
Graph |
|
ge.customizeDtypes |
Custom operator precision for model build. Other operators in the model are built according to ge.exec.precision_mode or ge.exec.precision_mode_v2. Set this option to the path of the configuration file, including the file name, for example, /home/test/customize_dtypes.cfg. Restrictions:
The structure of the configuration file is as follows:
# By operator name
Opname1::InputDtype:dtype1,dtype2,…OutputDtype:dtype1,…
Opname2::InputDtype:dtype1,dtype2,…OutputDtype:dtype1,…
# By operator type
OpType::TypeName1:InputDtype:dtype1,dtype2,…OutputDtype:dtype1,…
OpType::TypeName2:InputDtype:dtype1,dtype2,…OutputDtype:dtype1,…
Example:
# By operator name
resnet_v1_50/block1/unit_3/bottleneck_v1/Relu::InputDtype:float16,int8,OutputDtype:float16,int8
# By operator type
OpType::Relu:InputDtype:float16,int8,OutputDtype:float16,int8
NOTE:
|
No |
Session |
|
ge.exec.atomicCleanPolicy |
Collectively cleans up the memory occupied by all operators with the memset attribute (memset operators) on the network. Arguments:
|
No |
Session |
|
ge.jit_compile |
Not supported in the current version. |
No |
Global/Session |
|
ge.build_inner_model |
Not supported in the current version. |
No |
N/A |
|
ge.externalWeight |
When multiple models whose weights can be reused are loaded in one session, you are advised to use this option to externalize the weights of the Const/Constant nodes on the network. This enables weight reuse among the models and reduces the memory used by weights. Arguments:
Description of the file flush path:
When the model is unloaded, the tmp_weight_<pid>_<sessionid> directory is deleted. Configuration example: {"ge.externalWeight", "1"}; |
No |
Session |
|
stream_sync_timeout |
Timeout for stream synchronization during graph execution, in ms. If synchronization takes longer than the configured value, a synchronization failure is reported. The default value is -1, indicating that no timeout applies: the synchronization waits without a time limit and no timeout error is reported. |
No |
Global/Session |
|
event_sync_timeout |
Timeout for event synchronization during graph execution, in ms. If synchronization takes longer than the configured value, a synchronization failure is reported. The default value is -1, indicating that no timeout applies: the synchronization waits without a time limit and no timeout error is reported. |
No |
Global/Session |
|
ge.exec.staticMemoryPolicy |
Memory allocation mode used during network running. Arguments:
NOTE:
Configuration example: {"ge.exec.staticMemoryPolicy", "2"}; |
No |
Global |
|
ge.graph_compiler_cache_dir |
Disk cache directory for graph compilation. This option is used together with ge.graph_key, and the function takes effect only when neither ge.graph_compiler_cache_dir nor ge.graph_key is empty. The configured cache directory must exist; otherwise, compilation fails. After a graph is changed, its original cache file can no longer be used. Manually delete the cache file from the cache directory, or change ge.graph_key, to rebuild and generate a new cache file. For details about other restrictions and usage, see Graph Build Cache. |
No |
Session |
|
ge.graph_key |
Unique graph ID. The value contains a maximum of 128 characters, including only letters, digits (0–9), underscores (_), and hyphens (-). |
No |
Graph |
|
ge.featureBaseRefreshable |
Whether the feature memory address can be refreshed. If you manage the feature memory yourself and need to refresh its address multiple times, set this option to 1. This option applies only to static shape graphs. Arguments: 0 (default): The feature memory address cannot be refreshed. 1: The feature memory address of a model can be refreshed. |
No |
All |
|
ge.constLifecycle |
Lifecycle of constant nodes in the training and online inference scenarios. session: Constant nodes are stored at the session level, so constant memory can be reused across multiple graphs in a session. In this case, ensure that constant nodes with the same name in different graphs are identical. graph: Constant nodes are stored at the graph level, and you can call SetGraphConstMemoryBase to manage the constant memory of each graph. The default value is session in the training scenario and graph in the online inference scenario. |
No |
All |
|
ge.exec.inputReuseMemIndexes |
Memory reuse enable for the input node of a graph. After the function is enabled, the memory of the input node can be reused as the intermediate memory required during model execution, reducing the memory peak. The value is the index of the input node. If memory reuse is enabled for multiple input nodes, use commas (,) to separate multiple indexes. The index attribute of the input node is required, specifying the sequence number of the input. The index starts from 0. Note:
Configuration example: {"ge.exec.inputReuseMemIndexes", "0,1,2"}; |
No |
Graph |
|
ge.exec.outputReuseMemIndexes |
Memory reuse enable for the entire graph output. After the function is enabled, the memory of the entire graph output can be reused as the intermediate memory required during model execution, reducing the memory peak. If enabled, the value is the index of the entire graph output. If memory reuse is enabled for multiple outputs, use commas (,) to separate multiple indexes. Note:
Configuration example: {"ge.exec.outputReuseMemIndexes", "0,1,2"}; |
No |
Graph |
|
ge.disableOptimizations |
This parameter is used for debugging and cannot be used in commercial products. The function specified by this parameter will be released as a feature in later versions. This parameter applies only to the following products: Specifies one or more compilation and optimization passes to be disabled. Currently, only the following passes can be disabled: "RemoveSameConstPass","ConstantFoldingPass","TransOpWithoutReshapeFusionPass" Note:
Configuration examples:
|
No |
All
|
ac_parallel_enable |
Whether to allow AI CPU operators and AI Core operators to run in parallel in a dynamic-shape graph. In a dynamic-shape graph, when this function is enabled, the system automatically identifies AI CPU operators that can be run in parallel with the AI Core operators in the graph. Operators of different engines are distributed to different streams to run in parallel, improving resource utilization and dynamic shape execution performance. Arguments:
Configuration example: {"ac_parallel_enable", "1"}; |
Optional |
Global |
|
ge.deterministic |
Deterministic computing enable. Deterministic computing is disabled by default, so multiple executions of an operator with the same hardware and input may produce different results. This is generally caused by asynchronous multi-threaded execution in the operator implementation, which changes the order in which floating-point numbers are accumulated. When deterministic computing is enabled, an operator executed multiple times with the same hardware and input produces the same output, although execution is often slower. If a model produces different results across runs, or its accuracy needs to be tuned, you can enable deterministic computing to assist debugging and tuning. Arguments:
Configuration example: {"ge.deterministic", "1"}; |
Optional |
Global |
|
ge.enableGraphParallel |
Algorithm-based partitioning for the original foundation model. The value 1 indicates that algorithm-based partitioning is enabled. For details about the partitioning strategy, see the configuration file specified by ge.graphParallelOptionPath. If this option is set to another value or left empty, algorithm-based partitioning is disabled. By default, this option is left empty. Configuration example: {"ge.enableGraphParallel", "1"}; |
No |
Graph |
|
ge.exec.enableEngineParallel |
Whether to tile communication operators and related computation operators on the network so that they can run in parallel in the partitioning and deployment scenarios of foundation models. Tiling is performed only when the network contains communication operators and this option is set to 1. Only AllReduce communication operators are partitioned during tiling. If this option is set to another value or left empty, this tiling is disabled. By default, this option is left empty. Configuration example: {"ge.exec.enableEngineParallel", "1"}; |
Optional |
Graph |
|
ge.graphParallelOptionPath |
Path and name of the algorithm-based partitioning strategy configuration file when the original foundation model is partitioned. This option takes effect only when ge.enableGraphParallel is set to 1. Configuration example: {"ge.graphParallelOptionPath", "./parallel.json"};
The configuration file must be in JSON format. The following is an example:
Arguments:
|
No |
Graph |
|
ge.exec.hostSchedulingMaxThreshold |
Maximum threshold to enable dynamic shape scheduling when a static small graph (root graph) is executed. The default value is 0. It is recommended that this option be used in foundation model scenarios.
Note: If the static root graph node contains subgraphs, this option does not take effect. |
No |
All |
|
ge.exec.static_model_ops_lower_limit |
Lower limit of the number of operators in a static subgraph. The value ranges from -1 to positive infinity; other values cause an error. The default value is 4.
For example, if a static subgraph contains four operators and this option is set to 10, the static subgraph is not partitioned separately but is executed through the dynamic graph. |
Optional |
Graph |
|
ge.exec.input_fusion_size |
Threshold for fusing and copying multiple discrete pieces of user input data during data transfer from the host to the device. The minimum value is 0, the maximum value is 32 MB (33,554,432 bytes), and the default value is 128 KB (131,072 bytes).
Assume there are 10 user inputs, including two 100 KB inputs, two 50 KB inputs, and the other inputs greater than 100 KB:
This option takes effect only when a static graph is run asynchronously, that is, through RunGraphAsync. |
Optional |
All
|
ge.topoSortingMode |
Traversal (topological sorting) mode used when operators are compiled in graph mode. It is mainly used in online inference scenarios. Arguments:
Configuration example: {"ge.topoSortingMode", "1"}; |
Optional |
All
|
ge.tiling_schedule_optimize |
Tiling offload scheduling optimization. Because the internal storage of the AI Core in the NPU cannot hold all the input and output data of an operator, the input data is divided into parts. The first part is transferred in, computed, and transferred out, and then the next part is processed the same way. This process is called tiling. A computation program, called the tiling implementation, determines the tiling parameters (such as the block size transferred each time and the total number of cycles) from operator information such as the shape. Because the AI Core is not efficient at the scalar computation that the tiling implementation performs, the tiling implementation is generally executed on the host CPU. However, it is executed on the device when the following conditions are met:
Arguments:
Configuration example: {"ge.tiling_schedule_optimize", "1"}; |
Optional |
Global/Session |
|
ge.exportCompileStat |
Whether to generate the result file fusion_result.json of operator fusion information (including graph fusion and UB fusion) during graph build. This file is used to record the fusion patterns used during graph build. In the file:
Arguments:
NOTE:
Configuration example: {"ge.exportCompileStat", "1"}; |
Optional |
All
|
ge.graphMaxParallelModelNum |
In graph execution mode, the same graph can be loaded as multiple models and executed concurrently on the same device. This option controls the maximum number of models that can be loaded concurrently. Value range: 1 to INT32_MAX. The default value is 8. Configuration example: {"ge.graphMaxParallelModelNum", "10"}; |
Optional |
All
|
ge.oo.level |
Extended parameter for debugging. It cannot be used in commercial products and will be released as a formal function in later versions. Multi-level optimization options for graph build, covering subgraph optimization, full graph optimization, and static shape model sinking. Static shape model sinking: Because the input and output shapes of all operators in a static shape model can be determined at build time, model-level memory orchestration and operator tiling computation can be completed on the host. The resulting tasks are delivered to the device stream in batches when the model is loaded but are not executed immediately. Instead, execution of all tasks in the model is triggered by delivering a model execution task. Arguments:
Configuration example: {"ge.oo.level", "O1"}; |
Optional |
All
|
ge.oo.constantFolding |
Extended parameter for debugging. It cannot be used in commercial products and will be released as a formal function in later versions. Enables constant folding optimization. Constant folding is the process of replacing nodes in a computation graph that can be evaluated to a constant output value with that constant, and simplifying the structure of the computation graph accordingly. Arguments:
Configuration example: {"ge.oo.constantFolding", "true"};
Restrictions: If other compilation optimization options, such as ge.disableOptimizations, are configured, ge.disableOptimizations has a higher priority. |
Optional |
All
|
ge.oo.deadCodeElimination |
Extended parameter for debugging. It cannot be used in commercial products and will be released as a formal function in later versions. Enables dead-edge elimination optimization. Dead-edge elimination: When pred (input 1) of a Switch operator is a constant node, one of the branches can be eliminated based on the value of the constant. If the constant is true, the false branch is eliminated; if it is false, the true branch is eliminated. Arguments:
Configuration example: {"ge.oo.deadCodeElimination", "true"}; |
Optional |
All
|
ge.exec.modelDeployMode |
Model deployment mode in the partitioning and deployment scenarios of foundation models.
|
No |
Graph |
|
ge.exec.modelDeployDevicelist |
Device used by the current execution node for model deployment and execution in the partitioning and deployment scenarios of foundation models. This option is used in conjunction with ge.exec.modelDeployMode in the SPMD scenario. |
No |
Graph |
|
ge.exec.frozenInputIndexes |
Indexes of the input tensors whose addresses are not refreshed. This option can be used only with LoadGraph. The input tensor indexes vary according to the model. Separate multiple entries with semicolons (;).
Configuration example:
# Pass only the input tensor index.
{"ge.exec.frozenInputIndexes", "0;1;2"};
# Pass the input tensor index, address of the data on the device, and data length.
{"ge.exec.frozenInputIndexes", "0,88832131,4;1,888213294,4;2,193492421,2"};
Restrictions: The input tensor whose address is not refreshed must have a static shape. For a dynamic shape model, the input tensor must also have a static shape. |
Optional |
Graph |
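The ge.dynamicDims format rules can be checked in application code before the options are passed to GE. The following C++ sketch is not part of the GE API. ValidateDynamicDims and SplitOn are hypothetical helpers, and the caller is assumed to have already counted the -1 placeholders in ge.inputShape. The requirement that each size be a positive integer is an assumption of the sketch.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Splits s on a single-character delimiter (helper for this sketch only).
std::vector<std::string> SplitOn(const std::string& s, char delim) {
    std::vector<std::string> parts;
    std::istringstream stream(s);
    std::string item;
    while (std::getline(stream, item, delim)) {
        parts.push_back(item);
    }
    return parts;
}

// Hypothetical pre-check (not a GE API): returns true if dims, such as
// "1,224,224;2,448,448", is a well-formed ge.dynamicDims value for a
// ge.inputShape that contains placeholder_count "-1" entries.
bool ValidateDynamicDims(const std::string& dims, std::size_t placeholder_count) {
    const std::vector<std::string> profiles = SplitOn(dims, ';');
    // The number of profiles must lie in (1, 100].
    if (profiles.size() < 2 || profiles.size() > 100) {
        return false;
    }
    for (const std::string& profile : profiles) {
        const std::vector<std::string> sizes = SplitOn(profile, ',');
        // One size per -1 placeholder, in the same order.
        if (sizes.size() != placeholder_count) {
            return false;
        }
        for (const std::string& size : sizes) {
            // Assumption of this sketch: each size is a positive decimal integer.
            if (size.empty() || size.find_first_not_of("0123456789") != std::string::npos) {
                return false;
            }
            try {
                if (std::stol(size) <= 0) {
                    return false;
                }
            } catch (const std::out_of_range&) {
                return false;
            }
        }
    }
    return true;
}
```

For a ge.inputShape with three -1 placeholders, "1,224,224;2,448,448" passes. "1,224,224" fails because it contains only one profile.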
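The to-add and to-remove rules in the ge.exec.modify_mixlist example can be read as transitions between the three lists. This sketch models only what the example's comments describe. PrecisionList, MixlistChanges, and ApplyMixlist are hypothetical names. Two behaviors are assumptions, not documented GE behavior: the fixed order in which the four edits are applied, and treating unknown operators as graylisted.

```cpp
#include <map>
#include <string>
#include <vector>

enum class PrecisionList { kTrust, kBlock, kGray };

// Default list from the built-in precision_reduce flag:
// "true" -> trustlist, "false" -> blocklist, not configured (empty) -> graylist.
PrecisionList FromFlag(const std::string& flag) {
    if (flag == "true") return PrecisionList::kTrust;
    if (flag == "false") return PrecisionList::kBlock;
    return PrecisionList::kGray;
}

// The four arrays of an ops_info.json file.
struct MixlistChanges {
    std::vector<std::string> black_remove, black_add, white_remove, white_add;
};

// Applies the edits in a fixed order (an assumption of this sketch).
void ApplyMixlist(std::map<std::string, PrecisionList>& lists, const MixlistChanges& c) {
    auto current = [&lists](const std::string& op) {
        auto it = lists.find(op);
        return it == lists.end() ? PrecisionList::kGray : it->second;
    };
    for (const auto& op : c.black_remove) {  // blocklist -> graylist
        if (current(op) == PrecisionList::kBlock) lists[op] = PrecisionList::kGray;
    }
    for (const auto& op : c.black_add) {     // trustlist or graylist -> blocklist
        lists[op] = PrecisionList::kBlock;
    }
    for (const auto& op : c.white_remove) {  // trustlist -> graylist
        if (current(op) == PrecisionList::kTrust) lists[op] = PrecisionList::kGray;
    }
    for (const auto& op : c.white_add) {     // blocklist or graylist -> trustlist
        lists[op] = PrecisionList::kTrust;
    }
}
```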
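A ge.exec.dumpStep value such as 0|3-5|10 expands to a concrete set of iterations. The following sketch shows that expansion. ParseDumpSteps is a hypothetical helper, not a GE function. Treating ranges as inclusive and rejecting reversed ranges are assumptions of the sketch.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical parser for a ge.exec.dumpStep value such as "0|3-5|10".
// Returns the expanded iteration set, or std::nullopt if the value is malformed.
std::optional<std::set<int>> ParseDumpSteps(const std::string& value) {
    std::set<int> steps;
    std::istringstream stream(value);
    std::string token;
    while (std::getline(stream, token, '|')) {
        const std::size_t dash = token.find('-');
        try {
            if (dash == std::string::npos) {
                steps.insert(std::stoi(token));  // single iteration, e.g. "10"
            } else {
                const int first = std::stoi(token.substr(0, dash));  // range start
                const int last = std::stoi(token.substr(dash + 1));  // range end (inclusive)
                if (first > last) return std::nullopt;
                for (long long i = first; i <= last; ++i) {
                    steps.insert(static_cast<int>(i));
                }
            }
        } catch (const std::exception&) {  // non-numeric or out-of-range token
            return std::nullopt;
        }
    }
    return steps;
}
```

ParseDumpSteps("0|3-5|10") yields {0, 3, 4, 5, 10}.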
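In the op_debug_config example, op_debug_list=GatherV2,opType::ReduceSum selects GatherV2 and ReduceSum. The following sketch splits such a line into operator names and operator types. It assumes that an opType:: prefix marks an operator type and that any other entry is an operator name. ParseOpDebugList and OpDebugList are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct OpDebugList {
    std::vector<std::string> node_names;  // entries without a prefix
    std::vector<std::string> op_types;    // entries written as "opType::<type>"
};

// Hypothetical parser for the op_debug_list line of an op_debug_config .cfg file.
// Assumes "opType::" marks an operator type; any other entry is an operator name.
OpDebugList ParseOpDebugList(const std::string& line) {
    OpDebugList result;
    const std::string key = "op_debug_list=";
    if (line.rfind(key, 0) != 0) {
        return result;  // not an op_debug_list line
    }
    const std::string type_prefix = "opType::";
    std::istringstream stream(line.substr(key.size()));
    std::string entry;
    while (std::getline(stream, entry, ',')) {
        if (entry.rfind(type_prefix, 0) == 0) {
            result.op_types.push_back(entry.substr(type_prefix.size()));
        } else if (!entry.empty()) {
            result.node_names.push_back(entry);
        }
    }
    return result;
}
```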
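For ge.exec.op_precision_mode, a [ByNodeName] entry takes priority over a [ByOpType] entry. The sketch below reads an op_precision.ini in the format of the example and resolves the mode for one node. ParsePrecisionIni and ResolveMode are hypothetical helpers. The reader does not trim whitespace or handle comments.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Entries read from an op_precision.ini file.
struct PrecisionConfig {
    std::map<std::string, std::string> by_op_type;    // [ByOpType]: low priority
    std::map<std::string, std::string> by_node_name;  // [ByNodeName]: high priority
};

// Minimal reader for the two sections; whitespace is not trimmed.
PrecisionConfig ParsePrecisionIni(const std::string& text) {
    PrecisionConfig config;
    std::map<std::string, std::string>* section = nullptr;
    std::istringstream stream(text);
    std::string line;
    while (std::getline(stream, line)) {
        if (!line.empty() && line.front() == '[') {
            // Switch sections; unknown sections are ignored.
            section = line == "[ByOpType]"   ? &config.by_op_type
                    : line == "[ByNodeName]" ? &config.by_node_name
                                             : nullptr;
            continue;
        }
        const std::size_t eq = line.find('=');
        if (section != nullptr && eq != std::string::npos) {
            (*section)[line.substr(0, eq)] = line.substr(eq + 1);
        }
    }
    return config;
}

// A node-name entry overrides an operator-type entry for the same node.
std::optional<std::string> ResolveMode(const PrecisionConfig& config,
                                       const std::string& node_name,
                                       const std::string& op_type) {
    if (auto it = config.by_node_name.find(node_name); it != config.by_node_name.end()) {
        return it->second;
    }
    if (auto it = config.by_op_type.find(op_type); it != config.by_op_type.end()) {
        return it->second;
    }
    return std::nullopt;  // not configured in this file
}
```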
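The ge.customizeDtypes line format distinguishes operator-name rules (name::InputDtype:…) from operator-type rules (OpType::TypeName:InputDtype:…). This sketch parses one line of the example file. DtypeRule and ParseDtypeLine are hypothetical, and the parser accepts only the two line shapes shown in the example.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// One parsed rule from a ge.customizeDtypes configuration file.
struct DtypeRule {
    bool by_type = false;              // true for "OpType::" rules
    std::string target;                // operator name or operator type
    std::vector<std::string> inputs;   // InputDtype list
    std::vector<std::string> outputs;  // OutputDtype list
};

// Splits a comma-separated list, dropping empty items such as a trailing comma.
std::vector<std::string> SplitDtypes(const std::string& s) {
    std::vector<std::string> out;
    std::istringstream stream(s);
    std::string item;
    while (std::getline(stream, item, ',')) {
        if (!item.empty()) out.push_back(item);
    }
    return out;
}

// Hypothetical parser for one line. Comment lines ('#') and lines that do not
// match either shape in the example return std::nullopt.
std::optional<DtypeRule> ParseDtypeLine(const std::string& line) {
    const std::string kInput = "InputDtype:";
    const std::string kOutput = "OutputDtype:";
    const std::string kTypePrefix = "OpType::";
    if (line.empty() || line[0] == '#') return std::nullopt;
    const std::size_t in_pos = line.find(kInput);
    const std::size_t out_pos = line.find(kOutput);
    if (in_pos == std::string::npos || out_pos == std::string::npos || out_pos < in_pos) {
        return std::nullopt;
    }
    DtypeRule rule;
    std::string head = line.substr(0, in_pos);
    if (head.rfind(kTypePrefix, 0) == 0) {
        // "OpType::Relu:" -> operator type "Relu"
        rule.by_type = true;
        head.erase(0, kTypePrefix.size());
        if (head.empty() || head.back() != ':') return std::nullopt;
        head.pop_back();
    } else {
        // "<operator name>::" -> operator name
        if (head.size() < 2 || head.compare(head.size() - 2, 2, "::") != 0) return std::nullopt;
        head.erase(head.size() - 2);
    }
    if (head.empty()) return std::nullopt;
    rule.target = head;
    const std::size_t in_start = in_pos + kInput.size();
    rule.inputs = SplitDtypes(line.substr(in_start, out_pos - in_start));
    rule.outputs = SplitDtypes(line.substr(out_pos + kOutput.size()));
    return rule;
}
```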
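The ge.graph_key constraints (at most 128 characters, and only letters, digits, underscores, and hyphens) can be checked before a graph is added. IsValidGraphKey is a hypothetical helper. It also rejects an empty key, because graph build caching takes effect only when ge.graph_key is not empty.

```cpp
#include <cctype>
#include <string>

// Hypothetical pre-check (not a GE API) for a ge.graph_key value:
// 1 to 128 characters, each a letter, digit, underscore, or hyphen.
bool IsValidGraphKey(const std::string& key) {
    if (key.empty() || key.size() > 128) {
        return false;
    }
    for (unsigned char ch : key) {
        if (!std::isalnum(ch) && ch != '_' && ch != '-') {
            return false;
        }
    }
    return true;
}
```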
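The ge.deterministic description attributes run-to-run differences to changes in floating-point accumulation order. The following plain C++ example, independent of GE, shows the underlying effect. Floating-point addition is not associative, so adding the same three values in a different order can change the result.

```cpp
// Floating-point addition is not associative: grouping changes rounding.
double SumLeftToRight(double a, double b, double c) {
    return (a + b) + c;
}

double SumRightToLeft(double a, double b, double c) {
    return a + (b + c);
}
```

With a = 1e20, b = -1e20, and c = 1.0, SumLeftToRight returns 1.0 while SumRightToLeft returns 0.0. In the second function, 1.0 is lost when it is added to -1e20 first.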
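The worked example for ge.exec.input_fusion_size can be modeled as follows. This sketch rests on three assumptions: every input no larger than the threshold is merged into a single fused host-to-device copy, larger inputs are copied individually, and a threshold of 0 disables fusion. CopyPlan and PlanInputCopies are hypothetical and do not reflect GE internals.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct CopyPlan {
    std::uint64_t fused_bytes = 0;  // total size of inputs merged into one copy
    std::size_t copy_count = 0;     // number of host-to-device copies issued
};

// Hypothetical model of ge.exec.input_fusion_size (threshold in bytes), assuming
// inputs no larger than the threshold are fused and 0 disables fusion.
CopyPlan PlanInputCopies(const std::vector<std::uint64_t>& input_bytes, std::uint64_t threshold) {
    CopyPlan plan;
    std::size_t fused_inputs = 0;
    for (std::uint64_t size : input_bytes) {
        if (threshold > 0 && size <= threshold) {
            plan.fused_bytes += size;
            ++fused_inputs;
        } else {
            ++plan.copy_count;  // copied on its own
        }
    }
    if (fused_inputs > 0) {
        ++plan.copy_count;  // one transfer for all fused inputs
    }
    return plan;
}
```

Under these assumptions, consider two 100 KB inputs, two 50 KB inputs, and six 200 KB inputs with the default 128 KB threshold. The sketch fuses 300 KB into one copy and issues 7 copies in total.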
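The tiling description under ge.tiling_schedule_optimize mentions two parameters: the block size transferred each time and the total number of cycles. The following sketch computes those two values, plus the size of the last partial block, from an element count and an assumed buffer capacity. TilingParams and ComputeTiling are hypothetical. Real tiling implementations are operator-specific and consider far more than a single capacity.

```cpp
#include <cstdint>

struct TilingParams {
    std::uint64_t block_elems = 0;  // elements moved in per cycle
    std::uint64_t loop_count = 0;   // total number of cycles
    std::uint64_t tail_elems = 0;   // elements handled in the last cycle
};

// total_elems: elements the operator must process, derived from its shape.
// buffer_elems: elements that fit in on-chip storage at once (assumed capacity).
TilingParams ComputeTiling(std::uint64_t total_elems, std::uint64_t buffer_elems) {
    TilingParams p;
    p.block_elems = buffer_elems < total_elems ? buffer_elems : total_elems;
    if (p.block_elems == 0) {
        return p;  // nothing to process, or no usable buffer
    }
    p.loop_count = (total_elems + p.block_elems - 1) / p.block_elems;  // ceiling division
    p.tail_elems = total_elems - (p.loop_count - 1) * p.block_elems;
    return p;
}
```

For 1000 elements and a 256-element buffer, ComputeTiling returns a block size of 256, 4 cycles, and a 232-element tail.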
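A ge.exec.frozenInputIndexes value uses semicolons between entries and commas inside an entry. Each entry has either one field (index) or three fields (index, device address, and data length). This sketch parses both forms. FrozenInput and ParseFrozenInputs are hypothetical. Reading the address as a decimal number follows the example values rather than a documented rule.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct FrozenInput {
    std::uint64_t index = 0;
    std::optional<std::uint64_t> device_address;  // present only in the 3-field form
    std::optional<std::uint64_t> length;          // data length, 3-field form only
};

bool AllDigits(const std::string& s) {
    return !s.empty() && s.find_first_not_of("0123456789") == std::string::npos;
}

// Hypothetical parser (not a GE API) for values such as "0;1;2" or
// "0,88832131,4;1,888213294,4". Returns std::nullopt on malformed input.
std::optional<std::vector<FrozenInput>> ParseFrozenInputs(const std::string& value) {
    std::vector<FrozenInput> result;
    std::istringstream entries(value);
    std::string entry;
    while (std::getline(entries, entry, ';')) {
        std::vector<std::string> fields;
        std::istringstream parts(entry);
        std::string field;
        while (std::getline(parts, field, ',')) fields.push_back(field);
        if (fields.size() != 1 && fields.size() != 3) return std::nullopt;
        for (const std::string& f : fields) {
            if (!AllDigits(f)) return std::nullopt;  // decimal digits only
        }
        try {
            FrozenInput input;
            input.index = std::stoull(fields[0]);
            if (fields.size() == 3) {
                input.device_address = std::stoull(fields[1]);
                input.length = std::stoull(fields[2]);
            }
            result.push_back(input);
        } catch (const std::out_of_range&) {
            return std::nullopt;
        }
    }
    return result;
}
```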