aclInit
Applicability
| Product | Supported |
|---|---|
| | √ |
| | √ |
| | √ |
| | √ |
| | √ |
Description
Performs initialization.
During initialization, you can use the configuration file to enable or set the following functions:
- Dump configuration, including the following configurations (if the operator input or output contains sensitive user information, there may be risks of information leakage):
- Model dump configuration (used to export the input and output data of operators at each layer in the model) and single-operator dump configuration (used to export the input and output data of a single operator). The exported data is used to compare with that of a specified model or operator to locate accuracy issues. For details about configuration examples, see Examples of Model Dump Configuration and Single-Operator Dump Configuration. This dump configuration is disabled by default.
To enable the dump configuration through this API, you need to use the dump_path parameter to configure the path for storing dump data.
- Dump configuration for exception operators (used to export the input and output data, workspace information, tiling information, and other information of exception operators). The exported data is used to analyze AI Core errors. For details about configuration examples, see Example of Dump Configuration for Exception Operators. By default, this dump configuration is disabled. For details about how to collect and locate AI Core errors, see Typical Faults > AI Core Error Locating in Troubleshooting.
- Overflow/Underflow operator dump configuration (used to export the input and output data of overflow/underflow operators in the model). The exported data is used to analyze overflow/underflow causes and locate model accuracy issues. For details about configuration examples, see Example of Overflow/Underflow Operator Dump Configuration. This dump configuration is disabled by default.
- Dump watch configuration for operators (used to enable the watch mode for the output data of specified operators). If you suspect that the memory is overwritten by other operators after locating the accuracy issues of some operators and excluding the calculation issues of these operators, you can enable the dump watch mode. For details about configuration examples, see Dump Watch Configuration for Operators. The dump watch mode is disabled by default.
- Profiling configuration. For configuration examples, see "Using the acl.json Configuration File for Data Profiling" in Profiling Instructions. Profiling configuration is disabled by default.
Dump configuration and profiling configuration are mutually exclusive: the dump operation can degrade system performance, which makes the collected profiling data inaccurate.
- Operator cache aging configuration. To reduce the memory footprint while balancing call performance in single-operator model execution (single-operator API execution is excluded), you can use the max_opqueue_num parameter to configure the maximum length of the "operator type and single-operator model" mapping queue. If the length of the mapping queue reaches the maximum, the least used mapping information and single-operator models in the cache are deleted before the most recent mapping information and the corresponding single-operator model are loaded. The default maximum length of the mapping queue is 20000. For details about configuration examples, see Example of Operator Cache Aging Configuration.
In single-operator model execution, operator execution is based on graph IR. First, the operator is compiled (for example, the ATC tool is used to compile the single-operator description file defined by Ascend IR into an operator .om model file). Then, an acl API is called to load the operator model. Finally, an acl API is called to execute the operator.
- Error information report mode configuration, which controls whether the aclGetRecentErrMsg API obtains error information at the process level or the thread level. By default, error information is obtained at the thread level. For details about configuration examples, see Example of Error Information Report Mode Configuration.
- Default device configuration (used to configure the default compute device). For details about configuration examples, see Example of Default Device Configuration.
If a device is also specified through aclrtSetDevice, the device specified through aclrtSetDevice takes precedence.
If you want to explicitly create a context after enabling the default device configuration, you need to call aclrtSetDevice. Otherwise, service exceptions may occur.
- AI Core stack size configuration, which is used to control the size of the stack space allocated to each AI Core during kernel execution in a process. The default value is 32 KB. For details about configuration examples, see Example of AI Core Stack Size Configuration. When you are compiling AI Core operators, the AI Core stack size configured here is valid only when O0 is enabled.
This configuration is only available for the following models:
Atlas A3 training products/Atlas A3 inference products
Atlas A2 training products/Atlas A2 inference products
Atlas 200I/500 A2 inference products
- Event resource scheduling mode configuration, which is used to control how event resources are scheduled when model running instances are built in capture mode. For details about configuration examples, see Example of Configuring the Event Resource Scheduling Mode.
This configuration is only available for the following models:
Atlas A3 training products/Atlas A3 inference products
Atlas A2 training products/Atlas A2 inference products
Prototype
aclError aclInit(const char *configPath)
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| configPath | Input | Pointer to the path (including the file name) of the configuration file. The configuration file is in JSON format. A JSON file allows up to 10 levels of curly brackets and square brackets, respectively. To use the default configurations, pass NULL or an empty JSON configuration file with only a pair of curly brackets {} to the aclInit call. |
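The nesting limit in the configPath description can be checked before the file is handed to aclInit. The following Python sketch is only an illustration: the helper names are hypothetical, and it assumes that "levels" means the deepest chain of curly brackets and the deepest chain of square brackets along any path, counted separately. The actual parser may count differently.

```python
import json

MAX_LEVELS = 10  # limit stated in the configPath description


def nesting_levels(value, braces=0, brackets=0):
    """Return (curly-bracket depth, square-bracket depth) of a parsed JSON value."""
    if isinstance(value, dict):
        braces += 1
        children = list(value.values())
    elif isinstance(value, list):
        brackets += 1
        children = value
    else:
        return braces, brackets
    deepest_braces, deepest_brackets = braces, brackets
    for child in children:
        child_braces, child_brackets = nesting_levels(child, braces, brackets)
        deepest_braces = max(deepest_braces, child_braces)
        deepest_brackets = max(deepest_brackets, child_brackets)
    return deepest_braces, deepest_brackets


def within_nesting_limit(text):
    """True if both bracket types stay within MAX_LEVELS (assumed interpretation)."""
    braces, brackets = nesting_levels(json.loads(text))
    return braces <= MAX_LEVELS and brackets <= MAX_LEVELS


sample = '{"dump": {"dump_list": [{"model_name": "ResNet-50", "layer": ["pool1"]}]}}'
print(nesting_levels(json.loads(sample)))
```

An empty configuration ({}) has one level of curly brackets and no square brackets, so it always passes this check.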
Returns
0 on success; any other value indicates failure. For details, see aclError.
Restrictions
- aclInit must be called before any other acl API is used in an application. Otherwise, errors may occur during the initialization of internal resources, causing service exceptions.
- aclInit can be called multiple times in a process, but the aclFinalize or aclFinalizeReference API must be called for deinitialization.
- The configuration must be consistent each time aclInit is called. Otherwise, only the configuration of the first call is valid. Calling the aclInit API again may cause errors.
- To be compatible with earlier versions, if the aclInit API is called repeatedly, the error code ACL_ERROR_REPEAT_INITIALIZE will be returned. You can ignore this error and continue to process services.
- The aclInit and aclFinalize APIs can be called repeatedly for initialization and deinitialization, respectively. Only sequential calls are available for the two APIs.
aclInit --> service processing --> aclFinalize --> aclInit --> service processing --> aclFinalize
After you call aclInit multiple times, you only need to call aclFinalize once to perform deinitialization. The aclInit reference count will be reset to 0.
- If the aclInit and aclFinalizeReference APIs are called for initialization and deinitialization, respectively, the two APIs need to be called in pairs.
aclFinalizeReference involves reference counting. Each time aclInit is called, the reference count increases by 1. Each time aclFinalizeReference is called, the reference count decreases by 1. Deinitialization is performed only when the reference count decreases to 0.
Initialization and deinitialization can be performed repeatedly. Both sequential and concurrent calls are available for the two APIs.
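The difference between aclFinalize and aclFinalizeReference can be pictured with a small counter model. The Python class below is a toy model of the rules stated above, not the acl implementation; the class and method names are hypothetical stand-ins for the real C APIs.

```python
class InitState:
    """Toy model of the aclInit reference count; not the acl implementation."""

    def __init__(self):
        self.ref_count = 0

    @property
    def initialized(self):
        return self.ref_count > 0

    def acl_init(self):
        """Model aclInit: returns True for the first call, False for a repeated call."""
        first_call = self.ref_count == 0
        self.ref_count += 1  # every call increases the reference count by 1
        return first_call

    def acl_finalize(self):
        """Model aclFinalize: one call deinitializes and resets the count to 0."""
        self.ref_count = 0

    def acl_finalize_reference(self):
        """Model aclFinalizeReference: decrease the count by 1 and return it."""
        if self.ref_count > 0:
            self.ref_count -= 1
        return self.ref_count


state = InitState()
state.acl_init()
state.acl_init()
state.acl_finalize_reference()      # count drops to 1
still_initialized = state.initialized
state.acl_finalize_reference()      # count drops to 0, deinitialization happens here
```

In this model, two aclInit calls need two aclFinalizeReference calls before deinitialization, whereas a single aclFinalize call would have reset the count immediately.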
Examples of Model Dump Configuration and Single-Operator Dump Configuration
After model dump or single-operator dump is configured, the exported data is used to compare with that of a specified model or operator to locate accuracy issues. For details about the comparison method, see Accuracy Debugging Tool Guide.
Example of model dump configuration:
{
  "dump":{
    "dump_list":[
      {
        "model_name":"ResNet-101"
      },
      {
        "model_name":"ResNet-50",
        "layer":[
          "conv1conv1_relu",
          "res2a_branch2ares2a_branch2a_relu",
          "res2a_branch1",
          "pool1"
        ]
      }
    ],
    "dump_path":"$HOME/output",
    "dump_mode":"output",
    "dump_op_switch":"off",
    "dump_data":"tensor"
  }
}
Example of single-operator dump configuration:
{
  "dump":{
    "dump_path":"output",
    "dump_list":[],
    "dump_op_switch":"on",
    "dump_data":"tensor"
  }
}
Example of Dump Configuration for Exception Operators
You can enable dump for exception operators by setting dump_scene. The following is an example of the configuration file, indicating that lightweight exception dump is enabled:
{
  "dump":{
    "dump_path":"output",
    "dump_scene":"aic_err_brief_dump"
  }
}
The details are as follows:
- dump_scene can be set to:
- aic_err_brief_dump: lightweight exception dump, which is used to export the input, output, and workspace data of exception operators of AI Core.
- aic_err_norm_dump: common exception dump, which is used to export the shape, data type, format, and attribute information in addition to the lightweight exception dump.
- aic_err_detail_dump: detailed exception dump, which is used to export the internal storage, register, and call stack information of AI Core in addition to the lightweight exception dump.
When configuring this parameter, note that:
- This parameter is only available for the following models and requires driver 25.0.RC1 or later:
Atlas A3 training products/Atlas A3 inference products
Atlas A2 training products/Atlas A2 inference products
Download the driver installation package of Ascend HDK 25.0.RC1 or later from the Firmware and Drivers page, and install or upgrade the driver by referring to the document of the corresponding version.
- During dump file export, the AI Core where an exception operator is located is suspended, which may affect the execution of other processes on the device. After dump files are exported, the AI Core is automatically restored. Therefore, you are not advised to use aic_err_detail_dump when multiple host-side user service processes share the same device.
- After dump files are exported, host-side user service processes are forcibly exited. Errors reported during the forcible exit are not used as the input for AI Core problem analysis.
- If aic_err_detail_dump is configured and dump files are generated but no *.core files are generated, aic_err_detail_dump did not take effect. In this case, aic_err_brief_dump is used instead.
- lite_exception: light exception dump. It is provided to be compatible with earlier versions and is equivalent to aic_err_brief_dump.
- dump_path is an optional parameter, indicating the path for storing exported dump files.
The priority of the dump file storage path is as follows: NPU_COLLECT_PATH environment variable > ASCEND_WORK_PATH environment variable > dump_path in the configuration file > current execution directory of the application.
For details about environment variables, see Environment Variables.
- To view the content of an exported dump file, convert the dump file to a NumPy file and then view the NumPy file using Python. For details about the conversion procedure, see "Viewing Dump Files" in Accuracy Debugging Tool Guide.
If dump_scene is set to aic_err_detail_dump, you can use msDebug to view the content of an exported dump file. For details, see Operator Development Tool User Guide.
- The dump configuration for exception operators cannot be enabled if the model dump configuration or single-operator dump configuration is enabled.
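The storage path priority described above can be sketched as a simple lookup. The following Python helper is hypothetical and only mirrors the stated order (NPU_COLLECT_PATH, then ASCEND_WORK_PATH, then dump_path, then the current directory); it does not model any subdirectories the runtime may create under the chosen path.

```python
import os


def resolve_dump_dir(config, environ=None, cwd=None):
    """Pick the dump directory following the documented priority order."""
    environ = os.environ if environ is None else environ
    dump_path = config.get("dump", {}).get("dump_path")
    candidates = [
        environ.get("NPU_COLLECT_PATH"),          # highest priority
        environ.get("ASCEND_WORK_PATH"),
        dump_path,                                # value from the configuration file
        cwd if cwd is not None else os.getcwd(),  # fallback: execution directory
    ]
    # Return the first candidate that is set and non-empty.
    return next(path for path in candidates if path)


config = {"dump": {"dump_path": "output", "dump_scene": "aic_err_brief_dump"}}
print(resolve_dump_dir(config, environ={}, cwd="/work"))
```

Passing environ and cwd explicitly keeps the helper easy to try out without touching the real process environment.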
Example of Overflow/Underflow Operator Dump Configuration
{
  "dump":{
    "dump_path":"output",
    "dump_debug":"on"
  }
}
- If dump_debug is not set or set to off, the overflow/underflow operator configuration is disabled.
- If the overflow/underflow operator configuration is enabled, dump_path must be set to specify the path for storing exported dump files.
After obtaining the exported data files, parse the files by referring to "Overflow/Underflow Operator Data Collection and Analysis" in Accuracy Debugging Tool Guide.
dump_path can be either absolute or relative.
- An absolute path starts with a slash (/), for example, /home.
- A relative path starts with a directory name, for example, output.
- The overflow/underflow operator configuration cannot be enabled if the model dump configuration or single-operator dump configuration is enabled. Otherwise, an error is returned.
- Only overflow/underflow data of AI Core operators can be collected.
Dump Watch Configuration for Operators
Set dump_scene to watcher to enable dump watch for operators. Below is an example of the content in the configuration file. The configuration effect is as follows: (1) After operators A and B are executed, the output of operators C and D is dumped; (2) After operators C and D are executed, the output of operators C and D is also dumped. The dump files of operators C and D in (1) will be compared with those in (2) to check whether operator A or B overwrites the output memory of operator C or D.
{
  "dump":{
    "dump_list":[
      {
        "layer":["A", "B"],
        "watcher_nodes":["C", "D"]
      }
    ],
    "dump_path":"/home/",
    "dump_mode":"output",
    "dump_scene":"watcher"
  }
}
The details are as follows:
- If dump watch is enabled for operators, the overflow/underflow operator dump (by setting dump_debug) and single-operator model dump (by setting dump_op_switch) cannot be enabled. Otherwise, an error will be reported. Dump watch cannot be applied in the single-operator API dump scenario.
- In dump_list, layer lists the operators that may overwrite the memory of other operators, and watcher_nodes lists the operators whose output memory may be overwritten by other operators. If the output of an operator is overwritten, the operator accuracy may decrease.
- If layer is not specified, the output of operators configured with watcher_nodes will be dumped after all operators that support dump in the model are executed.
- If any operator in layer and watcher_nodes is not in a static graph or static subgraph, the configuration does not take effect.
- If an operator is in both layer and watcher_nodes or an operator in layer is a collective communication operator (the operator type starts with Hcom, for example, HcomAllReduce), only the dump files of operators in watcher_nodes will be exported.
- For a fused operator, use its name after fusion when you add it to watcher_nodes. Otherwise, dump files cannot be exported.
- Currently, model_name cannot be configured in dump_list.
- If the dump watch configuration is enabled for operators, dump_path must be set to specify the path for storing exported dump files.
The exported dump files cannot be viewed using a text tool. To view the content of a dump file, convert the dump file to a NumPy file and then view the NumPy file using Python. For details about the conversion procedure, see "Viewing Dump Files" in Accuracy Debugging Tool Guide.
dump_path can be either absolute or relative.
- An absolute path starts with a slash (/), for example, /home.
- A relative path starts with a directory name, for example, output.
- dump_mode is used to control what data of operators in watcher_nodes can be exported. Currently, the value can only be output.
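Several of the watch-mode rules above can be checked mechanically before the configuration file is passed to aclInit. The Python function below is a hypothetical pre-check, not part of acl. It covers only the rules that can be read directly from the "dump" section, and it assumes that an omitted dump_mode is acceptable, which the text above does not state.

```python
def check_watcher_dump(dump):
    """Return a list of rule violations for a "dump" section in watch mode."""
    problems = []
    if dump.get("dump_scene") != "watcher":
        return problems  # watch mode is not enabled, nothing to check
    if dump.get("dump_debug") == "on":
        problems.append("overflow/underflow dump (dump_debug) cannot be enabled")
    if dump.get("dump_op_switch") == "on":
        problems.append("single-operator model dump (dump_op_switch) cannot be enabled")
    if not dump.get("dump_path"):
        problems.append("dump_path must be set")
    # Assumption: a missing dump_mode is treated as acceptable.
    if dump.get("dump_mode", "output") != "output":
        problems.append("dump_mode can only be output")
    for entry in dump.get("dump_list", []):
        if "model_name" in entry:
            problems.append("model_name cannot be configured in dump_list")
    return problems


watch_config = {
    "dump_list": [{"layer": ["A", "B"], "watcher_nodes": ["C", "D"]}],
    "dump_path": "/home/",
    "dump_mode": "output",
    "dump_scene": "watcher",
}
print(check_watcher_dump(watch_config))
```

Rules that depend on the model itself, such as whether an operator is in a static graph or has been fused, cannot be verified from the file alone and are left out of this sketch.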
Example of Operator Cache Aging Configuration
You can use max_opqueue_num to set the maximum length of the "operator type and single-operator model" mapping queue to age operator cache information. The following is an example of the configuration file:
{
  "max_opqueue_num": "10000"
}
The details are as follows:
- For statically loaded operators (a single operator is compiled to generate an *.om file and then loaded using an acl API), the aging configuration is invalid and the operator information will not be aged.
- For operators that are compiled online (operators are compiled directly by calling an acl API), the API loads a single-operator model based on input parameters. In this case, the aging configuration is valid.
If an operator is compiled and executed through two separate API calls, execute the operator promptly after compilation. Otherwise, the operator information may be aged, and you will need to recompile the operator. You are advised to use an API that completes operator compilation and execution as one action instead.
- An API maintains two mapping queues, one for static-shape and the other for dynamic-shape operators. However, their maximum lengths are both determined by the max_opqueue_num parameter.
- The value of max_opqueue_num is the sum of the number of single-operator models with statically loaded operators and the number of single-operator models with operators compiled online. Therefore, the value of max_opqueue_num must be greater than the number of single-operator models with statically loaded operators that are available in the current process; otherwise, the information about operators compiled online cannot be aged.
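The aging behavior can be illustrated with a bounded mapping queue. The Python class below is a simplified sketch, not the acl implementation: it models a single queue, uses strings as stand-ins for single-operator models, and assumes that "least used" means least recently used. The class name and keys are hypothetical.

```python
from collections import OrderedDict


class OpModelCache:
    """Bounded "operator type -> single-operator model" mapping queue (toy model)."""

    def __init__(self, max_opqueue_num=20000):
        self.max_len = max_opqueue_num  # 20000 is the documented default
        self._queue = OrderedDict()

    def get(self, op_key):
        """Return the cached model, or None if it was never loaded or was aged."""
        model = self._queue.get(op_key)
        if model is not None:
            self._queue.move_to_end(op_key)  # mark as most recently used
        return model

    def put(self, op_key, model):
        """Load a model, deleting the oldest entry first if the queue is full."""
        if op_key in self._queue:
            self._queue.move_to_end(op_key)
        elif len(self._queue) >= self.max_len:
            self._queue.popitem(last=False)  # age the least recently used entry
        self._queue[op_key] = model


cache = OpModelCache(max_opqueue_num=2)
cache.put("Add", "model-add")
cache.put("Mul", "model-mul")
cache.get("Add")               # "Add" becomes the most recently used entry
cache.put("Cast", "model-cast")  # queue is full, so "Mul" is aged
```

After the last call, looking up "Mul" returns None, which corresponds to the case where an aged operator has to be compiled again.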
Example of Error Information Report Mode Configuration
Value range of the err_msg_mode parameter: 0 is the default value, indicating that error information is obtained by thread. 1 indicates that error information is obtained by process.
The following is an example of the configuration file:
{
  "err_msg_mode": "1"
}
Example of Default Device Configuration
The value of default_device is a device ID, which can be 0 or a decimal positive integer. You can call aclrtGetDeviceCount to obtain the number of available devices. The device ID range is [0, (Number of available devices – 1)].
The following is an example of the configuration file:
{
  "defaultDevice":{
    "default_device":"0"
  }
}
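When the configuration file is generated by a script, the device ID range can be validated up front. The Python helper below is a hypothetical sketch; device_count stands for the value that aclrtGetDeviceCount would report and is passed in as a plain integer here.

```python
import json


def default_device_config(device_id, device_count):
    """Build the defaultDevice section after checking the documented ID range."""
    if not 0 <= device_id <= device_count - 1:
        raise ValueError(
            f"device ID {device_id} is outside [0, {device_count - 1}]"
        )
    # The example above stores the ID as a string, so the same form is used here.
    return {"defaultDevice": {"default_device": str(device_id)}}


print(json.dumps(default_device_config(0, device_count=4), indent=2))
```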
Example of AI Core Stack Size Configuration
Use aicore_stack_size to set the stack size, in bytes. The value must meet the following requirements:
- The value of aicore_stack_size must be an integer multiple of 16 KB. Otherwise, the value will be rounded up to meet this requirement.
- The minimum value of aicore_stack_size is 32 KB. If the input value is smaller than that, the default value 32 KB will be used.
- The maximum value of aicore_stack_size for each product is as follows:
For Atlas A3 training products/Atlas A3 inference products, the maximum value of aicore_stack_size is 192 KB.
For Atlas A2 training products/Atlas A2 inference products, the maximum value of aicore_stack_size is 192 KB.
For Atlas 200I/500 A2 inference products, the maximum value of aicore_stack_size is 7,680 KB.
The following is an example of the configuration file:
{
  "StackSize":{
    "aicore_stack_size":32768
  }
}
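The rounding and minimum rules for aicore_stack_size can be worked through with a short calculation. The Python function below is a hypothetical sketch of those rules. The text above does not say what happens when the value exceeds the product maximum, so raising an error in that case is purely an assumption of this example.

```python
KIB = 1024
STEP = 16 * KIB      # the value must be an integer multiple of 16 KB
MINIMUM = 32 * KIB   # minimum and default value


def effective_stack_size(requested, maximum=192 * KIB):
    """Return the stack size in bytes that the documented rules would yield."""
    if requested < MINIMUM:
        return MINIMUM  # values below the minimum fall back to the 32 KB default
    rounded = -(-requested // STEP) * STEP  # round up to the next multiple of 16 KB
    if rounded > maximum:
        # Assumption: behavior above the product maximum is not documented.
        raise ValueError(f"{rounded} bytes exceeds the maximum of {maximum} bytes")
    return rounded


print(effective_stack_size(40000))
```

For example, a request of 40000 bytes is not a multiple of 16384, so it is rounded up to 3 × 16384 = 49152 bytes (48 KB).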
Example of Configuring the Event Resource Scheduling Mode
Value range of event_mode: 0 is the default value, indicating the memory mode, where the number of event resources is limited by the memory. 1 indicates the hardware acceleration mode, where the number of event resources is limited by hardware specifications, but the performance is better.
The following is an example of the configuration file:
{
  "acl_graph":{
    "event_mode":"0"
  }
}
See Also
For the API call example, see Initialization and Deinitialization.
More flexible APIs are provided for enabling dump or profiling. Unlike aclInit, these APIs can be called repeatedly in a process, allowing varied dump or profiling configurations with each call.
- To obtain dump data, see aclmdlInitDump, aclmdlSetDump, and aclmdlFinalizeDump. If dump data does not need to be written to a file, you can obtain the dump data by using a callback function. For details, see acldumpRegCallback.
- To obtain profiling data, see Profiling Data Collection.

