Function: init

Applicability

Product

Supported (√/x)

Atlas A3 training products / Atlas A3 inference products

Atlas A2 training products / Atlas A2 inference products

Atlas training products

Atlas inference products

Atlas 200I/500 A2 inference products

Function Usage

Performs initialization.

Prototype

  • C Prototype
    aclError aclInit(const char *configPath)
    
  • Python Function
    ret = acl.init(config_path)
    

Parameter Description

Parameter

Description

config_path

Path of the configuration file, including the file name.

The configuration file is in JSON format. Curly brackets and square brackets can each be nested up to 10 levels deep. To use the default configuration, call acl.init without passing any parameter, or pass a configuration file that contains only an empty JSON object (that is, only {} exists in the configuration file).

The following configurations are supported:

  • Dump configuration (if the operator input or output contains sensitive user information, there is a risk of information leakage):
    • Model dump configuration (used to export the input and output data of operators at each layer in the model) and single-operator dump configuration (used to export the input and output data of an operator). The exported data is used to compare with that of a specified model or operator to locate accuracy issues. For details about the configuration example, description, and restrictions, see Examples of Model Dump Configuration and Single-Operator Dump Configuration. By default, this dump configuration is disabled.
    • Exception operator dump configuration (used to export the input and output data, workspace information, and Tiling information of the exception operator). The exported data is used to analyze AI Core errors. For details about the configuration example, see Example of Dump Configuration for Exception Operators. By default, this dump configuration is disabled.
    • Dump configuration of the overflow/underflow operator (used to export the input and output data of the overflow/underflow operator in the model). The exported data is used to analyze overflow/underflow causes and locate model accuracy issues. For details about the configuration example, description, and restrictions, see Example of Overflow/Underflow Operator Dump Configuration. By default, this dump configuration is disabled.
    • Configuration for operator dump watch mode (used to enable the observation mode for the output data of a specified operator). If you suspect that the memory is overwritten by other operators after locating the accuracy issues of some operators and excluding the calculation issues of the operators, you can enable the dump watch mode. For details about the configuration example and restrictions, see Dump Watch Configuration for Operators. The dump watch mode is disabled by default.
  • Profiling configuration. For the configuration example, description, and restrictions, see Profiling Instructions. By default, profiling configuration is disabled.
  • Operator cache aging configuration. To save memory and balance the calling performance, you can use the max_opqueue_num parameter to configure the maximum length of the "operator type and single-operator model" mapping queue. If the length of the mapping queue reaches the maximum, the least used mapping information and single-operator models in the cache are deleted before the most recent mapping information and the corresponding single-operator model are loaded. The default maximum length of the mapping queue is 20000. For details about the configuration example and restrictions, see Example of Operator Cache Aging Configuration.
  • Error information report mode configuration, which controls whether the acl.get_recent_err_msg API obtains error information at the process level or the thread level. By default, error information is obtained at the thread level. For details about the example, see Example of Error Information Report Mode Configuration.
  • Default device configuration (used to configure the default compute device). For details about the configuration example and description, see Example of Default Device Configuration.

    If a device is also specified by calling set_device (aclrtSetDevice in C), that setting takes priority over the default device configuration.

    If you want to explicitly create a context after enabling the default device configuration, you need to call set_device. Otherwise, service exceptions may occur.

  • AI Core stack size configuration, which controls the size of the stack space allocated to each AI Core during kernel execution in a process. The default value is 32 KB. For details about the configuration example, see Example of AI Core Stack Size Configuration. The stack size configured here takes effect only for AI Core operators that are compiled with O0 enabled.

    This configuration is only available for the following models:

    Atlas A3 training series products / Atlas A3 inference series products

    Atlas A2 training series products / Atlas A2 inference series products

    Atlas 200I/500 A2 inference product

  • Event resource scheduling mode configuration, which is used to control how event resources are scheduled when model running instances are built in capture mode. For details about configuration examples, see Example of Configuring the Event Resource Scheduling Mode.

    This configuration is only available for the following models:

    Atlas A3 training series products / Atlas A3 inference series products

    Atlas A2 training series products / Atlas A2 inference series products

NOTE:

Dump configuration and profiling configuration are mutually exclusive, because dumping can degrade system performance and make the data collected by profiling inaccurate.
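
The nesting limit described for the configuration file can be checked before the file is passed to acl.init. The following is a minimal sketch, not part of pyACL: the helper names bracket_depths and check_config_text are hypothetical, and the sketch assumes that the limit of 10 applies separately to the nesting depth of curly brackets and of square brackets.

```python
import json

MAX_NESTING = 10  # limit stated in the parameter description


def bracket_depths(value, curly=0, square=0):
    """Return (max curly-bracket depth, max square-bracket depth) of parsed JSON."""
    if isinstance(value, dict):
        curly += 1
        children = list(value.values())
    elif isinstance(value, list):
        square += 1
        children = value
    else:
        return curly, square
    max_curly, max_square = curly, square
    for child in children:
        c, s = bracket_depths(child, curly, square)
        max_curly, max_square = max(max_curly, c), max(max_square, s)
    return max_curly, max_square


def check_config_text(text):
    """Parse the config text; raise ValueError if either depth exceeds the limit."""
    curly, square = bracket_depths(json.loads(text))
    if curly > MAX_NESTING or square > MAX_NESTING:
        raise ValueError(f"nesting too deep: curly={curly}, square={square}")
    return curly, square


# An empty JSON object selects the default configuration.
print(check_config_text("{}"))
```

After the check passes, the file path would be passed to acl.init(config_path) as described above.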

Return Value Description

Return Value

Description

ret

Integer error code. 0 indicates success; any other value indicates failure.
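
The return value is usually checked immediately after the call. The following is a minimal sketch: AclError and check_ret are hypothetical helpers written for this example and are not pyACL APIs, and the path ./acl.json is only a placeholder.

```python
class AclError(RuntimeError):
    """Hypothetical exception type for this sketch; not part of pyACL."""


def check_ret(api_name, ret):
    """Raise AclError when ret is a non-zero error code; otherwise return ret."""
    if ret != 0:
        raise AclError(f"{api_name} failed, ret={ret}")
    return ret


# Typical use (requires the pyACL runtime, so it is left as a comment here):
#   import acl
#   check_ret("acl.init", acl.init("./acl.json"))
```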

Restrictions

  • acl.init must be called before any other pyACL API is used in an application. Otherwise, an error may occur during the initialization of internal system resources, causing other service exceptions.

  • aclInit can be called multiple times in a process, but the aclFinalize or aclFinalizeReference API must be called for deinitialization.
    • The configuration must be consistent each time aclInit is called. Otherwise, only the configuration of the first call is valid. Calling the aclInit API again may cause errors.
    • To be compatible with earlier versions, if the aclInit API is called repeatedly, the error code ACL_ERROR_REPEAT_INITIALIZE will be returned. You can ignore this error and continue to process services.
    • The aclInit and aclFinalize APIs can be called repeatedly for initialization and deinitialization, respectively. Only sequential calls are available for the two APIs.
      aclInit --> service processing --> aclFinalize --> aclInit --> service processing --> aclFinalize

      After you call aclInit multiple times, you only need to call aclFinalize once to perform deinitialization. The aclInit reference count will be reset to 0.

    • If the aclInit and aclFinalizeReference APIs are called for initialization and deinitialization, respectively, the two APIs need to be called in pairs.

      aclFinalizeReference involves reference counting. Each time aclInit is called, the reference count increases by 1. Each time aclFinalizeReference is called, the reference count decreases by 1. Deinitialization is performed only when the reference count decreases to 0.

      Initialization and deinitialization can be performed repeatedly. Both sequential and concurrent calls are available for the two APIs.

      • Sequential API calls

      • Concurrent API calls
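
The reference-counting rules above can be pictured with a small model. The following sketch only illustrates the described behavior and does not call the runtime: InitRefCountModel and its methods are hypothetical names, and raising an error on an unmatched finalize_reference call is an assumption of this model, not documented behavior.

```python
class InitRefCountModel:
    """Toy model of the init reference count; it does not call the real runtime."""

    def __init__(self):
        self.ref_count = 0

    @property
    def initialized(self):
        return self.ref_count > 0

    def init(self):
        # Each init call increases the reference count by 1.
        self.ref_count += 1

    def finalize(self):
        # A single finalize call deinitializes and resets the count to 0.
        self.ref_count = 0

    def finalize_reference(self):
        # Each call decreases the count by 1; deinitialization happens at 0.
        if self.ref_count == 0:
            raise RuntimeError("finalize_reference called without a matching init")
        self.ref_count -= 1


model = InitRefCountModel()
model.init()
model.init()
model.finalize_reference()
print(model.initialized)  # still initialized: one reference remains
model.finalize_reference()
print(model.initialized)  # count reached 0, so deinitialization is performed
```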

Example of Overflow/Underflow Operator Dump Configuration

If dump_debug is set to on, the overflow/underflow operator dump configuration is enabled. The following is an example of the configuration file:
{
    "dump":{
        "dump_path":"output",
        "dump_debug":"on"
    }
}
The details are as follows:
  • If dump_debug is not set or set to off, the overflow/underflow operator configuration is disabled.
  • If the overflow/underflow operator configuration is enabled, dump_path must be set to specify the path for storing exported dump files.

    After obtaining the exported data files, parse the files by referring to "Overflow/Underflow Operator Data Collection and Analysis" in Accuracy Debugging Tool Guide.

    dump_path can be either absolute or relative.
    • An absolute path starts with a slash (/), for example, /home.
    • A relative path starts with a directory name, for example, output.
  • This function cannot be enabled when model or single-operator dump configuration is enabled. Otherwise, an error is returned.
  • Only overflow/underflow data of AI Core operators can be collected.
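
The configuration file can also be generated from a script. The following is a minimal sketch: build_overflow_dump_config and write_config are hypothetical helper names, and only the dump_path requirement stated above is enforced.

```python
import json


def build_overflow_dump_config(dump_path):
    """Build the overflow/underflow dump configuration shown in the example."""
    if not dump_path:
        # dump_path is mandatory once dump_debug is set to on.
        raise ValueError("dump_path must be set when dump_debug is on")
    return {"dump": {"dump_path": dump_path, "dump_debug": "on"}}


def write_config(config, file_path):
    """Write the configuration as JSON so that the path can be passed to acl.init."""
    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)


print(json.dumps(build_overflow_dump_config("output"), indent=4))
```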

Dump Watch Configuration for Operators

Set dump_scene to watcher to enable dump watch for operators. The following is an example of the configuration file. With this configuration: (1) the output of operators C and D is dumped after operators A and B are executed; (2) the output of operators C and D is also dumped after operators C and D themselves are executed. The dump files from (1) are compared with those from (2) to check whether operator A or B overwrites the output memory of operator C or D.

{
    "dump":{
        "dump_list":[
            {
                "layer":["A", "B"],
                "watcher_nodes":["C", "D"]
            }
        ],
        "dump_path":"/home/",
        "dump_mode":"output",
        "dump_scene":"watcher"
    }
}

The details are as follows:

  • If the operator dump watch mode is enabled, the overflow/underflow operator dump (by configuring the dump_debug parameter) or the single-operator model dump (by configuring the dump_op_switch parameter) cannot be enabled. Otherwise, an error will be reported. Dump watch cannot be applied in the single-operator API dump scenario.
  • In dump_list, the layer parameter is used to configure the names of the operators that may overwrite the memory of other operators, and the watcher_nodes parameter is used to configure the names of the operators with accuracy issues possibly due to output memory being overwritten by other operators.
    • If layer is unspecified, the output of the operators configured for watcher_nodes is dumped after all operators that support dump in the model are executed.
    • If an operator configured for layer or watcher_nodes is in neither the static graph nor a static subgraph, the configuration does not take effect.
    • If the same operator name is configured for both layer and watcher_nodes, or an operator configured for layer is a collective communication operator (the operator type starts with Hcom, for example, HcomAllReduce), only the dump file of the operator configured for watcher_nodes is exported.
    • For a fusion operator, the name configured for watcher_nodes must be the name of the operator after fusion. If the name of an operator before fusion is configured, no dump file will be exported.
    • Currently, model_name cannot be configured in dump_list.
  • If the operator dump watch mode is enabled, dump_path, which is the path for storing the exported dump file, must be configured.

    The exported dump files cannot be viewed using a text tool. To view the content of a dump file, convert the dump file to a NumPy file and then view the NumPy file using Python. For details about the conversion procedure, see "Viewing Dump Files" in Accuracy Debugging Tool Guide.

    dump_path can be either absolute or relative.
    • An absolute path starts with a slash (/), for example, /home.
    • A relative path starts with a directory name, for example, output.
  • dump_mode is used to specify the data of the operators configured for watcher_nodes to be exported. Currently, only output can be configured.
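
Several of the restrictions above can be checked mechanically before initialization. The following is a sketch of such a check: validate_watcher_dump is a hypothetical helper, the value "on" for dump_op_switch is an assumption (only dump_debug is shown with that value in this section), and an omitted dump_mode is not flagged because the behavior for that case is not described here.

```python
def validate_watcher_dump(config):
    """Return a list of rule violations for a dump watch configuration."""
    dump = config.get("dump", {})
    problems = []
    if dump.get("dump_scene") != "watcher":
        problems.append("dump_scene must be 'watcher' to enable dump watch")
    if not dump.get("dump_path"):
        problems.append("dump_path must be configured")
    if "dump_mode" in dump and dump["dump_mode"] != "output":
        problems.append("dump_mode only supports 'output'")
    if dump.get("dump_debug") == "on":
        problems.append("dump_debug cannot be enabled together with dump watch")
    if dump.get("dump_op_switch") == "on":  # assumed value, see lead-in
        problems.append("dump_op_switch cannot be enabled together with dump watch")
    for entry in dump.get("dump_list", []):
        if "model_name" in entry:
            problems.append("model_name cannot be configured in dump_list")
    return problems


example = {
    "dump": {
        "dump_list": [{"layer": ["A", "B"], "watcher_nodes": ["C", "D"]}],
        "dump_path": "/home/",
        "dump_mode": "output",
        "dump_scene": "watcher",
    }
}
print(validate_watcher_dump(example))
```

An empty list means that none of the checked rules is violated; the graph-related rules (static graph membership, fusion operator names) cannot be verified from the file alone.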

Example of Default Device Configuration

The value of default_device is a device ID, which must be a non-negative decimal integer. You can call aclrtGetDeviceCount to obtain the number of available devices. The device ID range is [0, (Number of available devices - 1)].

The following is an example of the configuration file:

{
    "defaultDevice":{
        "default_device":"0"
    }
}
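
The range rule can be applied when the configuration is generated. The following is a minimal sketch: build_default_device_config is a hypothetical helper, and device_count is assumed to have been obtained beforehand through the device count query mentioned above.

```python
def build_default_device_config(device_id, device_count):
    """Build the default device configuration after checking the ID range."""
    if isinstance(device_id, bool) or not isinstance(device_id, int):
        raise TypeError("device_id must be an integer")
    if not 0 <= device_id < device_count:
        raise ValueError(f"device_id must be in [0, {device_count - 1}]")
    # The example in this section writes the ID as a string, so the same is done here.
    return {"defaultDevice": {"default_device": str(device_id)}}


print(build_default_device_config(0, 8))
```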

Example of AI Core Stack Size Configuration

Use aicore_stack_size to set the stack size, in bytes. The value must meet the following requirements:

  • The value of aicore_stack_size must be an integer multiple of 16 KB. Otherwise, the value will be rounded up to meet this requirement.
  • The minimum value of aicore_stack_size is 32 KB. If the input value is smaller than that, the default value 32 KB will be used.
  • The maximum value of aicore_stack_size for each product is as follows:

    For Atlas A3 training products / Atlas A3 inference products, the maximum value of aicore_stack_size is 192 KB.

    For Atlas A2 training products / Atlas A2 inference products, the maximum value of aicore_stack_size is 192 KB.

    For Atlas 200I/500 A2 inference products, the maximum value of aicore_stack_size is 7,680 KB.

The following is an example of the configuration file:

{
    "StackSize":{
        "aicore_stack_size":32768
    }
}
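
The rounding and minimum rules above can be reproduced in a few lines to predict the effective value. The following is a sketch under stated assumptions: normalize_stack_size is a hypothetical helper, and treating a value above the product maximum as an error is an assumption, because the behavior for that case is not described here.

```python
KB = 1024
STEP = 16 * KB      # the value must be an integer multiple of 16 KB
MINIMUM = 32 * KB   # smaller values fall back to the 32 KB default


def normalize_stack_size(requested, maximum):
    """Return the stack size in bytes after applying the documented rules."""
    if requested < MINIMUM:
        return MINIMUM
    rounded = -(-requested // STEP) * STEP  # round up to a multiple of 16 KB
    if rounded > maximum:
        # Assumption: this sketch rejects values above the product maximum.
        raise ValueError(f"{rounded} bytes exceeds the maximum of {maximum} bytes")
    return rounded


size = normalize_stack_size(40 * KB, maximum=192 * KB)
print({"StackSize": {"aicore_stack_size": size}})
```

With a maximum of 192 KB, a request of 40 KB is rounded up to 48 KB (49152 bytes).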

Example of Configuring the Event Resource Scheduling Mode

event_mode accepts the following values: 0 (default) selects the memory mode, in which the number of event resources is limited by the memory. 1 selects the hardware acceleration mode, in which the number of event resources is limited by hardware specifications but the performance is better.

The following is an example of the configuration file:

{
    "acl_graph":{
        "event_mode":"0"
    }
}
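
The two modes can be selected from a script as follows. This is a minimal sketch: build_event_mode_config is a hypothetical helper that only accepts the two values described above.

```python
EVENT_MODES = {
    0: "memory mode (default)",
    1: "hardware acceleration mode",
}


def build_event_mode_config(mode=0):
    """Build the event resource scheduling configuration for mode 0 or 1."""
    if isinstance(mode, bool) or mode not in EVENT_MODES:
        raise ValueError("event_mode must be 0 or 1")
    # The example in this section writes the value as a string.
    return {"acl_graph": {"event_mode": str(mode)}}


print(build_event_mode_config(1))
```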

Reference

For the API call example, see Initialization and Deinitialization.

More flexible APIs are provided for enabling dump or profiling. Unlike aclInit, these APIs can be called repeatedly in a process, allowing varied dump or profiling configurations with each call.