Inference Configuration Items
An inference configuration file supports the configuration of only one inference service. If multiple inference services are configured in the same file, only the last one takes effect.
Table 1 Inference service configuration items

| Configuration Item | Description | Data Type | Mandatory | Modifiable |
|---|---|---|---|---|
| inferType | Inference type. The value can be streams or models, indicating a pipeline inference service or a model inference service, respectively. | String | Yes | Yes |
| name | Stream name or model name. The stream name is the name of the inference stream specified in the pipeline file. Only characters in {0-9, a-z, A-Z, +, -, _} are supported. The name string forms part of the URI of the inference service's RESTful API; because the service restricts the URI length, keep the name suitably short. For details, see RESTful APIs. If this parameter is set to a stream name, it must match the name in the pipeline file; otherwise, even if the service starts successfully, the corresponding inference stream cannot be found when requests are processed. | String | Yes | Yes |
| path | Path of the pipeline file or OM model file, either relative or absolute. A relative path is resolved against the path created by StreamServer. The inference service process must have permission to access the configured path. | String | Yes | Yes |
| deviceId | ID of the device that runs the inference service. Confirm the available hardware resources in advance by running the npu-smi info command in the environment where the Ascend device is installed. The value ranges from 0 to 1024 and cannot exceed the ID range configured in the current environment. If the configuration is of the streams type, this item does not take effect; the actual device ID is the one specified in the pipeline file. | int | Yes | Yes |
| timeoutMs | Inference timeout interval, in milliseconds. The value ranges from 1 to 100000; the default is 3000. | Positive integer | No | Yes |
| inputs | Input tensors. | Tensor | Yes | Yes |
| outputs | Output tensors. | Tensor | Yes | Yes |
| dynamicBatching | Whether single-model inference supports dynamic batching. | dynamicBatching | No | Yes |
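To make the items above concrete, the following is a hypothetical sketch of a single model inference service entry, built as a Python dict and serialized to JSON. The field names follow Table 1 and Table 2, but the surrounding file layout, the model name, and the file paths are illustrative assumptions, not taken verbatim from the product.

```python
import json

# Illustrative configuration for one "models"-type inference service.
# Field names follow Tables 1 and 2; values (name, path, shapes) are
# assumptions for demonstration only.
config = {
    "inferType": "models",           # "streams" or "models"
    "name": "resnet50",              # forms part of the RESTful URI
    "path": "model/resnet50.om",     # relative to the StreamServer path
    "deviceId": 0,                   # confirm available IDs with `npu-smi info`
    "timeoutMs": 3000,               # 1-100000 ms; default 3000
    "inputs": [
        {"name": "input", "id": 0, "dataType": "FLOAT32",
         "format": "FORMAT_NHWC", "shape": [1, 224, 224, 3]}
    ],
    "outputs": [
        {"name": "output", "id": 0, "dataType": "FLOAT32",
         "format": "FORMAT_NONE", "shape": [1, 1000]}
    ],
}

print(json.dumps(config, indent=2))
```

Note that because `inferType` is `models`, the `deviceId` value is honored; for a `streams` entry it would be ignored in favor of the device ID in the pipeline file.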
Table 2 Tensor configuration items

| Configuration Item | Description | Data Type | Mandatory | Modifiable |
|---|---|---|---|---|
| name | Tensor name. Only characters in {0-9, a-z, A-Z, +, -, _} are supported. The length ranges from 1 to 100. | String | Yes | Yes |
| id | Tensor ID, starting from 0. For stream inference, this parameter corresponds to the input/output plugin IDs (appsrcX/appsinkX) of the pipeline. The value range is [0, 10000]. | int | Yes | Yes |
| dataType | Tensor data type. Set this parameter to a data type defined in Table 3. | String | Yes | Yes |
| format | Tensor data format. Set this parameter to a data format defined in Table 4. | String | Yes | Yes |
| shape | Tensor shape (dimensions). Each dimension must be in the range (0, 10000], and the product of all dimensions must be in the range (0, max_content_length), where max_content_length is the maximum request body length defined in streamserver.conf. | Integer array | Yes | Yes |
| data | Base64-encoded data string to be inferred. This parameter is required only in inference requests and does not need to be set in the configuration file. | String | No | Yes |
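The `data` field of a request tensor carries the raw tensor bytes in Base64. The sketch below packs FLOAT32 values as little-endian bytes and encodes them; the little-endian byte order is an assumption here, so verify it against the RESTful API reference before relying on it.

```python
import base64
import json
import struct

# Hypothetical request tensor for a shape [1, 4] FLOAT32 input.
# The values are packed as little-endian 32-bit floats (an assumed
# byte order) and Base64-encoded into the `data` field.
values = [0.0, 0.5, 1.0, 1.5]
raw = struct.pack("<%df" % len(values), *values)   # 4 bytes per FLOAT32
encoded = base64.b64encode(raw).decode("ascii")

tensor = {
    "name": "input",
    "id": 0,
    "dataType": "FLOAT32",
    "format": "FORMAT_NONE",
    "shape": [1, 4],
    "data": encoded,   # set only in inference requests, not the config file
}
print(json.dumps(tensor))
```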
Table 3 Tensor data types

| Tensor Data Type | Description |
|---|---|
| FLOAT32 | 32-bit floating-point |
| FLOAT16 | 16-bit floating-point |
| INT8 | 8-bit signed integer |
| INT32 | 32-bit signed integer |
| UINT8 | 8-bit unsigned integer |
| UINT16 | 16-bit unsigned integer |
| UINT32 | 32-bit unsigned integer |
| INT64 | 64-bit signed integer |
| UINT64 | 64-bit unsigned integer |
| DOUBLE64 | 64-bit double-precision floating-point |
| BOOL | Boolean |
| STRING | String |
| BINARY | Binary |
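Because the product of a tensor's dimensions is bounded by max_content_length, it can help to estimate the raw payload size before issuing a request. The helper below is an illustrative sketch, not part of the product API; it covers only the fixed-width types in Table 3 (STRING and BINARY are variable-length and omitted).

```python
import math

# Per-element sizes in bytes for the fixed-width types in Table 3.
ELEMENT_SIZE = {
    "FLOAT32": 4, "FLOAT16": 2, "INT8": 1, "INT32": 4,
    "UINT8": 1, "UINT16": 2, "UINT32": 4, "INT64": 8,
    "UINT64": 8, "DOUBLE64": 8, "BOOL": 1,
}

def tensor_bytes(shape, data_type):
    """Raw tensor payload size: product of all dimensions x element size."""
    return math.prod(shape) * ELEMENT_SIZE[data_type]

# A [1, 224, 224, 3] FLOAT32 tensor occupies 602112 bytes before Base64
# encoding; compare this against max_content_length in streamserver.conf.
print(tensor_bytes([1, 224, 224, 3], "FLOAT32"))  # -> 602112
```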
Table 4 Tensor data formats

| Tensor Data Format | Description |
|---|---|
| FORMAT_NONE | No format |
| FORMAT_NHWC | NHWC |
| FORMAT_NCWH | NCWH |
Table 5 Dynamic batch configuration items

| Configuration Item | Description | Data Type | Mandatory | Modifiable |
|---|---|---|---|---|
| preferredBatchSize | Batch sizes supported by the OM model. | Integer array | Yes | Yes |
| waitingTime | Maximum time to wait for a batch to be formed in the multi-batch model scenario, in ms. The value ranges from 1 to 50000; the default is 5000. If the waiting time is exceeded, the system stops waiting and performs inference on the accumulated requests. | int | No | Yes |
| dynamicStrategy | Policy for selecting a proper batch size during dynamic batch inference. The default value is Nearest. | String | No | Yes |
| singleBatchInfer | Single-batch inference switch, Boolean type. | int | No | Yes |

Note: The values of the waitingTime, dynamicStrategy, and singleBatchInfer fields are the same as those of the mxpi_tensorinfer plugin.
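As a closing illustration, the sketch below assembles a dynamicBatching section using the fields from Table 5. The field names follow the table, but nesting it as a standalone object (to be attached to a models-type service entry) is an assumption based on the layout described above.

```python
import json

# Hypothetical dynamicBatching section; field names follow Table 5,
# values are illustrative assumptions.
dynamic_batching = {
    "preferredBatchSize": [1, 2, 4, 8],  # batch sizes the OM model supports
    "waitingTime": 5000,                 # ms to wait for a full batch; 1-50000
    "dynamicStrategy": "Nearest",        # batch-size selection policy
    "singleBatchInfer": 0,               # Boolean switch, declared as int
}
print(json.dumps(dynamic_batching, indent=2))
```

These three optional fields mirror the mxpi_tensorinfer plugin's values, so an existing pipeline configuration can serve as a reference when choosing them.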