Inference Configuration Items

An inference configuration file can configure only one inference service. If multiple inference services are configured in the same file, only the last one takes effect.

Table 1 Inference service configuration items

| Inference Service Configuration Item | Description | Data Type | Mandatory | Modifiable |
| --- | --- | --- | --- | --- |
| inferType | Inference type. The value streams indicates a pipeline inference service; models indicates a model inference service. | String | Yes | Yes |
| name | Stream name or model name. The stream name is the name of the inference stream specified in the pipeline file. Only characters in {0-9, a-z, A-Z, +, -, _} are supported. The name is used to form the URI of the RESTful API for the inference service; because the service restricts the URI length, keep the name reasonably short. For details, see RESTful APIs. If this parameter is set to a stream name, it must match the stream name in the pipeline file; otherwise, even if the service starts successfully, the corresponding inference stream cannot be found when requests are processed. | String | Yes | Yes |
| path | Path of the pipeline file or OM model file. It can be a relative or absolute path; a relative path is resolved against the path created by StreamServer. The inference service process must have permission to access this path. | String | Yes | Yes |
| deviceId | ID of the device that runs the inference service. Confirm the available hardware resources in advance by running the npu-smi info command in the environment where the Ascend device is installed. The value ranges from 0 to 1024 and cannot exceed the ID range configured in the current environment. If inferType is streams, this item does not take effect; the actual device ID is the one set in the pipeline specified in the configuration file. | int | Yes | Yes |
| timeoutMs | Inference timeout interval, in milliseconds. The value ranges from 1 to 100000; the default is 3000. | Positive integer | No | Yes |
| inputs | Input tensors. | Tensor | Yes | Yes |
| outputs | Output tensors. | Tensor | Yes | Yes |
| dynamicBatching | Whether single-model inference supports dynamic batching. For details, see Table 5. | dynamicBatching | No | Yes |
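Putting the items in Table 1 together (with tensor items from Table 2 and dynamic batch items from Table 5), a model inference service configuration might look like the sketch below. The overall file layout, key spelling, and the model path are illustrative assumptions inferred from the item names; consult the product samples for the authoritative format.

```json
{
    "inferType": "models",
    "name": "resnet50",
    "path": "./models/resnet50.om",
    "deviceId": 0,
    "timeoutMs": 3000,
    "inputs": [
        {
            "name": "input0",
            "id": 0,
            "dataType": "UINT8",
            "format": "FORMAT_NHWC",
            "shape": [1, 224, 224, 3]
        }
    ],
    "outputs": [
        {
            "name": "output0",
            "id": 0,
            "dataType": "FLOAT32",
            "format": "FORMAT_NONE",
            "shape": [1, 1000]
        }
    ],
    "dynamicBatching": {
        "preferredBatchSize": [2, 4, 8],
        "waitingTime": 5000,
        "dynamicStrategy": "Nearest",
        "singleBatchInfer": 0
    }
}
```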

Table 2 Tensor configuration items

| Tensor Configuration Item | Description | Data Type | Mandatory | Modifiable |
| --- | --- | --- | --- | --- |
| name | Tensor name. Only characters in {0-9, a-z, A-Z, +, -, _} are supported. The length ranges from 1 to 100. | String | Yes | Yes |
| id | Tensor ID, starting from 0. For stream inference, this parameter corresponds to the input/output plugin IDs (appsrcX/appsinkX) of the pipeline. The value range is [0, 10000]. | int | Yes | Yes |
| dataType | Tensor data type. Set this parameter to a data type defined in Table 3. | String | Yes | Yes |
| format | Tensor data format. Set this parameter to a data format defined in Table 4. | String | Yes | Yes |
| shape | Tensor shape (dimensions). Each dimension must be in the range (0, 10000], and the product of all dimensions must be in the range (0, max_content_length), where max_content_length is the maximum request body length defined in streamserver.conf. | Integer array | Yes | Yes |
| data | Base64-encoded data string to be inferred. This parameter is set only in inference requests and is not required in the configuration file. | String | No | Yes |
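The constraints in Table 2 can be checked programmatically when building a tensor entry for an inference request. The sketch below is a hypothetical helper, assuming the JSON keys match the configuration item names above; the function name and its validation messages are illustrative, not part of the product API.

```python
import base64
import json
import re

def make_tensor(name, tensor_id, data_type, fmt, shape, raw_bytes=None):
    """Build one tensor entry per Table 2. Key names are assumed to match
    the configuration item names; raw_bytes is set only for requests."""
    # name: only {0-9,a-z,A-Z,+,-,_}, length 1 to 100
    if not re.fullmatch(r"[0-9A-Za-z+_-]{1,100}", name):
        raise ValueError("tensor name must use {0-9,a-z,A-Z,+,-,_}, length 1-100")
    # id: value range [0, 10000]
    if not 0 <= tensor_id <= 10000:
        raise ValueError("tensor id must be in [0, 10000]")
    # shape: each dimension in (0, 10000]
    if not shape or any(not 0 < dim <= 10000 for dim in shape):
        raise ValueError("each dimension must be in (0, 10000]")
    tensor = {"name": name, "id": tensor_id, "dataType": data_type,
              "format": fmt, "shape": list(shape)}
    if raw_bytes is not None:
        # 'data' is Base64-encoded and appears only in inference requests
        tensor["data"] = base64.b64encode(raw_bytes).decode("ascii")
    return tensor

print(json.dumps(make_tensor("input0", 0, "UINT8", "FORMAT_NHWC",
                             [1, 224, 224, 3], b"\x00\x01")))
```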

Table 3 Tensor data types

| Tensor Data Type | Description |
| --- | --- |
| FLOAT32 | 32-bit floating-point |
| FLOAT16 | 16-bit floating-point |
| INT8 | 8-bit signed integer |
| INT32 | 32-bit signed integer |
| UINT8 | 8-bit unsigned integer |
| UINT16 | 16-bit unsigned integer |
| UINT32 | 32-bit unsigned integer |
| INT64 | 64-bit signed integer |
| UINT64 | 64-bit unsigned integer |
| DOUBLE64 | 64-bit double-precision floating-point |
| BOOL | Boolean |
| STRING | String |
| BINARY | Binary |
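Because the product of all shape dimensions is bounded by max_content_length (Table 2), it can be useful to estimate a tensor's raw payload size before Base64 encoding. The element sizes below follow the widths named in Table 3; treating BOOL as 1 byte is an assumption, and the variable-length STRING and BINARY types are excluded.

```python
# Element sizes in bytes for the fixed-width types in Table 3.
# BOOL as 1 byte is an assumption; STRING and BINARY are variable-length.
ELEMENT_SIZE = {
    "FLOAT32": 4, "FLOAT16": 2, "INT8": 1, "INT32": 4,
    "UINT8": 1, "UINT16": 2, "UINT32": 4, "INT64": 8,
    "UINT64": 8, "DOUBLE64": 8, "BOOL": 1,
}

def tensor_bytes(shape, data_type):
    """Raw size of a tensor (before Base64 encoding): product of
    all dimensions times the per-element size."""
    size = ELEMENT_SIZE[data_type]
    for dim in shape:
        size *= dim
    return size

print(tensor_bytes([1, 224, 224, 3], "FLOAT32"))  # 602112
```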

Table 4 Tensor data formats

| Tensor Data Format | Description |
| --- | --- |
| FORMAT_NONE | No format |
| FORMAT_NHWC | NHWC |
| FORMAT_NCWH | NCWH |

Table 5 Dynamic batch configuration items

| Dynamic Batch Configuration Item | Data Type | Mandatory | Modifiable | Description |
| --- | --- | --- | --- | --- |
| preferredBatchSize | Integer array | Yes | Yes | Batch sizes supported by the OM model. |
| waitingTime | int | No | Yes | Maximum time to wait while forming a batch in the multi-batch model scenario, in ms. The value ranges from 1 to 50000; the default is 5000. If the waiting time is exceeded, the system stops waiting and performs inference with the data collected so far. |
| dynamicStrategy | String | No | Yes | Policy used to select a batch size during dynamic batch inference. The default value is Nearest. Nearest: use the batch size with the smallest absolute difference from the number of cached images; if two batch sizes have the same absolute difference, use the larger one. Upper: use the smallest batch size that is greater than or equal to the number of cached images. Lower: use the largest batch size that is less than or equal to the number of cached images. |
| singleBatchInfer | int | No | Yes | Single-batch inference switch (an int used as a Boolean). 0 (default): perform single-batch or multi-batch inference according to the first dimension of the model. 1: perform only single-batch inference, regardless of whether the first dimension of the model is 1. |

Note: The waitingTime, dynamicStrategy, and singleBatchInfer fields take the same values as the corresponding parameters of the mxpi_tensorinfer plugin.
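The three dynamicStrategy policies in Table 5 can be sketched as the following selection function. This is an illustrative reimplementation of the documented rules, not the product's code; in particular, the fallback when no batch size qualifies under Upper or Lower is an assumption, since the table does not define that case.

```python
def select_batch_size(num_cached, preferred, strategy="Nearest"):
    """Pick a batch size from preferredBatchSize per Table 5.
    num_cached is the number of cached images waiting for inference."""
    sizes = sorted(preferred)
    if strategy == "Nearest":
        # Smallest absolute difference; ties go to the larger batch size.
        return min(sizes, key=lambda b: (abs(b - num_cached), -b))
    if strategy == "Upper":
        # Smallest batch size >= num_cached (fallback to max: an assumption).
        candidates = [b for b in sizes if b >= num_cached]
        return min(candidates) if candidates else max(sizes)
    if strategy == "Lower":
        # Largest batch size <= num_cached (fallback to min: an assumption).
        candidates = [b for b in sizes if b <= num_cached]
        return max(candidates) if candidates else min(sizes)
    raise ValueError("unknown dynamicStrategy: " + strategy)

print(select_batch_size(3, [2, 4, 8], "Nearest"))  # 4 (tie between 2 and 4)
```

With 3 cached images and preferredBatchSize [2, 4, 8], both 2 and 4 differ from 3 by 1, so Nearest picks the larger size, 4; Upper picks 4 and Lower picks 2.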