Inference Configuration Items

An inference configuration file can configure only one inference service. If multiple inference services are configured in the same file, only the last one takes effect.

Table 1 Inference service configuration items

| Inference Service Configuration Item | Description | Data Type | Mandatory | Modifiable |
| --- | --- | --- | --- | --- |
| inferType | Inference type. The value streams indicates a pipeline inference service; models indicates a model inference service. | String | Yes | Yes |
| name | Stream name or model name. The stream name is the name of the inference stream specified in the pipeline file. Only characters in {0-9, a-z, A-Z, +, -, _} are supported. The name is used to form the URI of the RESTful API for the inference service; because the service restricts the URI length, keep the name reasonably short. For details, see RESTful APIs. If this parameter is set to a stream name, it must match the stream name in the pipeline file; otherwise, even if the service starts successfully, the corresponding inference stream cannot be found when requests are processed. | String | Yes | Yes |
| path | Path of the pipeline file or OM model file. It can be a relative or absolute path; a relative path is resolved against the path created by StreamServer. The inference service process must have permission to access this path. | String | Yes | Yes |
| deviceId | ID of the device that runs the inference service. Confirm the available hardware resources in advance by running the npu-smi info command in the environment where the Ascend device is installed. The value ranges from 0 to 1024 and cannot exceed the ID range configured in the current environment. If inferType is streams, this item does not take effect; the actual device ID is the one set in the pipeline specified in the configuration file. | int | Yes | Yes |
| timeoutMs | Inference timeout interval, in milliseconds. The value ranges from 1 to 100000; the default is 3000. | Positive integer | No | Yes |
| inputs | Input tensors. | Tensor | Yes | Yes |
| outputs | Output tensors. | Tensor | Yes | Yes |
| dynamicBatching | Whether single-model inference supports dynamic batching. For details, see Table 5. | dynamicBatching | No | Yes |
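Putting the items in Table 1 together (with tensor items from Table 2 and dynamic batch items from Table 5), a model inference service configuration might look like the sketch below. The overall file layout, key spelling, and the model path are illustrative assumptions inferred from the item names; consult the product samples for the authoritative format.

```json
{
    "inferType": "models",
    "name": "resnet50",
    "path": "./models/resnet50.om",
    "deviceId": 0,
    "timeoutMs": 3000,
    "inputs": [
        {
            "name": "input0",
            "id": 0,
            "dataType": "UINT8",
            "format": "FORMAT_NHWC",
            "shape": [1, 224, 224, 3]
        }
    ],
    "outputs": [
        {
            "name": "output0",
            "id": 0,
            "dataType": "FLOAT32",
            "format": "FORMAT_NONE",
            "shape": [1, 1000]
        }
    ],
    "dynamicBatching": {
        "preferredBatchSize": [2, 4, 8],
        "waitingTime": 5000,
        "dynamicStrategy": "Nearest",
        "singleBatchInfer": 0
    }
}
```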

Table 2 Tensor configuration items

| Tensor Configuration Item | Description | Data Type | Mandatory | Modifiable |
| --- | --- | --- | --- | --- |
| name | Tensor name. Only characters in {0-9, a-z, A-Z, +, -, _} are supported. The length ranges from 1 to 100. | String | Yes | Yes |
| id | Tensor ID, starting from 0. For stream inference, this parameter corresponds to the input/output plugin IDs (appsrcX/appsinkX) of the pipeline. The value range is [0, 10000]. | int | Yes | Yes |
| dataType | Tensor data type. Set this parameter to a data type defined in Table 3. | String | Yes | Yes |
| format | Tensor data format. Set this parameter to a data format defined in Table 4. | String | Yes | Yes |
| shape | Tensor shape (dimensions). Each dimension must be in the range (0, 10000], and the product of all dimensions must be in the range (0, max_content_length), where max_content_length is the maximum request body length defined in streamserver.conf. | Integer array | Yes | Yes |
| data | Base64-encoded data string to be inferred. This parameter is set only in inference requests and is not required in the configuration file. | String | No | Yes |
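The constraints in Table 2 can be checked programmatically when building a tensor entry for an inference request. The sketch below is a hypothetical helper, assuming the JSON keys match the configuration item names above; the function name and its validation messages are illustrative, not part of the product API.

```python
import base64
import json
import re

def make_tensor(name, tensor_id, data_type, fmt, shape, raw_bytes=None):
    """Build one tensor entry per Table 2. Key names are assumed to match
    the configuration item names; raw_bytes is set only for requests."""
    # name: only {0-9,a-z,A-Z,+,-,_}, length 1 to 100
    if not re.fullmatch(r"[0-9A-Za-z+_-]{1,100}", name):
        raise ValueError("tensor name must use {0-9,a-z,A-Z,+,-,_}, length 1-100")
    # id: value range [0, 10000]
    if not 0 <= tensor_id <= 10000:
        raise ValueError("tensor id must be in [0, 10000]")
    # shape: each dimension in (0, 10000]
    if not shape or any(not 0 < dim <= 10000 for dim in shape):
        raise ValueError("each dimension must be in (0, 10000]")
    tensor = {"name": name, "id": tensor_id, "dataType": data_type,
              "format": fmt, "shape": list(shape)}
    if raw_bytes is not None:
        # 'data' is Base64-encoded and appears only in inference requests
        tensor["data"] = base64.b64encode(raw_bytes).decode("ascii")
    return tensor

print(json.dumps(make_tensor("input0", 0, "UINT8", "FORMAT_NHWC",
                             [1, 224, 224, 3], b"\x00\x01")))
```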

Table 3 Tensor data types

| Tensor Data Type | Description |
| --- | --- |
| FLOAT32 | 32-bit floating-point |
| FLOAT16 | 16-bit floating-point |
| INT8 | 8-bit signed integer |
| INT32 | 32-bit signed integer |
| UINT8 | 8-bit unsigned integer |
| UINT16 | 16-bit unsigned integer |
| UINT32 | 32-bit unsigned integer |
| INT64 | 64-bit signed integer |
| UINT64 | 64-bit unsigned integer |
| DOUBLE64 | 64-bit double-precision floating-point |
| BOOL | Boolean |
| STRING | String |
| BINARY | Binary |
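Because the product of all shape dimensions is bounded by max_content_length (Table 2), it can be useful to estimate a tensor's raw payload size before Base64 encoding. The element sizes below follow the widths named in Table 3; treating BOOL as 1 byte is an assumption, and the variable-length STRING and BINARY types are excluded.

```python
# Element sizes in bytes for the fixed-width types in Table 3.
# BOOL as 1 byte is an assumption; STRING and BINARY are variable-length.
ELEMENT_SIZE = {
    "FLOAT32": 4, "FLOAT16": 2, "INT8": 1, "INT32": 4,
    "UINT8": 1, "UINT16": 2, "UINT32": 4, "INT64": 8,
    "UINT64": 8, "DOUBLE64": 8, "BOOL": 1,
}

def tensor_bytes(shape, data_type):
    """Raw size of a tensor (before Base64 encoding): product of
    all dimensions times the per-element size."""
    size = ELEMENT_SIZE[data_type]
    for dim in shape:
        size *= dim
    return size

print(tensor_bytes([1, 224, 224, 3], "FLOAT32"))  # 602112
```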

Table 4 Tensor data formats

| Tensor Data Format | Description |
| --- | --- |
| FORMAT_NONE | No format |
| FORMAT_NHWC | NHWC |
| FORMAT_NCWH | NCWH |

Table 5 Dynamic batch configuration items

| Dynamic Batch Configuration Item | Data Type | Mandatory | Modifiable | Description |
| --- | --- | --- | --- | --- |
| preferredBatchSize | Integer array | Yes | Yes | Batch sizes supported by the OM model. |
| waitingTime | int | No | Yes | Maximum time to wait while forming a batch in the multi-batch model scenario, in ms. The value ranges from 1 to 50000; the default is 5000. If the waiting time is exceeded, the system stops waiting and performs inference with the data collected so far. |
| dynamicStrategy | String | No | Yes | Policy used to select a batch size during dynamic batch inference. The default value is Nearest. Nearest: use the batch size with the smallest absolute difference from the number of cached images; if two batch sizes have the same absolute difference, use the larger one. Upper: use the smallest batch size that is greater than or equal to the number of cached images. Lower: use the largest batch size that is less than or equal to the number of cached images. |
| singleBatchInfer | int | No | Yes | Single-batch inference switch (an int used as a Boolean). 0 (default): perform single-batch or multi-batch inference according to the first dimension of the model. 1: perform only single-batch inference, regardless of whether the first dimension of the model is 1. |

Note: The waitingTime, dynamicStrategy, and singleBatchInfer fields take the same values as the corresponding parameters of the mxpi_tensorinfer plugin.
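The three dynamicStrategy policies in Table 5 can be sketched as the following selection function. This is an illustrative reimplementation of the documented rules, not the product's code; in particular, the fallback when no batch size qualifies under Upper or Lower is an assumption, since the table does not define that case.

```python
def select_batch_size(num_cached, preferred, strategy="Nearest"):
    """Pick a batch size from preferredBatchSize per Table 5.
    num_cached is the number of cached images waiting for inference."""
    sizes = sorted(preferred)
    if strategy == "Nearest":
        # Smallest absolute difference; ties go to the larger batch size.
        return min(sizes, key=lambda b: (abs(b - num_cached), -b))
    if strategy == "Upper":
        # Smallest batch size >= num_cached (fallback to max: an assumption).
        candidates = [b for b in sizes if b >= num_cached]
        return min(candidates) if candidates else max(sizes)
    if strategy == "Lower":
        # Largest batch size <= num_cached (fallback to min: an assumption).
        candidates = [b for b in sizes if b <= num_cached]
        return max(candidates) if candidates else min(sizes)
    raise ValueError("unknown dynamicStrategy: " + strategy)

print(select_batch_size(3, [2, 4, 8], "Nearest"))  # 4 (tie between 2 and 4)
```

With 3 cached images and preferredBatchSize [2, 4, 8], both 2 and 4 differ from 3 by 1, so Nearest picks the larger size, 4; Upper picks 4 and Lower picks 2.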