Inference Service

You are advised to install and run the inference service as a common user. Running it in sudo + command mode, even as a common user in the sudo group, is not advised.

Deployment Procedure

  1. Obtain the required software packages.

    The software is classified into the commercial edition and the community edition. The two editions provide the same functions and differ only in download permissions and permitted purposes of use.

    The community edition can be downloaded directly without applying for related permissions, but it cannot be used for commercial purposes. To download the commercial edition, you need to apply for related permissions.

    Table 1 lists the names, contents, and supported scenarios of the inference service software packages. All inference service software packages are used in the same way. Replace the RUN package in the following steps as required.

    To avoid using software packages that have been tampered with during transmission or storage, download their digital signature files for integrity check while downloading the software packages.

    After downloading the software package, verify the PGP digital signature of the software package based on the OpenPGP Signature Verification Guide. If the software package fails the verification, do not use the software package, and contact Huawei technical support engineers.

    Before using or upgrading a software package, verify its digital signature to ensure that the software package is not tampered with.

    For enterprise customers, visit https://support.huawei.com/enterprise/en/tool/software-digital-signature-openpgp-validation-tool-TL1000000054.

    Table 1 Inference service software packages and supported scenarios

    Package Name: Ascend-mindxsdk-mxserving_{version}_linux-{arch}.run
    Package Content: Basic service framework
    Supported Scenario: Custom scenarios

    Package Name: Ascend-mindxsdk-mxserving-3c_{version}_linux-{arch}.run
    Package Content: Basic service framework + CCC quality inspection
    Supported Scenario: Custom scenarios and CCC quality inspection algorithms

    Package Name: Ascend-mindxsdk-mxserving-sem_{version}_linux-{arch}.run
    Package Content: Basic service framework + semiconductor
    Supported Scenario: Custom scenarios and semiconductor application quality inspection algorithms

    How to Obtain (all three packages): Commercial edition: Link. (Select MindX 3.0.0, download the corresponding software package in the table, and verify the digital signature.)

    {version} indicates the version number and {arch} indicates the OS architecture.

    Table 2 MindX SDK

    Package Name: Ascend-mindxsdk-mxmanufacture_{version}_linux-{arch}.run
    Package Content: MindX SDK
    Supported Scenario: -
    How to Obtain:
    • Community edition: Link. (Click the mxManufacture tab, select MindX 3.0.0, and download the corresponding software package in the table.)
    • Commercial edition: Link. (Select MindX 3.0.0, download the corresponding software package in the table, and verify the digital signature.)

    {version} indicates the version number and {arch} indicates the OS architecture.

  2. Check the integrity of the software packages.

    Verify the integrity of the downloaded software packages based on the digital signature verification tool and digital signatures of the software packages provided on the software package download page. For details, see the Signature Verification Guide.
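    As a sketch, signature verification with GnuPG typically looks like the following. The key file and signature file names are illustrative; use the actual files provided on the download page.

    # Import the Huawei OpenPGP public key obtained from the official site (file name illustrative).
    gpg --import huawei-openpgp-public-key.asc
    # Verify the detached signature against the downloaded package.
    gpg --verify Ascend-mindxsdk-mxserving_{version}_linux-{arch}.run.asc Ascend-mindxsdk-mxserving_{version}_linux-{arch}.run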

  3. Install the mxManufacture development kit.

    For details, see Using CLI for Development > Environment Preparation > Installing MindX SDK in the MindX SDK mxManufacture User Guide.

  4. Verify the consistency and integrity of the development kit.
    bash Ascend-mindxsdk-mxserving_{version}_linux-{arch}.run --check

    If the following information is displayed, the kit meets the consistency and integrity requirements:

    Verifying archive integrity...  100%   SHA256 checksums are OK. All good.
  5. Install the inference service (mxAOIService).

    Grant the execute permission on the kit.

    chmod +x Ascend-mindxsdk-mxserving_{version}_linux-{arch}.run

    Go to the directory where the package is stored and run the following command:

    bash Ascend-mindxsdk-mxserving_{version}_linux-{arch}.run --install
    After the installation is complete, go to the generated mxAOIService directory.
    cd mxAOIService
  6. Prepare the model file and training configuration file.
    1. After model training, save the OM model and the training parameter configuration file train_params.config to the scripts/om_cfg directory.
      • The om_cfg directory needs to be manually created.
      • The model file must be an OM model file with the file extension .om. The file name must contain the Ascend AI Processor model and must be unique. Naming examples:
        • ssd_mobilenetv1_fpn_best310.om: OM model for Ascend 310 AI Processor inference.
        • ssd_mobilenetv1_fpn_best310P3.om: OM model for Ascend 310P AI Processor inference.
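      For example, the directory can be created and the files copied in as follows (source paths are illustrative):

      # Create the om_cfg directory manually, then copy in the trained model and its configuration.
      mkdir -p scripts/om_cfg
      cp /path/to/ssd_mobilenetv1_fpn_best310.om scripts/om_cfg/
      cp /path/to/train_params.config scripts/om_cfg/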

      When the scenario is det_cls, check whether the train_params.config of the classification model contains the dataset_meta_info field, which specifies the classification model to which the detection results are delivered. If the field does not exist, add it manually (a check sketch follows the example below). The following is an example:

      "dataset_meta_info": {"source_dataset": {"labels": ["label_1","label_2"]}}
      Compile the model_service_param.json file and save it to the scripts directory. The following is an example:
      {
      	"task_name": "assm1-2",
      	"project_list": [
      		{
      			"scene": "det", // Task type: foreign object detection
      			"project_name": "project_ssd", // Task name
      			"model_list": [
      				{
      					"application": "det", // Application scenario
      					"model_name": "SSD", // Model name
      					"result_path": "/opt/det2" // Path of the om and train_params.config files generated during training
      				}
      			]
      		},
      		{
      			"scene": "seg",// Image segmentation
      			"project_name": "project_unet",
      			"model_list": [
      				{
      					"application": "seg",
      					"model_name": "UNET",
      					"result_path": "./scripts/om_cfg/seg"
      				}
      			]
      		},
      		{
      		"scene": "det_cls", // Component error/missing/reverse detection
      			"project_name": "project_ssd_resnet",
      			"model_list": [
      				{
      					"application": "det",
      					"model_name": "SSD",
      					"result_path": "/opt/det2"
      				},
      				{
      					"application": "cls",
      					"model_name": "ResNet50",
      					"result_path": "/opt/cls1"
      				}
      			]
      		},
      		{
      			"scene": "det_det", // Detection + detection
      			"project_name": "project_yolov4_yolov4",
      			"model_list": [
      				{
      					"application": "det1",
      					"model_name": "YOLOV4",
      					"result_path": "./scripts/om_cfg/yolov4_1"
      				},
      				{
      					"application": "det2",
      					"model_name": "YOLOV4",
      					"result_path": "./scripts/om_cfg/yolov4_2"
      				}
      			]
      		},
      		{
      		"scene": "det_ocr", // Detection + industrial OCR
      			"project_name": "project_ssd_crnn",
      			"model_list": [
      				{
      					"application": "det",
      					"model_name": "SSD",
      					"result_path": "./scripts/om_cfg/det2"
      				},
      				{
      					"application": "crnn",
      					"model_name": "CRNN",
      					"result_path": "./scripts/om_cfg/ocr"
      				}
      			]
      		},
      		{
      		"scene": "det_seg", // Detection + image segmentation
      			"project_name": "project_ssd_unet",
      			"model_list": [
      				{
      					"application": "det",
      					"model_name": "SSD",
      					"result_path": "./scripts/om_cfg/det2"
      				},
      				{
      					"application": "seg",
      					"model_name": "UNET",
      					"result_path": "./scripts/om_cfg/seg"
      				}
      			]
      		},
      		{
      			"scene": "tag_paste", // Tag defect detection
      			"project_name": "tag_paste",
      			"custom_params": {
      				"FirstDetectionFilter": "{'Type': 'Area', 'TopN': 0,'BottomN': 0,'MinArea': 0,'MaxArea': 0,'ConfThresh': 0.0}",
      				"tag_1": "big",
      				"tag_1_params": "{'edge_defect_thres_all': 20, 'edge_defect_thres_qiqiao': 40,  'shape': (1, 420, 840, 3),'physical_size': [100, 50]}"
      			},
      			"model_list": [
      				{
      					"application": "det",
      					"model_name": "yolov4",
      					"result_path": "/home/mxAOIService_CI_test_config/om_cfg_det+/yolov4"
      				},
      				{
      					"application": "tag_seg_1",
      					"model_name": "unet++",
      					"result_path": "/home/mxAOIService_CI_test_config/om_cfg_det+/unet++_big"
      				},
      				{
      					"application": "tag_cls_1",
      					"model_name": "resnet",
      					"result_path": "/home/mxAOIService_CI_test_config/om_cfg_det+/resnet50_big"
      				}
      			]
      		},
      		{
      	        "scene": "wafer", // Wafer detection
      			"project_name": "project_wafer",
      			"model_list": [
      				{
      					"application": "wafer",
      					"model_name": "00201Layer3",
      					"result_path": "/home/mxAOIService/om_cfg/wafer",
      					"ref_path": "/home/mxAOIService/om_cfg/wafer/ref"
      				}
      			]
      		},
      		{
                     "scene": "det_htp", // High-performance detection
      			"project_name": "project_yolov4",
      			"model_list": [
      				{
      					"application": "det",
      					"model_name": "yolov4",
      					"result_path": "./scripts/om_cfg/yolov4"
      				}
      			]
      		}
      	]
      }
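      Note that the // comments above are explanatory only and must not appear in the actual file, because JSON does not support comments. Once the comments are removed, the file can be sanity-checked before conversion, for example:

      # Fails with a parse error if model_service_param.json is not valid JSON.
      python3 -m json.tool scripts/model_service_param.json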
    2. Convert the model and parameters into the format that can be used by the inference service.
      python3 scripts/infer_service_generation.py --model_params=./scripts/model_service_param.json
      Information similar to the following is displayed.
      Creating directory [./config/models/project_ssd/assm1-2/1]...
      Creating directory [./config/models/project_unet/assm1-2/1]...
      Creating directory [./config/models/project_ssd_resnet/assm1-2/1]...
      Creating directory [./config/models/project_yolov4_yolov4/assm1-2/1]...
      Creating directory [./config/models/project_ssd_crnn/assm1-2/1]...
      Creating directory [./config/models/project_ssd_unet/assm1-2/1]...
    3. The infer_service_generation.py script automatically generates the models folder in the config directory. The following is an example of the models directory structure:
      models
      ├── model_configs.json
      ├── project_ssd
      │   └── assm1-2
      │       └── 1
      │           ├── label_0.names
      │           ├── mindx_sdk.pipeline
      │           ├── post_process_0.cfg
      │           └── ssd_mobilenetv1_fpn_best3100.om
      ├── project_ssd_crnn
      │   └── assm1-2
      │       └── 1
      │           ├── crnn3100.om
      │           ├── label_0.names
      │           ├── label_1.names
      │           ├── mindx_sdk.pipeline
      │           ├── post_process_0.cfg
      │           ├── post_process_1.cfg
      │           └── ssd_mobilenetv1_fpn_best3100.om
      ├── project_ssd_resnet
      │   └── assm1-2
      │       └── 1
      │           ├── label_0.names
      │           ├── label_cls_0.names
      │           ├── mindx_sdk.pipeline
      │           ├── post_process_0.cfg
      │           ├── post_process_cls_0.cfg
      │           ├── resnet3100.om
      │           └── ssd_mobilenetv1_fpn_best3100.om
      ├── project_ssd_unet
      │   └── assm1-2
      │       └── 1
      │           ├── label_0.names
      │           ├── label_1.names
      │           ├── mindx_sdk.pipeline
      │           ├── post_process_0.cfg
      │           ├── post_process_1.cfg
      │           ├── ssd_mobilenetv1_fpn_best3100.om
      │           └── unet3100.om
      ├── project_unet
      │   └── assm1-2
      │       └── 1
      │           ├── label_0.names
      │           ├── mindx_sdk.pipeline
      │           ├── post_process_0.cfg
      │           └── unet_wzf3100.om
      ├── project_wafer
      │   └── assm1-2
      │       └── 1
      │           ├── 00201Layer3_3100.om
      │           ├── 00201Layer3_3101.om
      │           ├── 00201Layer3_3102.om
      │           ├── 00201Layer3_3103.om
      │           ├── mindx_sdk.pipeline
      │           └── aoi_ai_config.json
      └── project_yolov4_yolov4
          └── assm1-2
              └── 1
                  ├── label_0.names
                  ├── label_1.names
                  ├── mindx_sdk.pipeline
                  ├── post_process_0.cfg
                  ├── post_process_1.cfg
                  ├── yolov4_best3100.om
                  └── yolov4_best3101.om
  7. Start the service. For details about the configurable parameters and their descriptions, see Table 3.
    ./start.sh
    • You are advised to install and run the inference service as a common user. Running it in sudo + command mode, even as a common user in the sudo group, is not advised.
    • As a component, the inference service needs to be integrated into the user's system.
    • The user's system controls the startup, stop, and restart of the inference service.
    • The inference service has no built-in restart mechanism; restart must be handled by the user's system (see the sketch below).
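    Because restart is left to the user's system, the service can be wrapped in an external supervisor. A minimal sketch, with the installation path and parameters purely illustrative (a process manager such as systemd can serve the same purpose):

    #!/usr/bin/env bash
    # Restart the inference service whenever it exits; mxAOIService has no built-in restart.
    cd /path/to/mxAOIService || exit 1
    while true; do
        ./start.sh -i 127.0.0.1 -p 8888    # parameters as described in Table 3
        echo "inference service exited; restarting in 5 seconds" >&2
        sleep 5
    done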
    Table 3 Command parameter description

    -d / --device_id
      Deploys the inference service on the processor with the specified ID.

    -i / --host
      Configures the listening IP address of the inference service. The default value is 127.0.0.1.

      Configure the IP address based on the actual network deployment. If the IP address is set to 0.0.0.0, the service listens on all network interfaces, which poses security risks. If the server running the inference service has separate management, control, and user planes, network-wide listening breaks the system's plane isolation. Do not set the IP address to 0.0.0.0 when deploying the inference service.

    -p / --port
      Specifies the port number. The default value is 8888. The value range is (1000, 65535].

    -s / --https
      Enables or disables HTTPS.

      • true (case-insensitive): enables the HTTPS service, and the server verifies the client (default). If the HTTPS service is used, you need to import the certificate. For details, see (Optional) Certificate Import.
      • false (case-insensitive): enables the HTTP service.

      When HTTP is used, data is transmitted in plaintext on the network, which may cause data leakage. Exercise caution when using it.

    -u / --upload
      Uploads inference images and results to a third-party platform. Results of component error/missing/reverse detection and glue detection are uploaded only when TaskType is set to WithReg. The default value is false (uploading disabled).

      If it is set to true, uploading is enabled. The upload channel uses the same protocol (HTTP or HTTPS) as the inference service. The value is case-insensitive.

    -c / --check
      Enables or disables the processor check.

      This function is intended for third-party container integration scenarios: when a processor is faulty, the service process in the container stops and the container exits; the integrating party can then restart the container, allocate a processor, and restore the service. If the integrating party cannot restart the service in this way, do not enable this function. The inference service runs as the ai_server.py process in the container; if you create other processes in the container, do not use this name as the process name.

      • true: enables the processor check. deadly_error_watch.sh is invoked as a scheduled task to check whether the processor is faulty. If the processor is faulty, the inference service is stopped.
      • false: disables the processor check.

    -m / --monitor
      Uploads the inference latency to the Pushgateway port of Prometheus for monitoring inference latency performance. The default value is false (uploading disabled). If it is set to true, uploading is enabled. The upload channel uses the same protocol (HTTP or HTTPS) as the inference service. The value is case-insensitive.

    -h / --help
      Displays help information.
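    After starting the service, you can confirm that it is listening on the configured address and port, for example (port value illustrative):

    # Shows a LISTEN entry for the inference service port if startup succeeded.
    ss -ltn | grep 8888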

  8. Stop the service.

    You can use either of the following methods to stop the inference process:

    • Press Ctrl+C in the CLI.
    • Run the following command:
      ./stop.sh

    The stop.sh script runs the kill -SIGINT ${PID} command to stop the ai_server.py process in the current system. Do not use this process name for other processes; processes with the same name may be killed by mistake.
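    For illustration only (this is not the actual stop.sh), a SIGINT-based stop works roughly as follows, which is why an unrelated process with the same name could be killed by mistake:

    # pgrep -f matches the full command line, so any process containing "ai_server.py" is returned.
    PID=$(pgrep -f ai_server.py)
    kill -SIGINT ${PID}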

Supplementary Notes

  • Description of other optional parameters of the inference service software package

    You can run the following command to view the optional parameters of the software package:

    bash Ascend-mindxsdk-mxserving_{version}_linux-{arch}.run --help
    Table 4 Other optional parameters

    --help | -h
      Displays help information.

    --info
      Prints the embedded information.

    --list
      Outputs the file list of the software package.

    --check
      Checks the software package integrity.

    --install
      Installs the software package.

    --install-path=<path>
      Specifies the installation path.

    --version
      Checks the version.

    --upgrade
      Upgrades the software package.

  • Software package uninstallation

    To uninstall the software package, execute the mxAOIService/bin/uninstall.sh file.

    bash uninstall.sh

Precautions

  1. The mxManufacture packages of different versions cannot be used together in the same environment.
  2. By default, the HTTPS service is enabled for the inference service, and the server verifies the client.

    The certificate required by the inference service is provided by the integrating third party. The inference service does not manage certificates. Instead, the integrating third party provides certificate management services, such as certificate generation, import, and update, certificate integrity maintenance, expiration alarms, and certificate revocation list (CRL) management. You can use a self-signed certificate to perform self-verification in the development environment. For details, see Self-signed Certificate Creation Methods.

  3. When the function of uploading images and inference results is enabled, add APULIS_ENDPOINT to the ~/.bashrc file, and configure the upload address.
    export APULIS_ENDPOINT="http://127.0.0.1:8889/"
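    For example, to append the variable and make it take effect in the current shell:

    echo 'export APULIS_ENDPOINT="http://127.0.0.1:8889/"' >> ~/.bashrc
    source ~/.bashrc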

    You can view the upload logs in logs/ai_server.log.

    If the message upload queue full is displayed in the inference service log, the number of images in the upload queue has reached the maximum. The system will stop uploading images until there is room in the upload queue. If the upload queue is too long, more memory is occupied. The default queue length is 100. You can set the length in the config.yaml file as required. The following is an example.

    # Set upload queue length, only use in upload mode.
    upload_queue_length: 100
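    To check whether the queue limit is being hit, the log can be searched for the message, for example:

    grep "upload queue full" logs/ai_server.log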
  4. The channel for uploading images and inference latency to a third-party platform must use the same protocol (HTTP or HTTPS) as the inference service.
    • If the inference service is started in HTTPS mode, the encrypted channel for uploading images and inference latency to the third-party server must also use HTTPS. In this case, the client performs unidirectional authentication on the third-party server.
    • If the inference service is started in HTTP mode, the URL received by the third-party server must be in HTTP format.
    If HTTPS is used, configure the required certificates in the config.yaml file by referring to (Optional) Certificate Import. The following is an example:
    ai_platform_http_client: 
      ca: /path/to/ca.crt
      crl: /path/to/ai_platform_http_client.crl
    http_pushgateway_client:
      ca: /path/to/pushgateway/ca.crt
      crl: /path/to/pushgateway/http_pushgateway_client.crl
    • When a third-party platform integrates the inference component, information may be disclosed if HTTP is used to report the inference results.
    • If HTTPS is used but no CA certificate is configured to authenticate the server, server verification is disabled when the inference results are reported, and the server may be forged (see the check below).
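    As a connectivity check for the HTTPS upload channel, the third-party server can be probed against the configured CA certificate (URL and path are illustrative):

    # Succeeds only if the server certificate chains to the configured CA.
    curl --cacert /path/to/ca.crt https://third-party-host:8889/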