Precautions
Warning
The training component provides only model training capabilities. If the dataset is confidential, you must address data security across the whole solution.
Running as a Non-root User
- Running programs as the root user poses security risks and makes their effects hard to control. Execute scripts as a non-root user.
- If you are a common user in the sudo group, do not prefix the script commands with sudo.
- Ensure that the user has read permission on the dependent processor library, ACLlib, and the specified input dataset.
- Manually set the maximum number of files that can be open at the same time to 65535 by running the following command:
ulimit -n 65535
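The checks above can be sketched as a small preflight script to run before a training script. This is only an illustration: the function names is_non_root and limit_is_sufficient are hypothetical and not part of the training component, and treating any limit of 65535 or more (or "unlimited") as acceptable is an assumption.

```shell
#!/bin/sh
# Hypothetical preflight helpers; not part of the training component.

# Succeeds only when the given numeric user ID is not 0 (root).
is_non_root() {
    [ "$1" -ne 0 ]
}

# Succeeds when an open-file limit is "unlimited" or at least 65535.
# Accepting values above 65535 is an assumption of this sketch.
limit_is_sufficient() {
    [ "$1" = "unlimited" ] || [ "$1" -ge 65535 ]
}

# Warn (without aborting) if the current shell does not meet the precautions.
if ! is_non_root "$(id -u)"; then
    echo "Warning: running as root; switch to a non-root user." >&2
fi
if ! limit_is_sufficient "$(ulimit -n)"; then
    echo "Warning: open-file limit is $(ulimit -n); run 'ulimit -n 65535'." >&2
fi
```

The script only prints warnings, so it can be sourced at the top of a wrapper script without stopping it.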
Setting umask
You are advised to set umask to 027 or a more restrictive value so that newly created files and directories get tighter default permissions.
For example, to set umask to 027, perform the following operations:
- Log in to the server as the root user and edit the /etc/profile file.
vim /etc/profile
- Add umask 027 to the end of the /etc/profile file, save the file, and exit.
- Run the following command to make the configuration take effect:
source /etc/profile
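To confirm that the setting took effect, the current umask can be checked in a new shell. The following is a minimal sketch; the function name umask_at_least_027 is hypothetical, and it interprets "027 or more restrictive" as "every permission bit masked by 027 is also masked by the current umask", which is an assumption.

```shell
#!/bin/sh
# Hypothetical helper; not part of the training component.

# Succeeds when the given octal umask masks at least the bits that 027 masks.
# The leading 0 forces the shell to read the value as octal.
umask_at_least_027() {
    [ $(( 0$1 & 027 )) -eq $(( 027 )) ]
}

# Check the umask of the current shell.
if umask_at_least_027 "$(umask)"; then
    echo "umask $(umask) is at least as restrictive as 027."
else
    echo "Warning: umask $(umask) is less restrictive than 027." >&2
fi
```

With umask 027, new files are created without group write permission and without any permissions for other users.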
Program Entry
For details about the execution script of each function entry, see the corresponding section. Other scripts are not function execution entries.
Training Dataset Requirements
The training dataset is split into two parts, one for training and one for testing. Therefore, the training dataset must contain more than 10 images.
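The image count can be checked before training starts. The following sketch makes several assumptions: the images sit directly under one directory, they use the .jpg, .jpeg, or .png extension, and the DATASET_DIR variable and both function names are hypothetical. Adapt these to the actual dataset layout.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the training component.

# Counts files with common image extensions directly under a directory.
# The extension list and the flat layout are assumptions.
count_images() {
    find "$1" -maxdepth 1 -type f \
        \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) | wc -l | tr -d ' '
}

# Succeeds when the image count is greater than 10.
has_enough_images() {
    [ "$1" -gt 10 ]
}

dataset_dir="${DATASET_DIR:-.}"   # hypothetical variable; defaults to the current directory
image_count="$(count_images "$dataset_dir")"
if has_enough_images "$image_count"; then
    echo "Found $image_count images in $dataset_dir."
else
    echo "Warning: only $image_count images in $dataset_dir; more than 10 are required." >&2
fi
```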
Hyperparameter Setting Suggestions
- Learning rate of a deep learning model
Generally, it is recommended that the value be within the range (0,1).
- Model input width and height
Generally, each value is greater than 0 and is a multiple of 2. The exact requirements vary depending on the model.
- Thresholds of confidence, NMS, and IoU
Generally, it is recommended that the value be within the range [0,1].
- epoch_size (number of training epochs)
Generally, it is recommended that the value of epoch_size be within the range [1,500].
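The suggested ranges above can be expressed as simple checks. This is a sketch only: the function names, the variable names other than epoch_size, and the sample values are all hypothetical, and individual models may impose stricter requirements. Floating-point comparisons are delegated to awk because the shell compares integers only.

```shell
#!/bin/sh
# Hypothetical validation helpers; not part of the training component.

# Succeeds when the value lies in the open interval (0, 1), e.g. a learning rate.
in_open_unit() {
    awk -v v="$1" 'BEGIN { exit !(v > 0 && v < 1) }'
}

# Succeeds when the value lies in the closed interval [0, 1], e.g. a threshold.
in_closed_unit() {
    awk -v v="$1" 'BEGIN { exit !(v >= 0 && v <= 1) }'
}

# Succeeds when the value is a positive multiple of 2 (model input width or height).
valid_dimension() {
    [ "$1" -gt 0 ] && [ $(( $1 % 2 )) -eq 0 ]
}

# Succeeds when the value lies in [1, 500] (epoch_size).
valid_epoch_size() {
    [ "$1" -ge 1 ] && [ "$1" -le 500 ]
}

# Illustrative values only; replace them with the real configuration.
learning_rate=0.01
input_width=640
input_height=640
confidence_threshold=0.5
epoch_size=100

in_open_unit "$learning_rate" || echo "learning rate outside (0, 1)" >&2
valid_dimension "$input_width" || echo "width must be a positive multiple of 2" >&2
valid_dimension "$input_height" || echo "height must be a positive multiple of 2" >&2
in_closed_unit "$confidence_threshold" || echo "threshold outside [0, 1]" >&2
valid_epoch_size "$epoch_size" || echo "epoch_size outside [1, 500]" >&2
```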
Setting batch_size and Model Size
- The ratio of the training data to the test data is 4:1. Ensure that one fifth of the total number of training images (the test portion) is greater than the value of batch_size.
- The memory space of the NPU and CPU is limited. Therefore, the values of batch_size and the model input width and height cannot be too large. If [MallocDynamicMem] Out of memory!!! is displayed, decrease their values.
If Error: Tensor data_vsel_ub_fp32_appiles buffer size(327688) more than available buffer size(309768) is displayed, the operator's Unified Buffer (UB) usage exceeds the maximum. In this case, reduce the model input width and height.
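The batch_size constraint from the 4:1 split can be checked with integer arithmetic. In this sketch the function name batch_size_fits and the sample numbers are hypothetical. The comparison "one fifth of the total is greater than batch_size" is rewritten as "total is greater than 5 times batch_size" so that integer division does not discard the fractional part.

```shell
#!/bin/sh
# Hypothetical helper; not part of the training component.

# Usage: batch_size_fits TOTAL_IMAGES BATCH_SIZE
# Succeeds when TOTAL_IMAGES / 5 > BATCH_SIZE, evaluated as TOTAL_IMAGES > 5 * BATCH_SIZE.
batch_size_fits() {
    [ "$1" -gt $(( 5 * $2 )) ]
}

# Illustrative values only.
total_images=100
batch_size=16

if batch_size_fits "$total_images" "$batch_size"; then
    echo "batch_size $batch_size fits a dataset of $total_images images."
else
    echo "Warning: reduce batch_size or add images (need more than $(( 5 * batch_size )))." >&2
fi
```

For example, with 100 images the test portion holds 20 images, so a batch_size of 16 passes the check while a batch_size of 20 does not. This check does not cover the memory limits, which depend on the model and on the input width and height.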