Failure to Restart a User Process After Unexpected Exit

Symptom

A user process cannot be restarted after it exits unexpectedly. Log messages similar to the following are displayed.

AscendCL log message: aclrtProcessReport failed

aclrtProcessReport failed, ret = 107012
aclrtProcessReport failed, ret = 107012

Runtime log message: halResourceIdAlloc xxx failed

[ERROR] DRV(2086,rtstest_host):2021-06-09-02:14:46.034.368 [ascend][curpid: 2086, 2086][drv][tsdrv][halResourceIdAlloc 477]id is exhausted, type(0 stream), range[0, 1024), dev_id(0), tsid(0).
[ERROR] RUNTIME(2086,rtstest_host):2021-06-09-02:14:46.034.380 [npu_driver.cc:285]2086 StreamIdAlloc:[driver interface] halResourceIdAlloc streamid failed: device_id=0, tsId=0, drvRetCode=48!
[ERROR] RUNTIME(2086,rtstest_host):2021-06-09-02:14:46.034.401 [stream.cc:448]2086 Setup:Failed to alloc stream id, retCode=0x702001a.
[ERROR] RUNTIME(2086,rtstest_host):2021-06-09-02:14:46.034.416 [context.cc:1251]2086 StreamCreate:Setup stream failed, retCode=0x702001a.
[ERROR] RUNTIME(2086,rtstest_host):2021-06-09-02:14:46.034.440 [logger.cc:211]2086 StreamCreate:Create stream failed, priority=7 ,flags=0.
[ERROR] RUNTIME(2086,rtstest_host):2021-06-09-02:14:46.034.458 [api_c.cc:461]2086 rtStreamCreateWithFlags:ErrCode=207008, desc=[driver error:no stream resource], InnerCode=0x702001a
[ERROR] RUNTIME(2086,rtstest_host):2021-06-09-02:14:46.034.469 [error_message_manage.cc:26]2086 ReportFuncErrorReason:rtStreamCreateWithFlags execute failed, reason=[driver error:no stream resource]

Possible Cause

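The log shows that stream ID allocation failed: halResourceIdAlloc reports "id is exhausted" for type(0 stream) in range [0, 1024) on dev_id(0), tsid(0). Allocation of other shared resources, such as task IDs and event IDs, can fail in the same way. The possible causes are as follows: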

  • Other processes have used up the resources.
  • The resources held by the previous process were not released when it exited.
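To tell which resource ran out, you can search the driver (DRV) log for the "id is exhausted" message. The following is a minimal sketch, not an Ascend tool. The regular expression is written against the single DRV line shown above and is an assumption; the wording may differ between driver versions. `find_exhaustion` and `Exhaustion` are hypothetical helper names. The capacity is computed from the half-open range `[lo, hi)`, so `range[0, 1024)` gives 1024 IDs.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, based only on the DRV line shown in the Symptom section:
# "id is exhausted, type(0 stream), range[0, 1024), dev_id(0), tsid(0)."
_EXHAUSTED = re.compile(
    r"id is exhausted, type\((?P<type_id>\d+) (?P<type_name>\w+)\), "
    r"range\[(?P<lo>\d+), (?P<hi>\d+)\), "
    r"dev_id\((?P<dev>\d+)\), tsid\((?P<ts>\d+)\)"
)


class Exhaustion(NamedTuple):
    """Hypothetical summary of one 'id is exhausted' driver message."""
    resource: str   # resource type name from the log, e.g. "stream"
    capacity: int   # number of IDs in the half-open range [lo, hi)
    device_id: int  # dev_id(...) from the log
    ts_id: int      # tsid(...) from the log


def find_exhaustion(line: str) -> Optional[Exhaustion]:
    """Return the exhausted resource described by a log line, or None."""
    m = _EXHAUSTED.search(line)
    if m is None:
        return None
    return Exhaustion(
        resource=m["type_name"],
        capacity=int(m["hi"]) - int(m["lo"]),
        device_id=int(m["dev"]),
        ts_id=int(m["ts"]),
    )


if __name__ == "__main__":
    sample = ("[ERROR] DRV(2086,rtstest_host):2021-06-09-02:14:46.034.368 "
              "[ascend][curpid: 2086, 2086][drv][tsdrv][halResourceIdAlloc 477]"
              "id is exhausted, type(0 stream), range[0, 1024), dev_id(0), tsid(0).")
    print(find_exhaustion(sample))
```

Lines that do not contain the message, such as the RUNTIME lines, return `None`. Use the result to confirm which resource type is exhausted before you choose a step in the Solution.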

Solution

To rectify the fault, perform the following steps:

  • Wait one minute for the resources held by the previous process to be released, and then restart the process.
  • Stop the other processes that occupy the resources, or wait until they finish and then restart the process.
  • If resource allocation still fails, check whether the number of resources the process requests exceeds the upper limit (for example, the driver log above reports a stream ID range of [0, 1024)). If it does not, restart the environment to forcibly release the resources and restore the environment.