Execution Timeout Error of the AI CPU Operator

Symptom

Either of the following timeout errors is reported during operator execution.

  • Symptom 1
    1. The error code E39999 is reported during Runtime execution. The Runtime error messages "PrintAicpuErrorInfo" and "ErrCode=507018, desc=[aicpu exception]" are printed in the plog file on the host.
    2. In addition, the device log of the AI CPU contains the error message "HandleTaskTimeout".

    This symptom matches the error described in Possible Cause > Example 3 of Kernel Execution Error of the AI CPU Operator.

  • Symptom 2

    An error is reported during Runtime execution. The Runtime error messages "PrintAicpuErrorInfo" and "ErrCode=507017, desc=[aicpu timeout]" are printed in the plog file on the host.

    By default, the plog file is stored in $HOME/ascend/log/[run|debug]/plog and is named in the format plog-pid_yyyymmddhhmmss.log. A sample error line follows:

    [ERROR] RUNTIME(16243,msame):2022-09-22-11:27:01.794.510 [api_c.cc:661]16243 rtStreamSynchronize:[EXEC][DEFAULT]ErrCode=507017, desc=[aicpu timeout], InnerCode=0x715002a
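
The two signatures above differ only in the value that follows "ErrCode=", so a log line can be classified by extracting that number. The following is a minimal sketch rather than part of any Ascend tool: the helper names parse_err_code and describe_plog_line are hypothetical, and the sketch assumes the "ErrCode=<decimal>" layout shown in the sample line.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Error codes quoted in the symptoms above. */
enum { AICPU_TIMEOUT = 507017, AICPU_EXCEPTION = 507018 };

/* Hypothetical helper: returns the decimal number that follows "ErrCode="
 * in a log line, or -1 if the line carries no such field. */
static long parse_err_code(const char *line)
{
    assert(line != NULL);
    const char *p = strstr(line, "ErrCode=");
    if (p == NULL) {
        return -1;
    }
    p += strlen("ErrCode=");
    char *end = NULL;
    long code = strtol(p, &end, 10);
    return (end == p) ? -1 : code;   /* no digits after "ErrCode=" */
}

/* Hypothetical helper: maps a plog line to the symptom it matches. */
static const char *describe_plog_line(const char *line)
{
    switch (parse_err_code(line)) {
    case AICPU_TIMEOUT:
        return "Symptom 2: aicpu timeout";
    case AICPU_EXCEPTION:
        /* Symptom 1 also requires "HandleTaskTimeout" in the device log. */
        return "Symptom 1 candidate: aicpu exception";
    default:
        return "not an AI CPU timeout signature";
    }
}
```

Passing the sample line above to describe_plog_line yields the Symptom 2 description. A line carrying ErrCode=507018 is only a candidate for Symptom 1 until "HandleTaskTimeout" is also found in the AI CPU device log.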
    

Possible Cause

  • The operator input/output shape is too large, resulting in slow operator execution.
  • The hardware performance is insufficient for the complex computation required by a large number of operators.

Solution

  1. Call the aclrtSetOpExecuteTimeOut API to increase the operator execution timeout interval.

    The API prototype is defined as follows:

    aclError aclrtSetOpExecuteTimeOut(uint32_t timeout);  // timeout: operator execution timeout interval, in seconds.
    
  2. If the error persists, collect the logs and contact technical support for troubleshooting.
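
Step 1 can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the wrapper raise_op_timeout and the macro USE_ASCEND_ACL are hypothetical names, the value 120 in the usage note is an arbitrary example, and the sketch assumes the call is made after AscendCL initialization and before the operator or model is executed. When USE_ASCEND_ACL is not defined, a stand-in stub replaces the real API so that the sketch compiles without the toolkit installed.

```c
#include <stdint.h>
#include <stdio.h>

#ifdef USE_ASCEND_ACL
#include "acl/acl.h"    /* real AscendCL header, provides aclrtSetOpExecuteTimeOut */
#else
/* Stand-ins so the sketch builds without the toolkit; NOT the real API. */
typedef int aclError;
#define ACL_SUCCESS 0
static aclError aclrtSetOpExecuteTimeOut(uint32_t timeout)
{
    (void)timeout;      /* the stub ignores the value and reports success */
    return ACL_SUCCESS;
}
#endif

/* Hypothetical wrapper: requests a longer operator execution timeout.
 * Returns 0 on success and -1 if the API reports an error. */
static int raise_op_timeout(uint32_t seconds)
{
    aclError ret = aclrtSetOpExecuteTimeOut(seconds);
    if (ret != ACL_SUCCESS) {
        fprintf(stderr, "aclrtSetOpExecuteTimeOut(%u) failed, ret=%d\n",
                (unsigned)seconds, (int)ret);
        return -1;
    }
    return 0;
}
```

For example, raise_op_timeout(120) requests a 120-second interval. Checking the return value matters here because a failed call leaves the previous timeout in effect, in which case the same timeout error would recur.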