Analyzing AI Core Errors

Starting AI Core Error Analyzer

By default, the AI Core Error Analyzer extracts the logs flushed to disk in the EP scenario for analysis. To analyze logs in the RC scenario, change the target_id attribute in the .project file of the application project to RC before analysis, as shown in Figure 1, and make sure that the path of the data extracted for analysis is the log path of the RC scenario. For details about the log paths in the EP and RC scenarios, see the Log Reference.
Figure 1 .project file
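
Before starting the analysis, it can help to confirm which scenario the project is currently set to. The following is a minimal sketch only. It assumes that target_id appears in the .project file as a key/value or attribute pair such as "target_id": "RC" or target_id="RC"; the actual file layout is not described here and may differ. The function names read_target_id and is_rc_scenario are hypothetical helpers, not part of MindStudio.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: target_id is written as a key/value or attribute pair, for example
# "target_id": "RC" or target_id="RC". The real .project layout may differ.
TARGET_ID_PATTERN = re.compile(r"""target_id["']?\s*[:=]\s*["']?(\w+)""")


def read_target_id(project_text: str) -> Optional[str]:
    """Return the first target_id value found in the text, or None if absent."""
    match = TARGET_ID_PATTERN.search(project_text)
    return match.group(1) if match else None


def is_rc_scenario(project_file: Path) -> bool:
    """Report whether the given .project file is set to the RC scenario."""
    text = project_file.read_text(encoding="utf-8")
    return read_target_id(text) == "RC"
```

The check is read-only, so it does not replace editing the file as shown in Figure 1. It only reports the value that is currently set.
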
  1. Choose Ascend > AI Core Error Analyzer.
  2. Set the AI Core Error Analyzer parameters.
    Figure 2 Parameter configuration (Remote Run)
    Figure 3 Parameter configuration (Local Run)
    Table 1 Parameters

    Parameter | Description
    Run Mode | Remote Run or Local Run. On Windows, only Remote Run is supported and this parameter is not displayed.
    SSH Connection | Address of the remote server that runs the application. Select it from the drop-down list. If the address has not been added, add it first. For details, see Deployment.
    Compile Path | In the inference scenario, the model conversion debug output path (see step 3 in section "Preparing Data" for details). In the training scenario, the script execution path. This path stores the outputs and .pbtxt files generated after operator build. Generally, it is the parent path of the kernel_meta directory (for example, ~/model_convert).
    Output Path | Local path for storing the analysis result. Configure it as required.

  3. Click Analyze. The analysis result is displayed in the Output area in the lower part of the MindStudio window.
    The analysis result is stored in the path specified by Output Path (see Table 1).

    If the error message "TOOLCHAIN_HOME is empty" is displayed, configure the TOOLCHAIN_HOME environment variable. For the required settings, see the description of the $HOME/ascend-toolkit/set_env.sh script in the Ascend-CANN-Toolkit installation section of the CANN Software Installation Guide.
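
Two of the conditions above can be checked before clicking Analyze: TOOLCHAIN_HOME must not be empty, and Compile Path is generally the parent of kernel_meta. The sketch below illustrates such a pre-flight check. It is an assumption-laden example, not a MindStudio feature: preflight_problems is a hypothetical helper, and it must be run in the same environment and on the same machine where the path and the variable are actually evaluated.

```python
import os
from pathlib import Path
from typing import List, Mapping


def preflight_problems(
    compile_path: Path, env: Mapping[str, str] = os.environ
) -> List[str]:
    """Collect human-readable problems found before starting the analysis."""
    problems = []
    # The analyzer reports "TOOLCHAIN_HOME is empty" when this variable is unset.
    if not env.get("TOOLCHAIN_HOME"):
        problems.append("TOOLCHAIN_HOME is empty")
    # Compile Path is generally the parent path of kernel_meta.
    if not (compile_path / "kernel_meta").is_dir():
        problems.append(f"no kernel_meta directory under {compile_path}")
    return problems


if __name__ == "__main__":
    # ~/model_convert is the example Compile Path used in Table 1.
    for problem in preflight_problems(Path.home() / "model_convert"):
        print(problem)
```

An empty list means that neither of the two conditions was violated. It does not guarantee that the analysis itself succeeds.
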

Viewing Analysis Result

The AI Core Error Analyzer writes its outputs to a directory in the IDE named info_*, where * is the timestamp (for example, info_20200903114406). The generated result files vary with the actual situation. The following examples are for reference only.

  • Inference example:
    • Ascend EP:
      ├── aicore_error  
      │   ├──aicerr_out
      │        ├──info_*  
      │             ├──aicerror_*  // AI Core Error Analyzer outputs   
      │                  ├──info.txt    // AI Core Error Analyzer report
      │                  ├──te_transdata_*.o  
      │                  ├──te_transdata_*.o.txt   // Decompilation file
      │             ├──collection    // Directory of error operator files
      │                  ├──compil
      │                       ├──kernel_meta
      │                            ├──CCE file
      │                            ├──JSON file
      │                            ├──loc.json file
      │                            ├──.o file                            
      │                       ├──ge_proto_xxxx_Build.txt
      │                  ├──dump    // Directory of dump files
      │                  ├──log    // Directory of host logs
      │                  ├──*   // Timestamp  
      │             ├──error.log    // ERROR-level log file
      │             ├──README.txt
      │   ├──npu_report    
      │        ├──*   // Timestamp
      │             ├──hisi_logs    // Black Box logs
      │             ├──message    // Device OS logs
      │             ├──slog  
      │             ├──stackcore                
    • Ascend RC:
      ├── aicore_error  
      │   ├──aicerr_out
      │        ├──info_*
      │             ├──aicerror_*  // AI Core Error Analyzer outputs   
      │                  ├──info.txt    // AI Core Error Analyzer report
      │                  ├──te_transdata_*.o  
      │                  ├──te_transdata_*.o.txt   // Decompilation file
      │             ├──collection    // Directory of error operator files
      │                  ├──compil
      │                       ├──kernel_meta
      │                            ├──CCE file
      │                            ├──JSON file
      │                            ├──loc.json file
      │                            ├──.o file                            
      │                       ├──ge_proto_xxxx_Build.txt
      │                  ├──dump    // Directory of dump files
      │                  ├──xxxxx       
      │             ├──error.log    // ERROR-level log file
      │             ├──README.txt           
  • Training example:
    ├── aicore_error  
    │   ├──aicerr_out
    │        ├──info_*  
    │             ├──collection    // Directory of error operator files
    │                  ├──log    // Directory of host logs
    │                       ├──*    // Process ID
    │                  ├──*     // Timestamp
    │                            ├──hisi_logs    // Black Box logs
    │                       ├──slog
    │   ├──npu_report    
    │        ├──*    // Timestamp
    │             ├──hisi_logs    // Black Box logs
    │             ├──message    // Device OS logs
    │             ├──slog  
    │             ├──stackcore
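
When several info_* directories accumulate under aicerr_out, the most recent report can be located with a short script. The following is a minimal sketch under stated assumptions: the info_* suffix is taken to be a 14-digit YYYYMMDDHHMMSS timestamp (inferred from the example info_20200903114406), and reports are expected at info_*/aicerror_*/info.txt as in the inference examples above. All function names are hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Iterable, List, Optional

# Assumption: the suffix is a 14-digit YYYYMMDDHHMMSS timestamp,
# as in the example directory name info_20200903114406.
INFO_DIR_PATTERN = re.compile(r"^info_(\d{14})$")


def parse_info_timestamp(dir_name: str) -> Optional[datetime]:
    """Parse the timestamp of an info_* directory name, or return None."""
    match = INFO_DIR_PATTERN.match(dir_name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    except ValueError:
        # 14 digits that do not form a valid date and time.
        return None


def latest_info_dir_name(dir_names: Iterable[str]) -> Optional[str]:
    """Return the info_* name with the newest timestamp, or None if there is none."""
    dated = [(parse_info_timestamp(name), name) for name in dir_names]
    dated = [(ts, name) for ts, name in dated if ts is not None]
    return max(dated)[1] if dated else None


def find_reports(aicerr_out: Path) -> List[Path]:
    """List the info.txt reports inside the newest info_* directory."""
    children = [p.name for p in aicerr_out.iterdir() if p.is_dir()]
    latest = latest_info_dir_name(children)
    if latest is None:
        return []
    return sorted((aicerr_out / latest).glob("aicerror_*/info.txt"))
```

For a layout like the training example, which shows no aicerror_* directory, find_reports returns an empty list.
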