iCache Data Verification Fault
Symptom
The device-side slog log (report/*/slog/dev-os-id/[run|debug]/device-os/device-os_*.log) contains the keyword [fault_manager] event_id: [0x80C98000]. Error entries such as the following are also printed:
2024-04-22-09-06-17/hisi_logs/device-2/20240422090623-533810000/log/ts.log:5177:[ERROR] TSCH(-1,null):2024-04-20-17:02:52.772.875 35906 (dieid:0,cpuid:0) aicore.c:767 stars_print_error_pc_icache_and_hbm_info: stat for dump pc start, aiv_id=47, icache_miss_num=8161, hbm_miss_num=0, compare_num=32, compare_fail_num=
[ERROR] TSCH(-1,null):2024-09-04-00:12:52.986.322 438 (dieid:0,cpuid:0) aicore_icache_plat.c:848 check_error_pc_icache_and_hbm_info: stat for dump pc start, aic_id=1, icache_miss_num=8176, hbm_miss_num=0, compare_num=17, compare_fail_num=0
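When logs from many devices are collected, the keyword search can be scripted instead of done by hand. The following is a minimal sketch that assumes the directory layout shown in the path above. Two points are assumptions: "[run|debug]" is read as "either the run or the debug directory", and the dev-os-id directory is matched with the glob dev-os-*. The helper names (keyword_hits, candidate_patterns, find_icache_fault_lines) are hypothetical.

```python
import glob
import os

# Keyword from the symptom description (event 0x80C98000).
KEYWORD = "[fault_manager] event_id: [0x80C98000]"


def keyword_hits(lines, keyword=KEYWORD):
    """Return (line_number, line) pairs for lines that contain the keyword."""
    return [(n, line.rstrip("\n")) for n, line in enumerate(lines, start=1) if keyword in line]


def candidate_patterns(report_root):
    """Build glob patterns for the device-os slog files.

    Assumptions: "[run|debug]" in the documented path means "run or debug",
    and the dev-os-id directory name is matched with the glob "dev-os-*".
    """
    return [
        os.path.join(report_root, "report", "*", "slog", "dev-os-*", sub, "device-os", "device-os_*.log")
        for sub in ("run", "debug")
    ]


def find_icache_fault_lines(report_root):
    """Scan every matching log file and return (path, line_number, line) tuples."""
    hits = []
    for pattern in candidate_patterns(report_root):
        for path in sorted(glob.glob(pattern)):
            with open(path, encoding="utf-8", errors="replace") as f:
                hits.extend((path, n, line) for n, line in keyword_hits(f))
    return hits


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, n, line in find_icache_fault_lines(root):
        print(f"{path}:{n}: {line}")
```

The directory walk is kept apart from the line matching so that the matching logic can be checked without a real log tree.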
Fault root causes
Locate the error entry containing stat for dump pc start and check the value of compare_fail_num. If the value is not 0, an iCache memory bit error has occurred.
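The compare_fail_num check can be automated by extracting the name=value fields from the dump entry. The following is a minimal sketch that assumes the fields keep the comma-separated name=value format shown in the example logs. The helper names parse_dump_fields and is_icache_bit_error are hypothetical. A truncated entry, such as the first example, where compare_fail_num= has no value, is reported as undecided rather than as healthy.

```python
import re

# name=value pairs with integer values, e.g. "compare_fail_num=0".
# Assumption: the dump entry keeps the format shown in the example logs.
FIELD_RE = re.compile(r"(\w+)=(\d+)")


def parse_dump_fields(line):
    """Return a dict of the integer name=value fields found in a log line."""
    return {name: int(value) for name, value in FIELD_RE.findall(line)}


def is_icache_bit_error(line):
    """True if compare_fail_num != 0, False if it is 0,
    None if the field is missing or truncated (no value after '=')."""
    fields = parse_dump_fields(line)
    if "compare_fail_num" not in fields:
        return None
    return fields["compare_fail_num"] != 0
```

For the second example entry (compare_num=17, compare_fail_num=0), is_icache_bit_error returns False, so that entry does not indicate a bit error.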
Solution
Look up the Health Management Fault Definition for your software version. It describes the iCache memory bit error as follows (only the key fields are listed).
| Event ID | 0x80C98000 |
|---|---|
| Fault Name | The AI Core instruction data fails to be verified. |
| Fault Description/Possible Cause | The iCache data is inconsistent with the GM data. The possible causes are as follows: |
| Impact | The current AI task fails. If the AI Core is not restored, subsequent AI tasks also fail. |
| Automatic Fault Resolution Mode | |
| System Handling Suggestion | |
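When many events are triaged, the event ID in the slog keyword can be matched against a local copy of the fault definition. The following is a minimal sketch. FAULT_DEFINITIONS is a hypothetical table that contains only the fields listed above for 0x80C98000. It is not an interface provided by the Health Management Fault Definition, and it should be populated from the document for your version.

```python
import re

# Hypothetical local copy of the key fields listed above; only 0x80C98000 comes from this page.
FAULT_DEFINITIONS = {
    0x80C98000: {
        "name": "The AI Core instruction data fails to be verified.",
        "impact": "The current AI task fails. If the AI Core is not restored, "
                  "subsequent AI tasks also fail.",
    },
}

# Matches the "event_id: [0x...]" part of the fault_manager keyword.
EVENT_RE = re.compile(r"event_id:\s*\[(0x[0-9A-Fa-f]+)\]")


def lookup_fault(line):
    """Return (event_id, definition or None), or None if the line has no event ID."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    event_id = int(match.group(1), 16)  # int() accepts the 0x prefix when base is 16
    return event_id, FAULT_DEFINITIONS.get(event_id)
```

Unknown event IDs return a None definition instead of raising, so the lookup can be run over mixed logs.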