Performance Verification
After tuning is complete, restore the code and refresh the compilation cache by setting ACL_OP_COMPILER_CACHE_MODE to force. For example:
import torch
import torch_npu
# Force recompilation so that the existing compilation cache is refreshed.
option = {"ACL_OP_COMPILER_CACHE_MODE": "force"}
torch_npu.npu.set_option(option)
Before using the tuned custom repository, make sure that binary mode is disabled by setting jit_compile to True:
torch_npu.npu.set_compile_mode(jit_compile=True)
Run training again with the tuned custom repository and check whether performance has improved. For details about how to use the custom repository, see Usage of Tuned Custom Repositories.
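One way to check whether performance has improved is to time a fixed number of training steps before and after tuning and compare the averages. The following is a minimal sketch, not part of the tuning tool: average_step_time, speedup, and the step_fn and sync parameters are hypothetical names introduced here for illustration. step_fn is assumed to run one complete training step.

```python
import time
from statistics import mean
from typing import Callable, Optional


def average_step_time(
    step_fn: Callable[[], None],
    warmup: int = 5,
    iters: int = 20,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean wall-clock seconds per call of step_fn (hypothetical helper)."""
    # Warm-up steps are excluded so that one-time costs, such as operator
    # compilation, do not skew the measurement.
    for _ in range(warmup):
        step_fn()
    if sync is not None:
        sync()

    durations = []
    for _ in range(iters):
        start = time.perf_counter()
        step_fn()
        if sync is not None:
            # Wait for queued device work to finish before reading the clock.
            sync()
        durations.append(time.perf_counter() - start)
    return mean(durations)


def speedup(baseline_s: float, tuned_s: float) -> float:
    """Return baseline time divided by tuned time; above 1.0 means the tuned run is faster."""
    return baseline_s / tuned_s
```

Device execution is typically asynchronous, so on an NPU a synchronization callable should be passed as sync, for example torch_npu.npu.synchronize if the installed torch_npu version provides it (an assumption to verify). Measure the untuned and tuned runs with the same model, data, and batch size, and then compare the two averages with speedup.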