Second Compilation (Optimization Compilation)
    # Convert the profiling data format.
    llvm-profdata merge /tmp/profile -o default.profdata

    # Configure the environment for compilation optimization.
    export CMAKE_C_FLAGS="-flto=thin -fuse-ld=lld -fprofile-use=/path/to/profile/default.profdata"
    export CMAKE_CXX_FLAGS="-flto=thin -fuse-ld=lld -fprofile-use=/path/to/profile/default.profdata"
    export CC=clang
    export CXX=clang++
    export USE_XNNPACK=0

    # Compile the optimized torch.
    cd pytorch-2.1.0
    git clean -dfx
    python3 setup.py bdist_wheel

    # Compile the optimized torch_npu (copy default.profdata to the torch_npu directory).
    cd torch_npu
    git clean -dfx
    cp /path/to/profile/default.profdata .
    bash ci/build.sh --python=3.8 --enable_lto --enable_pgo=2
The torch and torch_npu wheel packages produced by this second compilation are the optimized, high-performance builds.
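Because both builds read the merged profile through -fprofile-use, it can help to confirm that default.profdata exists and is non-empty before starting the long compilation. The following is a minimal sketch, not part of the original procedure; the function name check_profdata is hypothetical and introduced only for illustration.

```shell
# Hypothetical helper (not part of the original steps): fail early if the
# merged profile passed to -fprofile-use is missing or empty.
check_profdata() {
    profdata="$1"
    # -s is true only if the file exists and has a size greater than zero.
    if [ ! -s "$profdata" ]; then
        echo "missing or empty profile: $profdata" >&2
        return 1
    fi
    echo "profile ok: $profdata"
}
```

One possible use is to run check_profdata /path/to/profile/default.profdata || exit 1 right after the llvm-profdata merge step, so that a failed merge stops the procedure before git clean and the rebuild begin.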