Second Compilation (Optimization Compilation)

# Merge the raw profiling data collected in /tmp/profile into an indexed profile (default.profdata).
llvm-profdata merge /tmp/profile -o default.profdata

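# Set the environment for the optimized build: ThinLTO, the lld linker, and profile-guided optimization.
# /path/to/profile is the directory that contains default.profdata.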
export CMAKE_C_FLAGS="-flto=thin -fuse-ld=lld -fprofile-use=/path/to/profile/default.profdata"
export CMAKE_CXX_FLAGS="-flto=thin -fuse-ld=lld -fprofile-use=/path/to/profile/default.profdata"
export CC=clang
export CXX=clang++
export USE_XNNPACK=0

# Build the optimized torch wheel.
cd pytorch-2.1.0
git clean -dfx
python3 setup.py bdist_wheel

# Build the optimized torch_npu package. Copy default.profdata into the torch_npu directory before building.
cd torch_npu
git clean -dfx
cp /path/to/profile/default.profdata .
bash ci/build.sh --python=3.8 --enable_lto --enable_pgo=2
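
Both builds above take a long time, so it may be worth confirming that the merged profile is usable before starting them. The following is a minimal sketch, not part of the original procedure: the function name check_profile and its messages are hypothetical, and it only assumes that llvm-profdata (already used above) may be on PATH.

```shell
# Hypothetical pre-flight helper; not part of the original build procedure.
check_profile() {
    local profile="$1"
    # The profile must exist and be non-empty, otherwise -fprofile-use has nothing to apply.
    if [ ! -s "$profile" ]; then
        echo "missing or empty profile: $profile" >&2
        return 1
    fi
    # If llvm-profdata is available, let it parse the file as a sanity check.
    if command -v llvm-profdata >/dev/null 2>&1; then
        llvm-profdata show "$profile" >/dev/null || return 1
    fi
    echo "profile looks usable: $profile"
}
```

One possible use is to run `check_profile /path/to/profile/default.profdata || exit 1` just before each build command.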

The torch and torch_npu packages produced by these two builds are the optimized, high-performance packages.
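
To pick up the freshly built packages, a small helper along the following lines could be used. This is a sketch under stated assumptions: the function name newest_wheel is hypothetical, and the output location is assumed to be a dist directory inside each source tree (the setuptools default for bdist_wheel; for ci/build.sh this is an assumption that should be checked against the build log).

```shell
# Hypothetical helper: print the most recently modified wheel in a directory.
newest_wheel() {
    local dir="$1"
    # ls -t sorts by modification time, newest first; errors are hidden when no wheel matches.
    ls -t "$dir"/*.whl 2>/dev/null | head -n 1
}
```

For example, `pip3 install --force-reinstall "$(newest_wheel pytorch-2.1.0/dist)"` would install the newest torch wheel, assuming the wheel was written to that directory.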