MAX_RUNTIME_CORE_NUMBER
Description
In training and online inference scenarios, this environment variable enables multi-threaded task scheduling by the host-side graph executor for networks running in dynamic-shape graph mode.
The value of this environment variable must be an integer. If the value is greater than or equal to 2, the host uses multiple threads to execute and schedule graph tasks, and the number of threads equals the value of this environment variable.
If you use this environment variable, setting it to 3 is recommended for good performance.
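The value rules above can be summarized in a small Python sketch. The helper name `runtime_thread_count` is hypothetical and not part of any CANN API. It mirrors only the documented behavior: an integer value of 2 or more means that many host scheduling threads. Treating an unset variable or a value below 2 as "multi-threading not enabled", and rejecting non-integer input with `ValueError`, are assumptions made for this sketch.

```python
import os
from typing import Mapping, Optional


def runtime_thread_count(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Hypothetical helper: interpret MAX_RUNTIME_CORE_NUMBER per the rules above.

    Returns the number of host scheduling threads, or None when multi-threaded
    scheduling is not enabled (unset or below 2; this is an assumption).
    """
    raw = env.get("MAX_RUNTIME_CORE_NUMBER")
    if raw is None:
        return None
    try:
        value = int(raw.strip())
    except ValueError:
        # The documentation requires an integer; raising here is this sketch's choice.
        raise ValueError(f"MAX_RUNTIME_CORE_NUMBER must be an integer, got {raw!r}") from None
    # A value >= 2 enables multi-threading with exactly that many threads.
    return value if value >= 2 else None


print(runtime_thread_count({"MAX_RUNTIME_CORE_NUMBER": "3"}))  # 3
print(runtime_thread_count({"MAX_RUNTIME_CORE_NUMBER": "1"}))  # None
```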
Example
export MAX_RUNTIME_CORE_NUMBER=3
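When jobs are started from a Python launcher rather than a shell, the same setting can be passed to the child process through its environment. This sketch uses only the standard library; the helper `run_with_runtime_cores` and the script name `train.py` in the comment are hypothetical placeholders, and the child here simply prints the variable so the example runs on its own.

```python
import os
import subprocess
import sys


def run_with_runtime_cores(cmd, cores=3):
    """Run cmd with MAX_RUNTIME_CORE_NUMBER set in the child's environment only."""
    env = dict(os.environ)  # copy, so the launcher's own environment is unchanged
    env["MAX_RUNTIME_CORE_NUMBER"] = str(cores)
    return subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)


# In practice cmd might be [sys.executable, "train.py"] (hypothetical script).
# Here the child just echoes the value it received.
result = run_with_runtime_cores(
    [sys.executable, "-c", "import os; print(os.environ['MAX_RUNTIME_CORE_NUMBER'])"]
)
print(result.stdout.strip())  # 3
```

Copying `os.environ` keeps every other variable the job may depend on, while confining the change to the launched process.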
Restrictions
- This environment variable takes effect only in graph mode.
- To use this environment variable, the host must have at least 2 CPU cores available to the process.
- For better performance, bind the host process to specific CPU cores before the first iteration of graph execution.
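The core-count restriction can be checked before a job is launched. A minimal Python sketch follows; both helper names are hypothetical, and counting usable cores with `os.sched_getaffinity` (Linux only, falling back to `os.cpu_count`) is this sketch's choice, not necessarily how the runtime itself counts cores.

```python
import os
from typing import Optional


def usable_cpu_count() -> int:
    """Number of CPUs the current process is allowed to run on."""
    # On Linux, sched_getaffinity reflects any core binding already applied;
    # elsewhere, fall back to the total logical CPU count.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def core_requirement_met(requested: int, cores: Optional[int] = None) -> bool:
    """Hypothetical pre-flight check for the core-count restriction above."""
    if requested < 2:
        # Multi-threaded scheduling is not enabled, so the restriction does not apply.
        return True
    if cores is None:
        cores = usable_cpu_count()
    return cores >= 2


print(core_requirement_met(3, cores=24))  # True
print(core_requirement_met(3, cores=1))   # False
```

Because `sched_getaffinity` reports the affinity mask, running this check after core binding counts only the cores in the process's bound slice.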
The following is an example of core binding in PyTorch graph mode. Assume that there are 192 CPU cores and eight processes on the host, and each process is bound to 24 CPU cores. The code snippet is as follows:
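import logging
import os

import psutil
import torch

# Core binding: 8 processes x 24 cores = 192 host CPU cores
cpu_ids_array = [list(range(i * 24, (i + 1) * 24)) for i in range(0, 8)]
# Per-rank environment variable used as a flag recording whether binding is done
rank_os_par_array = [f"kernel_bond_rank{i}" for i in range(0, 8)]
# Requires torch.distributed to be initialized (init_process_group) beforehand
rank_no = int(torch.distributed.get_rank())
kernel_bond_os_par = int(os.getenv(rank_os_par_array[rank_no], "0"))
if kernel_bond_os_par == 0:
    logging.info(f"rank{rank_no} is about to bind cpu kernels")
    pid = os.getpid()
    process = psutil.Process(pid)
    # Pin this process to its own 24-core slice
    process.cpu_affinity(cpu_ids_array[rank_no])
    # Set the flag so the binding is not repeated in this process
    os.environ[rank_os_par_array[rank_no]] = str(kernel_bond_os_par + 1)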
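The example hard-codes 192 cores, 8 processes and 24 cores per process. The same contiguous split can be computed for other host sizes; `cores_for_rank` is a hypothetical helper that assumes the cores divide evenly among the processes and that core IDs are numbered contiguously from 0 (it does not take NUMA layout into account).

```python
from typing import List


def cores_for_rank(rank: int, total_cores: int, num_procs: int) -> List[int]:
    """Contiguous slice of CPU core IDs for one rank (even split assumed)."""
    if num_procs <= 0 or total_cores % num_procs != 0:
        raise ValueError("total_cores must be divisible by num_procs")
    if not 0 <= rank < num_procs:
        raise ValueError(f"rank {rank} out of range for {num_procs} processes")
    per_proc = total_cores // num_procs
    return list(range(rank * per_proc, (rank + 1) * per_proc))


# Matches the example above: 192 cores, 8 processes -> 24 cores each.
slice_1 = cores_for_rank(1, 192, 8)
print(slice_1[0], slice_1[-1])  # 24 47
```

On Linux, the resulting list could also be applied without psutil via `os.sched_setaffinity(0, cores)`, which sets the affinity of the calling process.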