A memory allocation failure occurs during network execution, for example:
[ERROR] DRV(41852,python3.7):2021-11-11-11:19:04.222.597 [ascend][curpid: 41852, 50983][drv][devmm][devmm_ioctl_advise 190]<errno:12, 6> Ioctl device error! ptr=0x108800000000, count=15069901824, advise=0x8c, device=7.
[ERROR] DRV(41852,python3.7):2021-11-11-11:19:04.817.448 [ascend][curpid: 41852, 50983][drv][devmm][devmm_ioctl_alloc_dev 247]<errno:12, 6> advise mem error! ret=6
[ERROR] DRV(41852,python3.7):2021-11-11-11:19:04.817.465 [ascend][curpid: 41852, 50983][drv][devmm][devmm_virt_heap_alloc_chunk_device 461]<errno:12, 6> devmm_ioctl_alloc error. ptr=0x108800000000.
[ERROR] DRV(41852,python3.7):2021-11-11-11:19:04.817.475 [ascend][curpid: 41852, 50983][drv][devmm][devmm_virt_set_alloced_mem_struct 101]<errno:12, 6> alloc ptr err, ptr=0x1.
[ERROR] DRV(41852,python3.7):2021-11-11-11:19:04.908.424 [ascend][curpid: 41852, 50983][drv][devmm][devmm_alloc_from_base_heap 137]<errno:12, 6> alloc phy mem from base heap err=0x1, va:0x108800000000, size:15069901824,15069901824.
[ERROR] DRV(41852,python3.7):2021-11-11-11:19:04.908.437 [ascend][curpid: 41852, 50983][drv][devmm][devmm_virt_free_check_and_get_pg 365]<errno:12, 6> va(0x108800000000) is not alloced, pg is already in buddy,pfn(544),order(4),flags(1)
[ERROR] DRV(41852,python3.7):2021-11-11-11:19:04.908.446 [ascend][curpid: 41852, 50983][drv][devmm][devmm_virt_heap_free 625]<errno:12, 6> addr not alloced, addr=0x108800000000,start=0x100000000000,end=0x17ffffffffff
[ERROR] DRV(41852,python3.7):2021-11-11-11:19:04.908.459 [ascend][curpid: 41852, 50983][drv][devmm][devmm_alloc_managed 126]<errno:12, 6> heap_alloc_managed out of memory, pp=0x1, bytesize=15069901824.
[ERROR] RUNTIME(41852,python3.7):2021-11-11-11:19:04.908.481 [npu_driver.cc:691]50983 DevMemAllocHugePageManaged:[LOAD][LOAD][driver interface] halMemAlloc failed: device_id=7, size=15069901824, type=2, env_type=3, drvRetCode=6!
[ERROR] DRV(41852,python3.7):2021-11-11-11:19:07.106.816 [ascend][curpid: 41852, 50983][drv][devmm][devmm_ioctl_advise 190]<errno:12, 6> Ioctl device error! ptr=0x108800000000, count=15069901824, advise=0x88, device=7.
By default, the network runs with dynamic memory allocation: each graph requests its own memory at runtime. The system also isolates graph memory from variable memory, with graph memory defaulting to 26 GB and variable memory to 5 GB, for a combined ceiling of 31 GB.
When the network model has many layers, the total memory of all graphs in the network can easily exceed 26 GB, and memory allocation may fail.
In this case, switch to static memory allocation, where all graphs share a single block of memory; as long as the largest single graph needs no more than 26 GB, the network runs normally.
Under static allocation, if variable memory still exceeds its limit, you can trade graph memory for variable memory, as long as the total stays within 31 GB (see the configuration below).
Enable static memory allocation by setting the following environment variable:
export GE_USE_STATIC_MEMORY=1
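If it is more convenient to set the variable from the training script itself, it can be exported from Python before the NPU session is created. This is a minimal sketch under the assumption that GE reads GE_USE_STATIC_MEMORY at session initialization; setting it in the shell before launch, as above, is the documented route.

import os

# Assumption: GE reads GE_USE_STATIC_MEMORY when the NPU session is initialized,
# so it must be set before npu_bridge is imported or a tf.Session is created.
os.environ["GE_USE_STATIC_MEMORY"] = "1"  # "1" = static allocation; "0" (default) = dynamic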
Under static memory allocation, if variable memory still exceeds its limit, configure graph_memory_max_size and variable_memory_max_size to adjust the two limits, provided the total memory for weights and feature maps does not exceed 31 GB.
sess.run training script:
import tensorflow as tf
from npu_bridge.npu_init import *

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
# Graph (feature map) memory limit: 16 GB
custom_op.parameter_map["graph_memory_max_size"].s = tf.compat.as_bytes(str(16 * 1024 * 1024 * 1024))
# Variable (weight) memory limit: 15 GB; together with graph memory the total must stay within 31 GB
custom_op.parameter_map["variable_memory_max_size"].s = tf.compat.as_bytes(str(15 * 1024 * 1024 * 1024))
# Turn off TensorFlow's remapping and memory-optimization passes, as required by the NPU adapter
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF
with tf.Session(config=config) as sess:
    sess.run(...)
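In the script above, the 16 GB graph memory limit plus the 15 GB variable memory limit sums to exactly the 31 GB ceiling; any pair of values whose total stays within 31 GB is acceptable.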
Estimator training script:
npu_config = NPURunConfig(
    graph_memory_max_size=str(26 * 1024 * 1024 * 1024),
    variable_memory_max_size=str(5 * 1024 * 1024 * 1024)
)
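For completeness, the resulting config is then passed to the NPU Estimator at construction time. The following is a minimal sketch; my_model_fn, my_input_fn, and the model_dir path are user-supplied placeholders, not part of the original example:

from npu_bridge.npu_init import *

npu_config = NPURunConfig(
    graph_memory_max_size=str(26 * 1024 * 1024 * 1024),
    variable_memory_max_size=str(5 * 1024 * 1024 * 1024),
    model_dir="./model"  # hypothetical output directory
)

# my_model_fn and my_input_fn are user-defined placeholders for the
# standard Estimator model function and input function.
estimator = NPUEstimator(model_fn=my_model_fn, config=npu_config)
estimator.train(input_fn=my_input_fn, max_steps=1000)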