MemoryConfig Constructor

Applicability

Product	Supported
Atlas A3 training products/Atlas A3 inference products	√
Atlas A2 training products/Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	☓
Atlas inference products	☓
Atlas training products	√

Function Description

Constructs an object of class MemoryConfig for configuring system memory handling.

Function Prototype

class MemoryConfig():
    def __init__(self,
                 atomic_clean_policy=0,
                 static_memory_policy=0,
                 memory_optimization_policy=None,
                 variable_use_1g_huge_page=0):

Parameters


Option

Input/Output

Description

atomic_clean_policy

Input

Specifies whether the memory occupied by all operators with the memset attribute (memset operators) on the network is cleaned up collectively. The options are as follows:

0 (default): Enables collective cleanup.
1: Disables collective cleanup. The memory used by each memset operator is cleaned up separately. Try this option when the memset operators on the network occupy too much memory; however, it may cause performance loss.

static_memory_policy

Input

Memory allocation mode used during network execution. The options are as follows:

0 (default): dynamic memory allocation. Memory is allocated dynamically based on the actual size required.
2: dynamic memory expansion, supported for static-shape graphs only. This mode enables memory reuse among multiple graphs in a session: the memory required by the largest graph is allocated. For example, if the current graph requires more memory than the previous graph, the previous graph's memory is released and memory is reallocated based on the current graph's requirement.
3: dynamic memory expansion, supported for dynamic-shape graphs only. This mode mitigates memory fragmentation during dynamic allocation and reduces the memory usage of dynamic-shape networks.
4: dynamic memory expansion, supported for both static-shape and dynamic-shape graphs.
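The reuse behavior of mode 2 (one block shared by the graphs of a session, released and regrown only when a graph needs more) can be pictured with the pure-Python model below. It is a hypothetical sketch for illustration only: the class ExpandingPool and its method run_graph are not part of npu_bridge, and the real allocator is internal to the runtime.

```python
class ExpandingPool:
    """Hypothetical model of mode-2 style memory reuse across graphs in one session."""

    def __init__(self):
        self.allocated = 0      # bytes currently held by the pool
        self.reallocations = 0  # how many times the pool was released and regrown

    def run_graph(self, required: int) -> int:
        """Return the pool size available to a graph that needs `required` bytes."""
        if required > self.allocated:
            # The existing block is too small: release it and reallocate
            # based on the current graph's requirement.
            self.allocated = required
            self.reallocations += 1
        # Otherwise the existing (larger) block is reused as-is.
        return self.allocated


pool = ExpandingPool()
for need in [512, 256, 1024, 768]:
    pool.run_graph(need)
print(pool.allocated, pool.reallocations)  # 1024 2
```

In this model the pool only ever grows to the size of the largest graph seen so far, which is why smaller graphs that run later trigger no reallocation.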

NOTE:

This option cannot be set to 2 or 4 when multiple graphs are executed concurrently.
For compatibility with earlier versions, the value 1 is still accepted and is handled as mode 2.
Setting this option to 3 or 4 reduces memory usage, but performance may deteriorate.
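The rules in the note above can be expressed as a small pre-check that a training script might run before building its configuration. This is a hypothetical helper written for illustration (normalize_static_memory_policy and its concurrent_graphs flag are not part of npu_bridge); it only encodes the documented constraints and does not reflect how the runtime itself validates the value.

```python
VALID_POLICIES = {0, 1, 2, 3, 4}


def normalize_static_memory_policy(policy: int, concurrent_graphs: bool) -> int:
    """Hypothetical pre-check mirroring the documented static_memory_policy rules."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unsupported static_memory_policy: {policy}")
    if policy == 1:
        # Legacy value kept for compatibility; it is handled as mode 2.
        policy = 2
    if concurrent_graphs and policy in (2, 4):
        # Modes 2 and 4 must not be used when graphs run concurrently.
        raise ValueError("static_memory_policy 2 or 4 cannot be used with concurrent graphs")
    return policy
```

Because 1 is mapped to 2 before the concurrency check, the legacy value is rejected in exactly the same situations as mode 2.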

variable_use_1g_huge_page

Input

In recommendation models, the TensorFlow embedding layer is implemented with variables. When embedding-layer variables serve as the inputs or outputs of index-based operators (such as Gather and ScatterNd), their large memory footprint can cause extensive scattered memory access and degrade performance. In such cases, you can set this parameter to allocate memory for variables and constants using 1 GB huge pages, which can improve memory access performance.

The options are as follows:

0 (default): Uses the system default page size (4 KB or 2 MB) for memory allocation.
1: Allocates memory using 1 GB huge pages. If the allocation fails, an error log is printed and the service terminates.
2: Allocates memory using 1 GB huge pages. If the allocation fails, an error log is printed, but the service does not terminate; instead, it falls back to 2 MB pages. If the fallback allocation succeeds, the service continues; if it also fails, the service terminates.
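The failure handling of options 1 and 2 can be summarized as the decision logic below. This is a hypothetical model for illustration, not the runtime's implementation: allocate_variable_memory and the caller-supplied try_alloc stand-in are invented names, page sizes are represented as the strings "default", "1G", and "2M", and raising RuntimeError models "the service terminates". What happens when a default-page allocation fails under option 0 is not described above, so the sketch does not model it.

```python
import logging

log = logging.getLogger("huge_page_sketch")


def allocate_variable_memory(policy, try_alloc):
    """Return the page size used for variable memory under `policy`.

    try_alloc(page) is a hypothetical stand-in for the runtime allocator;
    it returns True if an allocation with that page size succeeds.
    """
    if policy == 0:
        # System default page size (4 KB or 2 MB); no huge-page attempt.
        return "default"
    if policy not in (1, 2):
        raise ValueError(f"unsupported variable_use_1g_huge_page: {policy}")
    if try_alloc("1G"):
        return "1G"
    log.error("1 GB huge-page allocation failed")
    if policy == 1:
        # Option 1: no fallback, the service terminates.
        raise RuntimeError("1 GB huge-page allocation failed; terminating")
    # Option 2: fall back to 2 MB pages; terminate only if that also fails.
    if try_alloc("2M"):
        return "2M"
    raise RuntimeError("2 MB fallback allocation failed; terminating")


def no_gigantic_pages(page):
    # Simulate a host on which 1 GB pages are unavailable.
    return page != "1G"


print(allocate_variable_memory(2, no_gigantic_pages))  # 2M
```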

Using 1 GB huge pages can effectively reduce the number of page table entries and expand the address range covered by the translation lookaside buffer (TLB) cache, thereby improving performance for scattered access patterns. The TLB is a hardware module on the Ascend AI processor that caches recently used virtual-to-physical address mappings.
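The reduction in page table entries is simple arithmetic: a region needs one leaf entry per page, so larger pages mean far fewer entries. The sketch below computes this for a hypothetical 16 GB embedding table (the size is an assumption chosen for illustration, with GB meaning 2^30 bytes).

```python
KIB = 1 << 10
MIB = 1 << 20
GIB = 1 << 30


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of pages (one leaf page-table entry each) needed to map a region."""
    return -(-region_bytes // page_bytes)  # ceiling division


table = 16 * GIB  # hypothetical 16 GB embedding table
for name, page in [("4 KB", 4 * KIB), ("2 MB", 2 * MIB), ("1 GB", GIB)]:
    print(f"{name} pages: {pages_needed(table, page):,}")
```

This yields 4,194,304 entries with 4 KB pages, 8,192 with 2 MB pages, and only 16 with 1 GB pages. Assuming one TLB entry maps one page, each entry for a 1 GB page covers 262,144 times the address range of a 4 KB entry, which is why scattered accesses across a large table miss the TLB less often.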

NOTE:

This parameter can be used only by the following products:

Atlas A3 training products/Atlas A3 inference products
Atlas A2 training products/Atlas A2 inference products

Returns

A MemoryConfig object, which is passed to NPURunConfig through its memory_config argument.

Constraints

None

Example

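import tensorflow as tf
from npu_bridge.npu_init import *
...
# Clean up memset-operator memory collectively (0) and use dynamic memory allocation (0).
mem_config = MemoryConfig(atomic_clean_policy=0, static_memory_policy=0)
session_config = tf.ConfigProto(allow_soft_placement=True)
# Pass the MemoryConfig object to NPURunConfig through memory_config.
config = NPURunConfig(memory_config=mem_config, session_config=session_config)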

Parent topic: npu_bridge.estimator.npu.npu_config