Memory Allocator Configuration

jemalloc is a memory allocator designed to outperform traditional allocators such as the one in glibc. It reduces memory fragmentation and improves allocation efficiency in multi-threaded, high-concurrency workloads, which helps applications take fuller advantage of multi-core hardware.

Locks taken during memory allocation make threads wait, which hurts performance. To reduce this contention, jemalloc uses thread-specific data: each thread maintains its own memory manager and allocates locally within the thread, so threads largely avoid contending with one another for locks.

Enabling or Disabling jemalloc

Set or unset the LD_PRELOAD environment variable to enable or disable the jemalloc memory allocator.

Enable jemalloc:

export LD_PRELOAD=/usr/local/Ascend/cann/lib64/libjemalloc.so

Disable jemalloc:

unset LD_PRELOAD

The example uses the default Toolkit installation path for the root user, /usr/local/Ascend/cann. For a non-root user, the default path is ${HOME}/Ascend/cann. Replace the path in the example with your actual CANN software installation path.
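Because the installation path differs between root and non-root users, it can help to confirm that the library exists before preloading it. The following is a minimal sketch, not part of CANN: the function name find_jemalloc is hypothetical, and the only assumption about the layout is the lib64/libjemalloc.so location shown in the example above.

```shell
# Hypothetical helper (not provided by CANN): print the path to libjemalloc.so
# under the given CANN installation directory, or fail if the file is missing.
# The directory defaults to the root-user path used in this section.
find_jemalloc() {
    cann_home="${1:-/usr/local/Ascend/cann}"
    lib="${cann_home}/lib64/libjemalloc.so"
    if [ -f "$lib" ]; then
        printf '%s\n' "$lib"
    else
        echo "libjemalloc.so not found: $lib" >&2
        return 1
    fi
}

# Example use for a non-root installation; LD_PRELOAD is set only on success:
#   if lib="$(find_jemalloc "${HOME}/Ascend/cann")"; then
#       export LD_PRELOAD="$lib"
#   fi
```

On Linux, one way to confirm that a running process picked up the library is to search for jemalloc in /proc/<pid>/maps. Note that unset LD_PRELOAD clears every preloaded library, not only jemalloc.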

Applicable Scenarios

In developer testing, jemalloc modestly improved inference performance in inference scenarios based on the MindIE framework. Table 1 lists model performance results measured with the MindIE Benchmark tool. Because hardware and software configurations vary, the test data is for reference only and should not be treated as a performance standard.

Table 1 Experiment data

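| Model    | Concurrency | Input Length | Experiment No. | jemalloc Disabled (Tokens/s) | jemalloc Enabled (Tokens/s) | Performance Gain (%) |
|----------|-------------|--------------|----------------|------------------------------|-----------------------------|----------------------|
| Qwen2-7B | 1           | 128          | Experiment 1   | 151.123                      | 155.3789                    | 2.82                 |
| Qwen2-7B | 1           | 128          | Experiment 2   | 151.2461                     | 155.362                     | 2.72                 |
| Qwen2-7B | 1           | 128          | Experiment 3   | 151.5677                     | 154.433                     | 1.89                 |
| Qwen2-7B | 1           | 128          | Average value  | 151.3122667                  | 155.0579667                 | 2.48                 |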
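
The performance gain figures are consistent with the relative throughput increase, (enabled - disabled) / disabled x 100, rounded to two decimal places. The sketch below reproduces them from the throughput values in Table 1; the function name gain_percent is hypothetical and is not part of the MindIE Benchmark tool.

```shell
# Hypothetical helper: relative throughput gain in percent, two decimal places.
# $1 = tokens/s with jemalloc disabled, $2 = tokens/s with jemalloc enabled.
gain_percent() {
    awk -v base="$1" -v jem="$2" 'BEGIN { printf "%.2f\n", (jem - base) / base * 100 }'
}

gain_percent 151.123 155.3789          # Experiment 1
gain_percent 151.2461 155.362          # Experiment 2
gain_percent 151.5677 154.433          # Experiment 3
gain_percent 151.3122667 155.0579667   # Average value
```

The same function can be applied to your own benchmark results to compare runs with and without jemalloc on your hardware.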