Error Reported When Rec SDK TensorFlow Is Used for Model Training in the Arm Environment

Symptom

When Rec SDK TensorFlow is used for model training in the Arm environment, importing the scikit-learn library fails with the following error:

ImportError: /usr/local/python3.7.5/lib/python3.7/site-packages/sklearn/__check_build/../../scikit_learn.libs/libgomp-d22c30c5.so.1.0.0: cannot allocate memory in static TLS block

Possible Cause

Rec SDK TensorFlow is compiled with OpenMP, which uses dynamic Thread Local Storage (TLS) memory space. sklearn, in contrast, needs static TLS space for parallel computing, which it obtains when its bundled libgomp.so is loaded. On an AArch64 server, dynamic TLS and static TLS share the same pre-allocation pool. If Rec SDK TensorFlow is imported first, it pre-allocates too much of this pool. As a result, not enough static TLS space remains for libgomp.so when sklearn is imported afterwards.

Solution

In the main.py file of the model code, place the import sklearn statement before the import of Rec SDK TensorFlow. This way, libgomp.so is loaded while sufficient static TLS space is still available.
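
The required import order can be illustrated, and checked before a training job is launched, with a short script. The following is a minimal sketch, not part of Rec SDK: the module name rec_sdk_tf is a placeholder assumption for whatever name your code uses to import Rec SDK TensorFlow, and the helper functions are hypothetical. The script only parses the source text of main.py, so it does not load either library.

```python
import ast

# Placeholder: replace with the module name your main.py actually uses
# to import Rec SDK TensorFlow (this name is an assumption).
SDK_PACKAGE = "rec_sdk_tf"

# Recommended order at the top of main.py: sklearn first, then the SDK.
GOOD_MAIN = "import sklearn\nimport rec_sdk_tf\n"

# Problematic order: the SDK is imported before sklearn.
BAD_MAIN = "import rec_sdk_tf\nfrom sklearn.metrics import roc_auc_score\n"


def first_import_line(source, package):
    """Return the line number of the first top-level import of `package`, or None."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        if any(n == package or n.startswith(package + ".") for n in names):
            return node.lineno
    return None


def sklearn_imported_first(source, sdk_package=SDK_PACKAGE):
    """Return True if sklearn is imported before `sdk_package` at the top level.

    Also returns True when either package is not imported at all,
    because there is then nothing to reorder.
    """
    sklearn_line = first_import_line(source, "sklearn")
    sdk_line = first_import_line(source, sdk_package)
    if sklearn_line is None or sdk_line is None:
        return True
    return sklearn_line < sdk_line


if __name__ == "__main__":
    print(sklearn_imported_first(GOOD_MAIN))
    print(sklearn_imported_first(BAD_MAIN))
```

To check a real file, read it with open("main.py").read() and pass the text to sklearn_imported_first. The check covers only top-level imports in that one file. An import of Rec SDK TensorFlow that happens indirectly through another module that main.py imports earlier is not detected.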