Installing the MindIO TFT SDK on Compute Nodes

Install the MindIO TFT SDK in the Python environment used by the foundation model training framework to accelerate recovery from training job faults.

Procedure

  1. Log in to the installation node as the {MindIO-install-user} user.

    The password of the installation user must meet the password complexity requirements. For details, see Password Complexity Requirements. The password validity period is 90 days. To change the validity period, edit the /etc/login.defs file or run the chage command. For details, see Setting the Validity Period of a User Account.

  2. Upload the memory cache system software package to a path on the device for which the installation user has read and write permissions.
    • The package name in this procedure is an example. Use the name of your actual memory cache system software package.
    • If the Python environment resides in a shared directory, upload the installation package to any one compute node. Otherwise, upload the installation package to every compute node.
  3. Go to the software package upload path and decompress the memory cache system software package.
    unzip Ascend-mindxdl-mindio_{version}_linux-{arch}.zip
    Table 1 Extracted files

    File                                                         Description
    mindio_acp-{mindio_acp_version}-py3-none-linux_{arch}.whl    MindIO ACP installation package
    mindio_ttp-{mindio_ttp_version}-py3-none-linux_{arch}.whl    MindIO TFT installation package

  4. Go to the upload path and install the MindIO TFT SDK.
    mindio_ttp-{mindio_ttp_version}-py3-none-linux_{arch}.whl is used as an example.
    pip3 install mindio_ttp-{mindio_ttp_version}-py3-none-linux_{arch}.whl --force-reinstall --no-index
    • If the following information is displayed when the MindIO TFT SDK is installed for the first time, the installation is successful.
      Processing ./mindio_ttp-{mindio_ttp_version}-py3-none-linux_{arch}.whl
      Installing collected packages: mindio_ttp
      Successfully installed mindio_ttp-{mindio_ttp_version}
      
    • If the MindIO TFT SDK was already installed and the following information is displayed when it is reinstalled, the installation is successful.
      Processing ./mindio_ttp-{mindio_ttp_version}-py3-none-linux_{arch}.whl
      Installing collected packages: mindio_ttp
        Attempting uninstall: mindio-ttp
          Found existing installation: mindio_ttp {mindio_ttp_version}
          Uninstalling mindio_ttp-{mindio_ttp_version}:
            Successfully uninstalled mindio_ttp-{mindio_ttp_version}
      Successfully installed mindio_ttp-{mindio_ttp_version}
      
  5. Change the permissions of the executable files and code scripts in the software installation directory to 550 to prevent unauthorized tampering.
    chmod -R 550 {MindIO TFT SDK installation directory}
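
After the permission change in step 5, it can be useful to confirm that no file was missed. The following is a minimal sketch, not part of the MindIO TFT SDK: list_not_550 is a hypothetical helper name, and the script assumes a POSIX shell with the standard find utility. It prints every entry under a directory whose mode is not exactly 550 and returns a non-zero status if any is found.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the SDK): report entries under a
# directory whose permission bits are not exactly 550.
# Returns 0 and prints nothing when every entry is 550; returns 1 otherwise.
list_not_550() {
    dir="$1"
    # "-perm 550" matches an exact mode; "!" negates the match.
    # Symbolic links are skipped because their own mode bits are not meaningful.
    bad=$(find "$dir" ! -type l ! -perm 550 -print)
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
    return 0
}
```

For example, run list_not_550 "{MindIO TFT SDK installation directory}" with the same directory that was passed to chmod in step 5. An empty result suggests the change was applied to every file and subdirectory. The pip3 show mindio_ttp command prints a Location field that can help locate the installation directory; whether the SDK keeps all of its files under that location is an assumption to verify in your environment.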