Configuration Before MindX DL Component Installation
Before installing MindX DL components in batches, perform the following operations to complete the necessary configuration:
Pre-deployment Check
Before batch deployment, ensure that the domain name server (DNS) has been configured for the OS. You can select any server to perform the check.
You can run the cat /etc/resolv.conf command to check the DNS configuration. If the DNS configuration (such as nameserver information) is displayed, the DNS has been configured. If no information is displayed, the DNS has not been configured. In this case, perform the following steps to configure it:
- Query the name of the network port configured with a service IP address.
ip a
- Configure the DNS for the corresponding network port.
nmcli connection modify enp125s0f0 +ipv4.dns 10.10.10.254
nmcli conn up enp125s0f0
- 10.10.10.254 is an example of the DNS address.
- enp125s0f0 is an example of the network port name.
- Using the nmcli command to configure the DNS is only an example. Follow the DNS configuration method provided by your specific OS image provider to complete the configuration.
- Check whether the DNS is successfully configured.
cat /etc/resolv.conf
If the following information is displayed, the DNS is successfully configured:
# Generated by NetworkManager
nameserver 10.10.10.254
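The check above can be sketched as a small shell helper (a minimal sketch assuming a POSIX shell; `check_dns` is a hypothetical name, not part of ascend-deployer):

```shell
# check_dns: report whether a resolv.conf-style file contains a
# nameserver entry. Pass the file to inspect as the first argument.
check_dns() {
    if grep -q '^nameserver' "$1" 2>/dev/null; then
        echo "configured"
    else
        echo "not configured"
    fi
}

# Example using the standard path from the document:
check_dns /etc/resolv.conf
```

If the helper prints "not configured", apply the nmcli steps above (or your image provider's DNS method) and re-run the check.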
inventory_file Configuration
- Log in to the server where ascend-deployer is located.
- Configure the IP addresses and usernames of target devices on the server where ascend-deployer is located.
Go to the ascend-deployer/ascend_deployer directory, open the inventory_file file, add the configuration in the formats listed below, and run the :wq command to save the file and exit. (The host in the master group serves as the default Kubernetes control node.)
- Before cluster training, configure the hccn_tool network for the training nodes. Refer to 1 to modify only the HCCN variable configuration area.
- Master and worker variable configuration areas
Table 1 Host group configuration description
- master: Mandatory. The number of master nodes must be an odd number.
- worker: Mandatory.
[master]
xx.xx.xx.xx ansible_ssh_user="root" set_hostname="master-1" k8s_api_server_ip=xx.xx.xx.xx
[worker]
xx.xx.xx.xx ansible_ssh_user="root" set_hostname="worker-1"
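Before deployment, the group structure of the file can be sanity-checked with a small helper (a sketch; `check_inventory` is a hypothetical name, not part of ascend-deployer):

```shell
# check_inventory: verify that the file passed as the first argument
# contains the mandatory [master] and [worker] host groups.
check_inventory() {
    for group in master worker; do
        if ! grep -q "^\[${group}\]" "$1"; then
            echo "missing [${group}] group"
            return 1
        fi
    done
    echo "groups ok"
}

# Demo on a throwaway file with placeholder entries:
inv=$(mktemp)
printf '[master]\nxx.xx.xx.xx ansible_ssh_user="root"\n[worker]\nxx.xx.xx.xx ansible_ssh_user="root"\n' > "$inv"
check_inventory "$inv"
```

This catches a misspelled or missing group header before ansible reports a less obvious error mid-deployment.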
Table 2 Parameters for configuring host group variables
- IP (Mandatory): Server IP address.
- ansible_ssh_user (Mandatory): Account for logging in to a remote server using SSH. The account must be root.
- ansible_ssh_pass (Optional): Password for logging in to a remote server using SSH. If the SSH key authentication mode is configured and the root user can log in to the server, you do not need to set this parameter.
- ansible_ssh_port (Optional): Port for the SSH connection. If the default port 22 is used, you do not need to set this parameter. If a non-default port is used, set it as required.
- ansible_become_password (Optional): Privilege escalation password, which must be the same as the password entered during SSH login of the account. You do not need to set this parameter when the root user is used.
- set_hostname (Optional; mandatory when there are multiple master or worker nodes): Name of each node in the Kubernetes cluster. You are advised to name the nodes in sequence in the master-1 or worker-1 format. If a Kubernetes cluster already exists, the names must be those of the nodes in the existing cluster. The names must be lowercase and cannot be filled in arbitrarily.
- k8s_api_server_ip (Optional; mandatory for master nodes and must not be configured for worker nodes): Entry through which Kubernetes provides services for external systems. Set this parameter to the IP address of the master node. In both single-master and multi-master scenarios, k8s_api_server_ip must be an IP address that exists on the local host.
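Putting the fields above together, a filled-in inventory_file for one master and one worker could be generated like this (a sketch; every IP address and hostname below is a placeholder for your own network plan):

```shell
# Placeholder addresses -- substitute the real server IPs.
MASTER_IP="10.0.0.1"
WORKER_IP="10.0.0.2"

# Write a minimal inventory_file in the format shown above.
cat > inventory_file <<EOF
[master]
${MASTER_IP} ansible_ssh_user="root" set_hostname="master-1" k8s_api_server_ip=${MASTER_IP}
[worker]
${WORKER_IP} ansible_ssh_user="root" set_hostname="worker-1"
EOF

cat inventory_file
```

Note that k8s_api_server_ip reuses the master's own IP, matching the rule that it must be an address that exists on the local host.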
- Global variable configuration area
[all:vars]
POD_NETWORK_CIDR="xx.xx.xx.xx/xx"
KUBE_SERVICE_CIDR="xx.xx.xx.xx/xx"
KUBE_VIP=""
HARBOR_SERVER=""
HARBOR_ADMIN_USER=""
HARBOR_ADMIN_PASSWORD=""
HARBOR_PUBLIC_PROJECT="false"
HARBOR_CA_FILE=""
Table 3 Parameters for configuring global variables
- POD_NETWORK_CIDR (Mandatory): Subnet IP network segment used by the Kubernetes cluster. If this segment overlaps with the server IP network segment, change it to another private network segment. The default value is 192.168.0.0/16. Configure IPv6 addresses based on the network plan; FEC0:2::/64 is recommended.
  NOTE: Ensure that the node IP addresses do not conflict with the default network segment (192.168.0.0/16) of the Kubernetes cluster. If they conflict, change the value of POD_NETWORK_CIDR to another private network segment, for example, 10.0.0.0/16.
- KUBE_SERVICE_CIDR (Mandatory): Service address segment (10.96.0.0/12 by default) used by the Kubernetes cluster. Service is a Kubernetes concept; this parameter corresponds to the address used by a Service of type ClusterIP, and each Service has its own address. When configuring this segment, note the following:
  - The Service address can be used only within the Kubernetes cluster, not outside it.
  - The Service address segment cannot overlap with the virtual switch address segment.
  - The Service address segment cannot overlap with the pod's virtual switch address segment.
  Configure IPv6 addresses based on the network plan; FEC0:1::/108 is recommended.
- KUBE_VIP (Optional): Virtual IP address that must be configured in the multi-master scenario. KUBE_VIP must be in the same subnet as the IP addresses of the cluster nodes and must be an idle IP address that is not used by others.
- HARBOR_SERVER (Optional; mandatory when the Harbor service is used): Harbor service address configured when the Harbor image repository is used. The format is <ip>:<port>, without a protocol prefix.
- HARBOR_ADMIN_USER / HARBOR_ADMIN_PASSWORD (Optional; mandatory when the Harbor service is used): Harbor administrator account, which is used to create a project in Harbor for pushing and pulling Kubernetes and MindX DL images.
- HARBOR_PUBLIC_PROJECT (Optional; mandatory when the Harbor service is used): Whether the MindX DL image project in Harbor is public. The value can be false or true.
- HARBOR_CA_FILE (Optional): If HTTPS is used, this parameter configures the root CA file path of the Harbor image repository. If it is not set, the default value no is used.
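The overlap rules above (node IPs versus POD_NETWORK_CIDR, Service segment versus other segments) can be checked with a small IPv4 sketch before editing the file; `in_cidr` is a hypothetical helper, not part of ascend-deployer, and assumes bash-style arithmetic:

```shell
# ip_to_int: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# in_cidr IP CIDR: print "overlap" if IP falls inside CIDR, else "ok".
in_cidr() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    if [ $(( ip & mask )) -eq $(( net & mask )) ]; then
        echo "overlap"
    else
        echo "ok"
    fi
}

in_cidr 192.168.1.10 192.168.0.0/16   # node IP inside the default pod CIDR
in_cidr 10.1.2.3 192.168.0.0/16       # no conflict
```

If a node IP reports "overlap" against the default 192.168.0.0/16, change POD_NETWORK_CIDR to another private segment such as 10.0.0.0/16, as the NOTE above instructs.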
- On Atlas A2 training products, IP, k8s_api_server_ip, POD_NETWORK_CIDR, and KUBE_SERVICE_CIDR can be set to IPv4 or IPv6 addresses. The IP address type that an SSH client such as PuTTY uses to connect to the execution device must match the type (IPv4 or IPv6) configured in the inventory_file. Other devices support only IPv4 addresses.
- The usernames of the remote devices are configured in the inventory_file. Both the root user and non-root users can be used for software installation. For details about software that can be installed by non-root users, see Table 1. If you want to install software listed in Table 1 as a non-root user, set ansible_ssh_user in the file to the root user and use the root user to install sys_pkg (system component) and npu (driver and firmware, installed for Ascend devices). Then, set ansible_ssh_user to the non-root user and install the software listed in Table 1.
- In the inventory_file, you can configure the passwords of other target devices for SSH password authentication by specifying the ansible_ssh_pass field. If the SSH key authentication mode is used, this configuration is not required. If the OS of a target device is openEuler_20.03LTS, openEuler_22.03LTS, Kylin V10 SP2, or CentOS 7.6, the ansible_ssh_pass field cannot be used for this configuration.
- (Optional) During batch deployment, the default number of concurrences is 50, and the maximum number of concurrences is 255. If the number of environments to be deployed is greater than 50, go to the ascend-deployer/ascend_deployer directory and change the value of forks in the ansible.cfg file to the total number of nodes to be deployed to speed up the deployment.
[defaults]
forks=50
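A minimal sketch of that edit, assuming GNU sed; the sample ansible.cfg is created here only so the change can be demonstrated (in practice, edit the existing file under ascend-deployer/ascend_deployer), and 120 is a placeholder node count:

```shell
# Stand-in copy of ansible.cfg with the default concurrency of 50.
printf '[defaults]\nforks=50\n' > ansible.cfg

# Placeholder total number of nodes to deploy (must not exceed 255).
NODES=120

# Raise forks to the node count to speed up large batch deployments.
sed -i "s/^forks=.*/forks=${NODES}/" ansible.cfg

grep '^forks=' ansible.cfg
```

With fewer than 50 target environments, the default value of 50 can be left unchanged.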