Hardening ClusterD Security
After ClusterD starts, it launches a gRPC server that listens for messages from the gRPC client in the training container to support resumable training. By default, ClusterD uses insecure (unencrypted) gRPC communication. You can enable TLS/SSL encryption to protect this traffic against attacks.
The following procedure uses bidirectional authentication between ClusterD and NodeD as an example of how to harden ClusterD security. In this example, ClusterD acts as the server and NodeD acts as the client.
Prerequisites
Before configuring bidirectional authentication, prepare the following files:
- rootCA.crt
- client.crt
- client.key
- server.crt
- server.key
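Before copying these files anywhere, it can help to confirm that both certificates chain to rootCA.crt and that each private key belongs to its certificate. The following is a minimal sketch, assuming the openssl CLI is installed; the function names check_chain and check_key_pair are illustrative and are not part of ClusterD or NodeD.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they assume the openssl CLI is available.

# check_chain CA_FILE CERT...
# Succeeds only if every CERT verifies against CA_FILE.
check_chain() {
    local ca="$1"; shift
    openssl verify -CAfile "$ca" "$@" >/dev/null
}

# check_key_pair CERT KEY
# Succeeds if the public key in CERT equals the public key derived from KEY.
# If KEY is encrypted, openssl prompts for its passphrase.
check_key_pair() {
    local cert_pub key_pub
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
    key_pub=$(openssl pkey -in "$2" -pubout) || return 1
    [ "$cert_pub" = "$key_pub" ]
}

# Example, run in the directory that holds the five files:
# check_chain rootCA.crt server.crt client.crt && echo "chain OK"
# check_key_pair server.crt server.key && echo "server key matches"
# check_key_pair client.crt client.key && echo "client key matches"
```

A failed check at this point is usually easier to diagnose than a failed TLS handshake later in the procedure.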
Procedure
- Pull the Nginx image.
    docker pull nginx
- Create the cert folder in path A and save the rootCA.crt, server.crt, and server.key files in Prerequisites to the cert folder.
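The folder creation and copy can be scripted. The sketch below is one possible way to do it: the function name install_certs and the example paths are hypothetical. The nginx.conf comments in this procedure state permission 400 for the server certificate and private key; applying the same mode to rootCA.crt is an assumption made here for simplicity.

```shell
#!/usr/bin/env bash
# install_certs SRC_DIR BASE_DIR FILE...
# Copies each FILE from SRC_DIR into BASE_DIR/cert and sets it to mode 400
# (read-only for the owner). Hypothetical helper, not part of ClusterD.
install_certs() {
    local src="$1" dest="$2/cert"; shift 2
    mkdir -p "$dest" || return 1
    local f
    for f in "$@"; do
        # -f lets a rerun replace an existing read-only copy.
        cp -f "$src/$f" "$dest/$f" || return 1
        chmod 400 "$dest/$f" || return 1
    done
}

# Example (both paths are placeholders):
# install_certs ./certs /path/A rootCA.crt server.crt server.key
```

The same helper can be reused for path B with rootCA.crt, client.crt, and client.key.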
- Create the conf folder in path A, create a file named nginx.conf in the folder, and write the following content to the file:
    worker_processes 1;
    worker_cpu_affinity 0001;
    worker_rlimit_nofile 4096;

    events {
        worker_connections 4096;
    }

    http {
        port_in_redirect off;
        server_tokens off;
        autoindex off;
        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log info;
        limit_req_zone global zone=req_zone:100m rate=20r/s;
        limit_conn_zone global zone=north_conn_zone:100m;

        server {
            # Pod IP address of ClusterD. The port number must be the same as that in the ClusterD configuration file.
            listen <ClusterD pod IP address>:9500 ssl;
            http2 on;
            proxy_ssl_session_reuse off;
            add_header Referrer-Policy "no-referrer";
            add_header X-XSS-Protection "1; mode=block";
            add_header X-Frame-Options DENY;
            add_header X-Content-Type-Options nosniff;
            add_header Strict-Transport-Security "max-age=31536000; includeSubDomains";
            add_header Content-Security-Policy "default-src 'self'";
            add_header Cache-control "no-cache, no-store, must-revalidate";
            add_header Pragma no-cache;
            add_header Expires 0;
            ssl_session_tickets off;
            # Server certificate path (permission: 400).
            ssl_certificate /etc/nginx/conf.d/cert/server.crt;
            # Server private key path (permission: 400). The private key must not be stored in plaintext.
            ssl_certificate_key /etc/nginx/conf.d/cert/server.key;
            ssl_client_certificate /etc/nginx/conf.d/cert/rootCA.crt;
            ssl_verify_client on;
            ssl_verify_depth 2;
            send_timeout 60;
            limit_req zone=req_zone burst=20 nodelay;
            limit_conn north_conn_zone 20;
            keepalive_timeout 60;
            proxy_read_timeout 900;
            proxy_connect_timeout 60;
            proxy_send_timeout 60;
            client_header_timeout 60;
            client_body_timeout 10;
            client_header_buffer_size 2k;
            large_client_header_buffers 4 8k;
            client_body_buffer_size 16K;
            client_max_body_size 20m;
            ssl_protocols TLSv1.2 TLSv1.3;
            ssl_ciphers "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256 !aNULL !eNULL !LOW !3DES !MD5 !EXP !PSK !SRP !DSS !RC4";
            ssl_session_timeout 10s;
            ssl_session_cache shared:SSL:10m;

            location / {
                # Pod IP address of ClusterD. If the ClusterD startup parameter useProxy is enabled, the IP address is 127.0.0.1.
                grpc_pass grpc://<ClusterD pod IP address>:8899;
            }
        }
    }
- Add or modify the following fields in the ClusterD startup YAML file.
    # Set -useProxy=true in the ClusterD startup command to enable the local proxy.
    args: [ "/usr/local/bin/clusterd -logFile=/var/log/mindx-dl/clusterd/clusterd.log -logLevel=0 -useProxy=true" ]

    # Add the following information to containers in the deployment:
    - name: nginx
      image: nginx:latest
      imagePullPolicy: Never
      command: [ "/bin/bash", "-c", "--"]
      args: [ "sleep infinity" ]
      volumeMounts:
        - name: nginx-cert
          mountPath: /etc/nginx/conf.d/cert
        - name: nginx-conf
          mountPath: /etc/nginx/conf

    # Add the following information to volumes in the deployment:
    - name: nginx-cert
      hostPath:
        path: /{Path A}/cert   # Directory that holds the x509 certificates and private key. Replace {Path A} with the path used in Step 2.
    - name: nginx-conf
      hostPath:
        path: /{Path A}/conf   # Directory that holds the Nginx startup configuration file (the conf folder created in Step 3). Replace {Path A} with the path used in Step 2.

    # Change the values of ports in the service to the following:
    - protocol: TCP
      port: 8899
      targetPort: 9500
- Start the ClusterD service.
    kubectl apply -f clusterd-v{version}.yaml
- Run the following command to query the IP address of the ClusterD pod, and then write the IP address to the nginx.conf file created in Step 3:
    kubectl get pod -A -o wide | grep clusterd
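Writing the pod IP address into nginx.conf by hand is easy to get wrong, especially if the pod is recreated and receives a new address. The sketch below rewrites only the listen directive on port 9500. It assumes GNU sed (for sed -i) and an IPv4 pod address; the function name set_listen_ip and the example values are hypothetical.

```shell
#!/usr/bin/env bash
# set_listen_ip CONF_FILE POD_IP
# Rewrites the "listen <anything>:9500 ssl;" directive in CONF_FILE so that it
# uses POD_IP. Hypothetical helper; assumes GNU sed and an IPv4 address.
set_listen_ip() {
    local file="$1" ip="$2"
    local ipv4_re='^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
    # Reject anything that is not dotted-quad text before touching the file.
    if ! [[ "$ip" =~ $ipv4_re ]]; then
        echo "invalid IPv4 address: $ip" >&2
        return 1
    fi
    # [^;]* matches either the placeholder or a previously written address,
    # so the function can be rerun after the pod IP address changes.
    sed -i "s|listen [^;]*:9500 ssl;|listen $ip:9500 ssl;|" "$file"
}

# Example (path and address are placeholders):
# set_listen_ip /path/A/conf/nginx.conf 10.0.0.12
```

The grpc_pass address is deliberately left alone: as the configuration comment notes, it is 127.0.0.1 when the useProxy startup parameter is enabled, so it should be edited separately.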
- Start the Nginx process.
    ## Access the Nginx container.
    ## Replace {xxx} with the pod ID randomly generated by Kubernetes after the ClusterD pod is started.
    kubectl exec -it -n mindx-dl clusterd-{xxx} -c nginx -- bash
    ## Start the Nginx process and enter the key password as prompted.
    nginx -c /etc/nginx/conf/nginx.conf
- Start the NodeD service.
- Create the cert folder in path B and save the rootCA.crt, client.crt, and client.key files in Prerequisites to the cert folder.
- Create the conf folder in path B, create a file named nginx.conf in the folder, and write the following content to the file:
    worker_processes 1;
    worker_cpu_affinity 0001;
    worker_rlimit_nofile 4096;

    events {
        worker_connections 4096;
    }

    http {
        port_in_redirect off;
        server_tokens off;
        autoindex off;
        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log;
        grpc_buffer_size 16M;
        limit_req_zone global zone=req_zone:100m rate=20r/s;
        limit_conn_zone global zone=north_conn_zone:100m;

        server {
            listen 127.0.0.1:8899;
            http2 on;
            ssl_session_tickets off;
            limit_req zone=req_zone burst=20 nodelay;
            limit_conn north_conn_zone 20;
            keepalive_timeout 60;
            proxy_read_timeout 900;
            proxy_connect_timeout 60;
            proxy_send_timeout 60;
            client_header_timeout 60;
            client_body_timeout 10;
            client_header_buffer_size 200k;
            large_client_header_buffers 4 800k;
            client_body_buffer_size 160K;
            client_max_body_size 20m;

            location / {
                # Service IP address of ClusterD. To query it, run: kubectl get svc -A | grep clusterd
                grpc_pass grpcs://<ClusterD service IP address>:9500;
                grpc_ssl_verify on;
                grpc_ssl_trusted_certificate /etc/nginx/conf.d/cert/rootCA.crt;
                grpc_ssl_verify_depth 2;
                grpc_ssl_certificate /etc/nginx/conf.d/cert/client.crt;
                grpc_ssl_certificate_key /etc/nginx/conf.d/cert/client.key;
                grpc_ssl_protocols TLSv1.2 TLSv1.3;
                grpc_ssl_ciphers "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256 !aNULL !eNULL !LOW !3DES !MD5 !EXP !PSK !SRP !DSS !RC4";
                # SAN or CN in the service certificate.
                grpc_ssl_name <SAN or CN in the service certificate>;
            }
        }
    }
- Add the following fields to the YAML file for starting NodeD.
    # Add the startup parameter sleep 150.
    args: [ "sleep 150; /usr/local/bin/noded -logFile=/var/log/mindx-dl/noded/noded.log -logLevel=0" ]

    # Add the following information to the containers item:
    - name: nginx
      image: nginx:latest
      imagePullPolicy: Never
      command: [ "/bin/bash", "-c", "--"]
      args: [ "sleep infinity" ]
      volumeMounts:
        - name: nginx-cert
          mountPath: /etc/nginx/conf.d/cert
        - name: nginx-conf
          mountPath: /etc/nginx/conf

    # Add the following information to the volumes item:
    - name: nginx-cert
      hostPath:
        path: /{Path B}/cert   # Directory that holds the x509 certificates and private key.
    - name: nginx-conf
      hostPath:
        path: /{Path B}/conf   # Directory that holds the Nginx startup configuration file (the conf folder created in path B).
- Start NodeD.
    kubectl apply -f noded-v{version}.yaml
- Access the NodeD container and add a domain name resolution rule.
    ## Access the NodeD container.
    kubectl exec -it -n <noded pod ns> <noded pod name> -- bash
    ## Add a domain name mapping rule.
    echo "127.0.0.1 clusterd-grpc-svc.mindx-dl.svc.cluster.local" >> /etc/hosts
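Running the echo command more than once appends a duplicate line each time. If the step may be repeated, for example from a script, an idempotent variant along the following lines could be used instead. The function name add_hosts_entry is hypothetical, and the optional third argument exists only so that the function can be tried against a scratch file.

```shell
#!/usr/bin/env bash
# add_hosts_entry IP HOSTNAME [HOSTS_FILE]
# Appends "IP HOSTNAME" to HOSTS_FILE (default: /etc/hosts) unless an identical
# line is already present. Hypothetical helper, not part of NodeD.
add_hosts_entry() {
    local ip="$1" name="$2" hosts="${3:-/etc/hosts}"
    # -x matches whole lines only, -F treats the text as a fixed string.
    if ! grep -qxF "$ip $name" "$hosts" 2>/dev/null; then
        echo "$ip $name" >> "$hosts"
    fi
}

# Example, run inside the NodeD container:
# add_hosts_entry 127.0.0.1 clusterd-grpc-svc.mindx-dl.svc.cluster.local
```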
- Start the Nginx process.
    ## Access the Nginx container.
    ## {xxx} indicates the pod ID randomly generated by Kubernetes after the NodeD pod is started.
    kubectl exec -it -n mindx-dl noded-{xxx} -c nginx -- bash
    ## Start Nginx.
    nginx -c /etc/nginx/conf/nginx.conf
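After Nginx starts, a quick way to confirm that the local proxy is accepting connections is to probe 127.0.0.1:8899, the address configured in the listen directive above. The sketch below assumes that the bash in the Nginx container supports /dev/tcp redirection; the function names port_open and wait_for_port are hypothetical. NodeD is started with sleep 150, which presumably leaves time for this step, so the check is most useful inside that window.

```shell
#!/usr/bin/env bash
# port_open HOST PORT
# Succeeds if a TCP connection to HOST:PORT can be opened.
# Relies on bash's /dev/tcp redirection; the descriptor closes with the subshell.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# wait_for_port HOST PORT TIMEOUT_SECONDS
# Polls once per second until the port accepts connections or the timeout expires.
wait_for_port() {
    local host="$1" port="$2" deadline=$(( SECONDS + $3 ))
    until port_open "$host" "$port"; do
        (( SECONDS >= deadline )) && return 1
        sleep 1
    done
}

# Example, run inside the Nginx container of the NodeD pod:
# wait_for_port 127.0.0.1 8899 30 && echo "Nginx is listening"
```

This only shows that the port is open. It does not prove that the TLS handshake with ClusterD succeeds, so the Nginx error log remains the place to look if NodeD cannot connect.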