Starting the Resilience-Controller
Procedure
- Log in to the Kubernetes master node as the root user and run the following command to check whether the Resilience-Controller image and version are correct:
docker images | grep resilience-controller
Example:root@ubuntu:# docker images | grep resilience-controller resilience-controller v3.0.0 c532e9d0889c About an hour ago 142MB
- If yes, go to 2.
- If no, create an image and distribute it. For details, see Creating an Image.
- Copy the YAML file in the directory where the Resilience-Controller software package is decompressed (for example, /home/ascend-resilience-controller) to any directory on the Kubernetes master node (for example, /home/ascend-resilience-controller). If the Resilience-Controller software package is decompressed on the Kubernetes master node, you do not need to copy the YAML file.
cd /home/ascend-resilience-controller scp root@{IP_address_of_the_node_where_the_software_package_is_decompressed}:/home/ascend-resilience-controller/resilience-controller-*.yaml ./ - Skip this step if you do not need to modify the component startup parameters. Otherwise, modify the startup parameters of Resilience-Controller in the YAML file based on your requirements. For details about the startup parameters, see Table 1. You can run the ./resilience-controller -h command to view the parameter description.
- Start the Resilience-Controller.
- Run the following command if the KubeConfig certificate is imported.
kubectl apply -f resilience-controller-without-token-{version}.yaml - Run the following command if the KubeConfig certificate is not imported:
kubectl apply -f resilience-controller-*.yaml
The following is an example:
root@ubuntu:/home/ascend-resilience-controller# kubectl apply -f resilience-controller-without-token-v3.0.0.yaml deployment.apps/resilience-controller created root@ubuntu:/home/ascend-resilience-controller# kubectl get pod -n mindx-dl NAME READY STATUS RESTARTS AGE ... resilience-controller-7667495b6b-hwmjw 1/1 Running 0 11s ...
- Run the following command if the KubeConfig certificate is imported.
Parameters
Parameter |
Type |
Default Value |
Description |
|---|---|---|---|
-version |
bool |
false |
Resilience-Controller binary version number. |
-logLevel |
int |
0 |
Log level.
|
-maxAge |
int |
7 |
Log backup time limit. The value ranges from 7 to 700, in days. |
-logFile |
string |
/var/log/mindx-dl/resilience-controller/run.log |
Log file. NOTE:
If the size of a log file exceeds 20 MB, automatic dump is triggered. The maximum size of a log file cannot be changed. |
-maxBackups |
int |
30 |
Maximum number of dumped log files that can be retained. The value range is (0, 30]. |
-h |
None |
N/A |
Help information. |
Parent topic: Common Operations