Viewing Rescheduling Results After an Inference Card Fault

If an inference card fails while an inference job is running, Volcano reschedules the job to another NPU.

Procedure

Run the following command to check the job running status:
```
kubectl get pod --all-namespaces
```
If the name of the job's pod changes from resnetinfer1-2-scpr5 to resnetinfer1-2-xsdsf, as shown in the following command output, the job was rescheduled successfully. The suffix of the name is a randomly generated character string, so use the name that appears in your own output.
```
NAMESPACE   NAME                   READY   STATUS    RESTARTS   AGE
...
default     resnetinfer1-2-xsdsf   1/1     Running   0          10s
...
```
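
On a busy cluster the full listing can be long, so it may help to filter it by name prefix. The following is a minimal sketch, assuming the pod name starts with `resnetinfer1-2` as in the example above and that the output has the default column layout. The helper name `find_job_pods` is hypothetical and is not part of kubectl or Volcano.

```shell
# Hypothetical helper (not part of kubectl or Volcano).
# Reads `kubectl get pod --all-namespaces` output on stdin and prints
# "NAMESPACE NAME STATUS" for each pod whose name starts with the prefix in $1.
find_job_pods() {
    # Default columns: 1=NAMESPACE 2=NAME 3=READY 4=STATUS 5=RESTARTS 6=AGE
    # NR > 1 skips the header line; index(...) == 1 means "starts with".
    awk -v p="$1" 'NR > 1 && index($2, p) == 1 { print $1, $2, $4 }'
}

# Usage (requires access to the cluster):
# kubectl get pod --all-namespaces | find_job_pods resnetinfer1-2
```

If the only matching line shows a new random suffix and the status `Running`, the job has been rescheduled.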

Run the following command to view the job logs:

```
kubectl logs -f resnetinfer1-2-xsdsf
```

Command output:

```
[2025-02-24 19:13:09,331] [2269] [281472887965984] [llm] [INFO] [logging.py-331] : Answer[0]:  Deep learning is a subset of machine learning that uses neural networks with multiple layers to model complex relationships between
[2025-02-24 19:13:09,331] [2269] [281472887965984] [llm] [INFO] [logging.py-331] : Generate[0] token num: (0, 20)
```
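
For a one-shot check instead of a live stream, the log can be filtered for the lines that show inference producing output again. This is a sketch that assumes the log format matches the sample above (`Answer[n]` and `Generate[n] token num` entries). The helper name `inference_lines` is hypothetical.

```shell
# Hypothetical helper (not part of kubectl or Volcano).
# Reads pod logs on stdin and keeps only the lines that report a generated
# answer or its token count, assuming the log format shown in the sample output.
inference_lines() {
    grep -E 'Answer\[[0-9]+\]|Generate\[[0-9]+\] token num'
}

# Usage (requires access to the cluster; omitting -f makes the command exit
# after printing the logs collected so far):
# kubectl logs resnetinfer1-2-xsdsf | inference_lines
```

If the filter prints nothing, the rescheduled pod may still be starting, so run the check again after a short wait.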
