host

Analysis Process

Open the timeline file in the Chrome browser: enter chrome://tracing/ in the address bar and load the file. The operator delivery information is displayed in a view similar to the following.

As shown in the preceding figure, there are a large number of bubbles (idle gaps between operator executions) on the device, indicating that the computing power of the card is not fully used.
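The proportion of bubbles can also be estimated directly from the timeline file instead of by eye. The following is a minimal sketch, assuming the file follows the Chrome trace event format (complete events with "ph": "X", and "ts" and "dur" in microseconds). The function names and the device_pid argument are hypothetical; which pid corresponds to the device depends on how the timeline was exported.

```python
import json


def load_events(path):
    """Load trace events from a Chrome trace file (a dict with 'traceEvents' or a bare list)."""
    with open(path) as f:
        data = json.load(f)
    return data["traceEvents"] if isinstance(data, dict) else data


def device_idle_ratio(events, device_pid):
    """Return the fraction of the traced window in which the device ran no operator."""
    # Keep only complete events ("ph": "X") that belong to the assumed device process.
    spans = sorted(
        (e["ts"], e["ts"] + e["dur"])
        for e in events
        if e.get("ph") == "X" and e.get("pid") == device_pid
    )
    if not spans:
        return 1.0
    window_start = spans[0][0]
    window_end = max(end for _, end in spans)
    # Merge overlapping spans so that concurrent operators are not counted twice.
    busy = 0
    cur_start, cur_end = spans[0]
    for start, end in spans[1:]:
        if start > cur_end:
            busy += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    busy += cur_end - cur_start
    total = window_end - window_start
    return 1.0 - busy / total if total else 0.0


# Illustrative data: three 10 us operators separated by 20 us gaps on pid 1.
sample_events = [
    {"ph": "X", "pid": 1, "ts": 0, "dur": 10},
    {"ph": "X", "pid": 1, "ts": 30, "dur": 10},
    {"ph": "X", "pid": 1, "ts": 60, "dur": 10},
    {"ph": "X", "pid": 0, "ts": 0, "dur": 70},  # host-side event, ignored
]
sample_ratio = device_idle_ratio(sample_events, device_pid=1)
```

A ratio close to 1 means that the device spends most of the traced window waiting, which is consistent with a host-bound workload.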

This problem usually occurs in dynamic graph scenarios or single-operator delivery scenarios, where each operator is delivered to the device only after the host has finished its own processing and computing. Because the operator execution time on the device is shorter than the host processing time, the device sits idle while it waits for the next operator.

If there is computing workload on the host, use the native Profiler tool of TensorFlow to capture the performance of the entire session run (for details, see www.tensorflow.org), and then check the time consumption ratio of the host and device and the CPU usage. The host computing workload can increase for two reasons: after hybrid computing is enabled, the system automatically executes on the host the operators that cannot be executed on the device; and user configurations can specify that some operators are not offloaded to the device.
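Capturing one session run with the native TensorFlow tracing API can be sketched as follows. This is an illustration only, assuming a TensorFlow 1.x style session (accessed through tf.compat.v1). The helper name profile_session_run and the output path are hypothetical, and whether device-side operators appear in the resulting trace depends on the platform integration.

```python
def profile_session_run(sess, fetches, feed_dict=None, out_path="timeline.json"):
    """Run one step with full tracing and write a Chrome trace file to out_path."""
    # TensorFlow is imported lazily so that the helper can be defined without it installed.
    import tensorflow as tf
    from tensorflow.python.client import timeline

    # Request full tracing for this single sess.run call.
    run_options = tf.compat.v1.RunOptions(
        trace_level=tf.compat.v1.RunOptions.FULL_TRACE
    )
    run_metadata = tf.compat.v1.RunMetadata()

    result = sess.run(
        fetches,
        feed_dict=feed_dict,
        options=run_options,
        run_metadata=run_metadata,
    )

    # Convert the collected step statistics into the chrome://tracing format.
    trace = timeline.Timeline(run_metadata.step_stats)
    with open(out_path, "w") as f:
        f.write(trace.generate_chrome_trace_format())
    return result
```

The written file can be loaded in chrome://tracing/ to compare how much of the step is spent on the host and how much on the device.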

Solution

Replace the single-operator delivery solution with the entire graph offloading solution.
Convert a dynamic graph to a static graph.
Increase the batch size to lengthen the operator execution time and reduce the proportion of idle time on the device.
If the CPU usage is too high, reduce the CPU workload on the host.
Use cluster deployment to ensure that the host resources are sufficient.
For single-machine deployment, consider hybrid deployment of models that are host bound and models that are not.
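The effect of increasing the batch size can be illustrated with a simplified model. It assumes that the host time per step stays constant, that the device time grows linearly with the batch size, and that each step lasts as long as the slower of the two sides. All numbers are made up for illustration and are not measurements.

```python
def idle_fraction(host_ms, device_ms_per_sample, batch_size):
    """Fraction of a step during which the device waits for the host (simplified model)."""
    device_ms = device_ms_per_sample * batch_size
    # The step lasts as long as the slower side; the device is idle for the remainder.
    step_ms = max(host_ms, device_ms)
    return (step_ms - device_ms) / step_ms


# Hypothetical figures: 10 ms of host work per step, 0.25 ms of device work per sample.
idle_by_batch = {
    batch: idle_fraction(host_ms=10.0, device_ms_per_sample=0.25, batch_size=batch)
    for batch in (8, 16, 32, 64)
}
for batch, idle in idle_by_batch.items():
    print(batch, round(idle, 2))
```

Under these assumptions, the idle fraction falls from 0.8 at a batch size of 8 to 0 at a batch size of 64, where the device becomes the slower side. In practice, host-side work such as data preprocessing often grows with the batch size as well, so the gain is usually smaller than this model suggests.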
