Features

If basic tuning does not deliver satisfactory performance, you can use the Profiling tool to collect profile data during training and analyze it to pinpoint software and hardware performance bottlenecks, which makes performance analysis more efficient. The tool gives you a low-cost way to improve service performance.

By default, profile data is not collected during training. To collect and parse profile data, follow the procedure in this section.

The following describes the process of collecting and analyzing TensorFlow network profile data.

  1. Collect profile data.
    Profile data of a TensorFlow network can be collected globally or locally.
    • Global collection: collects profile data for everything executed by the graphs. The data volume is large.
    • Local collection: collects profile data only for a specified subgraph or step.

    To collect data globally, you can either modify the training script and configure the enable_profiling parameter (see Methods for Modifying the Training Script), or set the environment variable PROFILING_MODE (see Using Environment Variables). If both are set, enable_profiling takes precedence over PROFILING_MODE.

    To collect data locally, use the Profiler class in TF Adapter 1.x in a with statement and place the operations whose profile data you want to collect inside the with block. For details, see Collecting Profile Data Locally.
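
The precedence rule for global collection can be pictured with a small sketch. This is an illustrative model only, not the tool's actual implementation: the function name profiling_enabled is hypothetical, and it assumes that PROFILING_MODE is enabled with the value "true" and that any explicit enable_profiling setting in the script, including False, overrides the environment variable. Check Using Environment Variables for the values your version accepts.

```python
import os
from typing import Mapping, Optional


def profiling_enabled(
    script_enable_profiling: Optional[bool],
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Model which global-collection switch wins (hypothetical helper).

    script_enable_profiling: value of enable_profiling in the training
        script, or None if the script does not set it.
    env: environment to inspect; defaults to the process environment.
    """
    # Assumption: an explicit script setting always takes precedence.
    if script_enable_profiling is not None:
        return script_enable_profiling
    # Assumption: "true" (case-insensitive) is the enabling value.
    return env.get("PROFILING_MODE", "").strip().lower() == "true"
```

For example, profiling_enabled(None, {"PROFILING_MODE": "true"}) returns True, while profiling_enabled(False, {"PROFILING_MODE": "true"}) returns False under the assumptions stated above.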

  2. Parse and export profile data.

    Regardless of the collection mode, you can use the msprof command-line tool to parse the profile data and export the parsing result to a specified directory. For details, see Parsing and Exporting Profile Data.
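
As a rough sketch of how the parse-and-export step might be scripted, the helper below only assembles an msprof command line and does not run it. The helper name and the example directory are hypothetical, and the --export=on and --output options are assumptions about the msprof interface; confirm the exact options for your version in Parsing and Exporting Profile Data.

```python
import shlex
from typing import List


def build_msprof_export_command(profile_dir: str) -> List[str]:
    """Return an argument list for exporting parsed profile data.

    profile_dir: directory that holds the collected profile data
        (hypothetical example value: "/tmp/profiling").
    """
    # Assumption: msprof accepts --export=on and --output=<dir>.
    return ["msprof", "--export=on", f"--output={profile_dir}"]


def as_shell_string(command: List[str]) -> str:
    """Render the argument list as a safely quoted shell string."""
    return shlex.join(command)
```

The argument list can be passed to subprocess.run on a host where msprof is installed, or printed with as_shell_string for manual review first.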

  3. Analyze the profile data.

    You can analyze the timeline and summary files obtained by parsing the profile data to identify performance bottlenecks. For typical analysis examples, see Analyzing Profile Data.
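
One simple way to start on a summary file is to rank operators by total time. The sketch below assumes the summary is a CSV file with one row per operator execution; the column names are passed in by the caller because the actual headers depend on the exported file, and the function name slowest_ops is hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def slowest_ops(
    summary_lines: Iterable[str],
    name_col: str,
    duration_col: str,
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Sum the duration per operator and return the top_n largest totals.

    summary_lines: an open CSV file or any iterable of CSV lines.
    name_col, duration_col: header names in the summary file
        (assumptions; read them from your exported file).
    """
    totals: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(summary_lines):
        totals[row[name_col]] += float(row[duration_col])
    # Largest accumulated duration first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Operators that dominate this ranking are natural candidates to inspect more closely in the timeline view.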