Viewing the Run Results

  1. Log in to the storage server.

    This example uses a local NFS server whose hostname is ubuntu.

  2. Run the following command to list the result output directory (output) specified in the job's YAML file.

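    The logs file in the /data/atlas_dls/output/ directory records the FPS values of the training job. In this example, the directory structure is the same for single-node and distributed training.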

    root@ubuntu:/home# ll /data/atlas_dls/output/
    total 16896
    drwxr-x--- 2 HwHiAiUser HwHiAiUser    4096 Oct  7 16:06 ./
    drwxr-x--- 4 hwMindX    HwHiAiUser    4096 Oct  7 15:26 ../
    ...
    -rwxr-x--- 1 HwHiAiUser HwHiAiUser     682 Oct  7 16:06 logs

  3. View the contents of the log file.

    cat /data/atlas_dls/output/logs

    If FPS values appear in the output, the training succeeded.

    step:   100  epoch:  0.0  FPS:  496.4  loss: 6.605  total_loss: 7.922  lr:0.10000
    step:   200  epoch:  0.0  FPS: 1819.2  loss: 6.375  total_loss: 7.672  lr:0.10000
    step:   300  epoch:  0.1  FPS: 1898.2  loss: 6.277  total_loss: 7.551  lr:0.10000
    step:   400  epoch:  0.1  FPS: 2126.8  loss: 6.242  total_loss: 7.492  lr:0.10000
    step:   500  epoch:  0.1  FPS: 2357.4  loss: 6.090  total_loss: 7.320  lr:0.10000
    step:   600  epoch:  0.1  FPS: 2370.7  loss: 5.863  total_loss: 7.074  lr:0.10000
    step:   700  epoch:  0.1  FPS: 2368.6  loss: 5.902  total_loss: 7.094  lr:0.10000
    step:   800  epoch:  0.2  FPS: 2370.0  loss: 5.746  total_loss: 6.918  lr:0.10000
    step:   900  epoch:  0.2  FPS: 2371.0  loss: 5.605  total_loss: 6.758  lr:0.10000
    step:  1000  epoch:  0.2  FPS: 2365.9  loss: 5.750  total_loss: 6.887  lr:0.10000

  4. Go to the model output directory and view the generated model files.

    ls -alF /data/atlas_dls/code/ResNet50_for_TensorFlow_1.7_code/scripts/model_dir

    drwxr-xr-x 2 root root      4096 Jan 15 17:58 ./
    drwxrwxrwx 5 root root      4096 Jan 15 18:03 ../
    -rw-r--r-- 1 root root        81 Jan 15 17:58 checkpoint
    -rw-r--r-- 1 root root  18649801 Jan 15 18:02 events.out.tfevents.1642240674.mindx-dls-test-default-test-0
    -rw-r--r-- 1 root root   8475459 Jan 15 17:58 graph.pbtxt
    -rw-r--r-- 1 root root 204685136 Jan 15 17:58 model.ckpt-0.data-00000-of-00001
    -rw-r--r-- 1 root root     16262 Jan 15 17:58 model.ckpt-0.index
    -rw-r--r-- 1 root root   4977000 Jan 15 17:58 model.ckpt-0.meta
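The checks in steps 3 and 4 can also be scripted. The following is a minimal bash sketch, assuming the log line format shown in step 3 and the TensorFlow 1.x checkpoint layout shown in step 4 (a text "checkpoint" state file plus .index, .meta and .data-* files). The function names fps_summary and check_checkpoint are hypothetical helpers, not part of MindX DL. Excluding the first FPS entry from the average as warm-up is also an assumption, based on the much lower FPS reported for the first 100 steps.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: summarize FPS from the training log and confirm
# that a TensorFlow 1.x checkpoint was written. Default paths are the ones
# used in this example; pass other paths if your YAML differs.

# Print the number of FPS entries and the average FPS, skipping the first
# <warmup> entries (assumed warm-up). Returns 1 if no FPS value is found.
# Usage: fps_summary [log_file] [warmup]
fps_summary() {
    local log_file="${1:-/data/atlas_dls/output/logs}"
    local warmup="${2:-1}"
    if ! grep -q 'FPS:' "$log_file" 2>/dev/null; then
        echo "no FPS values found in $log_file" >&2
        return 1
    fi
    # Lines look like: "step:   100  epoch:  0.0  FPS:  496.4  loss: ..."
    # so the value is the whitespace-separated field after "FPS:".
    awk -v warmup="$warmup" '
        /FPS:/ {
            for (i = 1; i < NF; i++) {
                if ($i == "FPS:") {
                    n++
                    if (n > warmup) { sum += $(i + 1); cnt++ }
                }
            }
        }
        END {
            if (cnt > 0) printf "steps=%d avg_fps=%.1f\n", n, sum / cnt
            else printf "steps=%d avg_fps=n/a\n", n
        }' "$log_file"
}

# Read the latest checkpoint prefix from the "checkpoint" state file and
# verify that its .index, .meta and .data-* files exist. Returns 1 otherwise.
# Usage: check_checkpoint [model_dir]
check_checkpoint() {
    local model_dir="${1:-/data/atlas_dls/code/ResNet50_for_TensorFlow_1.7_code/scripts/model_dir}"
    local state="$model_dir/checkpoint"
    [ -f "$state" ] || { echo "missing $state" >&2; return 1; }
    # The state file is a text proto with a line such as:
    #   model_checkpoint_path: "model.ckpt-0"
    local prefix
    prefix=$(sed -n 's/^model_checkpoint_path: *"\(.*\)"$/\1/p' "$state")
    [ -n "$prefix" ] || { echo "no model_checkpoint_path in $state" >&2; return 1; }
    # The recorded path may be absolute; keep only the file name part.
    prefix=$(basename "$prefix")
    local ext
    for ext in index meta; do
        [ -f "$model_dir/$prefix.$ext" ] || { echo "missing $prefix.$ext" >&2; return 1; }
    done
    ls "$model_dir/$prefix".data-* >/dev/null 2>&1 || { echo "missing $prefix.data-*" >&2; return 1; }
    echo "latest checkpoint: $prefix"
}
```

For example, source the file on the storage server and run fps_summary && check_checkpoint; pass explicit paths if the output or model directories in your YAML file differ from the ones above.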