When collecting PyTorch performance data through the Ascend PyTorch Profiler interface, "Incorrect schedule" messages are printed, as shown below:
profiler.py: Incorrect schedule: Stop profiler while current state is WARMUP which will result in empty parsed data.
profiler.py: Incorrect schedule: Stop profiler while current state is RECORD which may result in incomplete parsed data.
profiler.py: Stop profiler while current state is RECORD_AND_SAVE, perhaps the scheduling cycle has not yet completed.
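The schedule parameters are set inappropriately, so the Profiler exits before it has completed the configured schedule cycle. Typical cases:
The model actually trains for only 1 step, but the schedule is set with skip_first=1 and active=2. The Profiler has only just entered the RECORD state (ready to collect) when the training process exits, so the performance data is missing or empty.
The schedule parameter repeat is left at its default value 0, which may leave the data of the last collected step incomplete. The log then reports "profiler.py: Stop profiler while current state is RECORD_AND_SAVE, perhaps the scheduling cycle has not yet completed." and "profiler.py: Incorrect schedule: Stop profiler while current state is RECORD which may result in incomplete parsed data." In this case, the data of the last step should not be used as a reference for performance analysis.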
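The first case above can be reproduced on paper with a small state simulation. The sketch below assumes that the Ascend schedule follows the same state logic as the upstream torch.profiler.schedule, and that wait and warmup are 0 in that example, since the text does not give them. The function names schedule_state and state_at_stop are hypothetical helpers, not part of any profiler API.

```python
def schedule_state(step, *, wait, active, warmup=0, repeat=0, skip_first=0):
    """Return the profiler state assumed for a 0-based step index.

    Assumption: mirrors the state logic of torch.profiler.schedule.
    """
    if active <= 0:
        raise ValueError("active must be a positive integer")
    if step < skip_first:
        return "NONE"
    step -= skip_first
    cycle = wait + warmup + active
    # With repeat == 0 the cycle repeats until training ends.
    if repeat > 0 and step // cycle >= repeat:
        return "NONE"
    pos = step % cycle
    if pos < wait:
        return "NONE"
    if pos < wait + warmup:
        return "WARMUP"
    # The last active step of a cycle also triggers saving.
    return "RECORD" if pos < cycle - 1 else "RECORD_AND_SAVE"


def state_at_stop(total_steps, **schedule_kwargs):
    """State the profiler is in when it is stopped after total_steps calls to step()."""
    return schedule_state(total_steps, **schedule_kwargs)
```

Under these assumptions, state_at_stop(1, wait=0, active=2, skip_first=1) returns "RECORD", which matches the warning reported in the first case.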
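Check whether the schedule is set correctly. The following formula is recommended for the check: total number of steps >= skip_first + (wait + warmup + active) * repeat. Make sure the training run has enough steps for the Profiler to complete the configured schedule and collect the performance data.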
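The formula can be checked before launching a run. This is a minimal sketch, not part of the Profiler API: required_steps and schedule_fits are hypothetical helper names, and treating repeat=0 as "at least one full cycle" is an assumption made here so that the check stays meaningful for the default value.

```python
def required_steps(*, wait, active, warmup=0, repeat=0, skip_first=0):
    """Minimum number of training steps needed to complete the schedule.

    Implements skip_first + (wait + warmup + active) * repeat.
    Assumption: repeat == 0 is counted as one cycle for this check.
    """
    cycles = max(repeat, 1)
    return skip_first + (wait + warmup + active) * cycles


def schedule_fits(total_steps, **schedule_kwargs):
    """Return True if total_steps is enough to complete the schedule."""
    return total_steps >= required_steps(**schedule_kwargs)
```

For the case described earlier (1 training step, skip_first=1, active=2), schedule_fits(1, wait=0, active=2, skip_first=1) is False, because at least 3 steps are required.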