Data Preprocessing

Objective

Ensure that the data preprocessing part of your code is the same as that in the benchmark model script.

Principle

The data preprocessing code may contain a variable that is set automatically based on the resources available on the host, which can lead to a different dataset shuffle order. Check the API calls related to data preprocessing in the code to minimize such differences. The following is a typical example:

When the dataset is shuffled, the buffer size is derived from an automatic query of the host memory size. If the host memory size of the NPU host differs greatly from that of the benchmark model host, the dataset shuffle orders will also differ significantly, which can result in unsatisfactory model accuracy.
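The effect described above can be illustrated with a minimal pure-Python sketch of a buffer-based shuffle. It is not the implementation used by any particular framework. The function names `buffered_shuffle` and `buffer_size_from_memory`, and the "one tenth of host memory" rule, are assumptions made for illustration only. The sketch shows that the buffer size limits how far a sample can move forward in the stream, so two hosts that derive different buffer sizes produce different orders even with the same seed.

```python
import random


def buffered_shuffle(items, buffer_size, seed):
    """Shuffle `items` through a fixed-size buffer (illustrative sketch).

    A sample can only be emitted once it has entered the buffer, so the
    element at output position i always comes from an input index no
    larger than i + buffer_size - 1.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    out = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Emit a random buffered sample and put the new one in its slot.
        idx = rng.randrange(len(buffer))
        out.append(buffer[idx])
        buffer[idx] = item
    # Drain whatever is left in the buffer in random order.
    while buffer:
        idx = rng.randrange(len(buffer))
        out.append(buffer.pop(idx))
    return out


def buffer_size_from_memory(host_mem_bytes, bytes_per_sample):
    """Hypothetical rule: spend one tenth of host memory on the buffer."""
    return max(1, (host_mem_bytes // 10) // bytes_per_sample)


if __name__ == "__main__":
    samples = list(range(20))
    # Two hosts with different memory sizes derive different buffer sizes
    # (the numbers below are made up for the demonstration).
    size_benchmark = buffer_size_from_memory(4000, 10)
    size_npu_host = buffer_size_from_memory(1000, 10)
    print(size_benchmark, buffered_shuffle(samples, size_benchmark, seed=0))
    print(size_npu_host, buffered_shuffle(samples, size_npu_host, seed=0))
    # Pinning the buffer size to the benchmark value removes this source
    # of difference.
    fixed = 40
    print(buffered_shuffle(samples, fixed, seed=0))
```

If the ported script derives the buffer size from a memory query in this way, replacing the derived value with the constant used on the benchmark host is one way to keep the shuffle order comparable. In a tf.data pipeline, for instance, that would mean passing the same `buffer_size` and `seed` to `Dataset.shuffle` on both hosts.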

Procedure

Check that the number of files read via the file read API is the same as that of the benchmark.
Check that the method used to convert the source data into input samples is unchanged.
Check that the samples are padded in the same way as the samples fed to the benchmark model.
Check that the samples are shuffled in the same way as in the benchmark, for example, with the same number of samples in the shuffle buffer and the same degree of parallelism during dataset shuffling.
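The checks above could be partly automated along the following lines. This is a rough sketch, not a tool shipped with any framework. The helper names `diff_settings` and `stream_fingerprint` and the setting keys (`num_files`, `pad_to`, `shuffle_buffer`, `num_parallel`) are hypothetical. The idea is to record the preprocessing settings of both scripts and compare them, and to hash the first few samples of each pipeline so that a difference in content, padding, or order shows up as a different digest.

```python
import hashlib


def diff_settings(benchmark, ported):
    """Return {key: (benchmark_value, ported_value)} for differing settings.

    A key that is missing on one side is reported with None for that side.
    """
    keys = sorted(set(benchmark) | set(ported))
    return {
        key: (benchmark.get(key), ported.get(key))
        for key in keys
        if benchmark.get(key) != ported.get(key)
    }


def stream_fingerprint(samples, limit=1000):
    """Hash the first `limit` samples in the order they are produced.

    Two pipelines yield the same digest only if those samples have the
    same textual representation and appear in the same order.
    """
    digest = hashlib.sha256()
    for index, sample in enumerate(samples):
        if index >= limit:
            break
        digest.update(repr(sample).encode("utf-8"))
        digest.update(b"\x00")  # separator between samples
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical settings collected by hand from the two scripts.
    benchmark = {"num_files": 8, "pad_to": 128,
                 "shuffle_buffer": 10000, "num_parallel": 4}
    ported = {"num_files": 8, "pad_to": 128,
              "shuffle_buffer": 2500, "num_parallel": 4}
    for key, (expected, actual) in diff_settings(benchmark, ported).items():
        print(f"{key}: benchmark={expected} ported={actual}")

    # Samples are plain lists here; real samples would first be converted
    # to a comparable form such as nested lists.
    print(stream_fingerprint([[1, 2, 0], [3, 0, 0]]))
```

Comparing fingerprints is only meaningful when both pipelines run with the same random seed. Otherwise the shuffle order is expected to differ, and the settings comparison is the more useful check.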
