Dimension Reduction Training Script

Environment Dependencies

  • Install Python. Versions 3.9 (recommended), 3.10, and 3.11 are supported.
  • Install Faiss 1.10.0, for example with pip:
    pip install faiss-cpu==1.10.0
    
  • Install torch_cpu and torch_npu by following their installation guide. Select the required versions based on the version mapping table.
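
The version requirements above can be checked before training with a short script. This is a minimal sketch, not part of the shipped tools: the helper names python_supported and installed_version are hypothetical, and it assumes Faiss was installed under the pip distribution name faiss-cpu as in the example command.

```python
import sys
from importlib import metadata

# Versions listed in this section.
SUPPORTED_PYTHON = {(3, 9), (3, 10), (3, 11)}
REQUIRED_FAISS = "1.10.0"


def python_supported(version_info=sys.version_info):
    """Return True if the major.minor version is one of the supported ones."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHON


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_supported())
    # Assumption: Faiss was installed as the "faiss-cpu" distribution.
    found = installed_version("faiss-cpu")
    print("faiss-cpu installed:", found, "| required:", REQUIRED_FAISS)
```

The check reads package metadata instead of importing the packages, so it also works when a package is missing.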

Model Training

The script involved in this section is stored in tools/train/reduction by default.

  1. Train a model.
    python3 call_train.py --dataset_dir=Dataset_Dir --val_dataset_dir=./valid --generate_val=True --save_path=./modelsDr --dim=512 --npu=0 --ratio=4 --metric=L2 --mode=train --train_size=100000 --epochs=20 --train_batch_size=8192 --infer_batch_size=128 --learning_rate=0.0005 --log_stride=500 --construct_neighbors=100 --queries_validation=1000

    Parameter descriptions:

    dataset_dir

    (Mandatory) Dataset path, of the string type. By default, the script reads base.npy, query.npy, and gt.npy from this path.

    To load a dataset stored under other file names, modify the line that calls get_train_data.

    (Example) Original code:

    # load dataset demo before training, modify here if you want to load your own dataset
    #####################################################################
    learn, base = get_train_data(args.dataset_dir, args.train_size)
    #####################################################################
    

    Modified code:

    # load dataset demo before training, modify here if you want to load your own dataset
    #####################################################################
    # learn, base = get_train_data(args.dataset_dir, args.train_size)
    # YOUR_LEARN_DATASET_DIR and YOUR_BASE_DATASET_DIR are placeholders for the paths of raw
    # float32 files; YOUR_DATA_DIM is a placeholder for the vector dimension.
    learn = np.fromfile(YOUR_LEARN_DATASET_DIR, dtype=np.float32).reshape((-1, YOUR_DATA_DIM))
    base = np.fromfile(YOUR_BASE_DATASET_DIR, dtype=np.float32).reshape((-1, YOUR_DATA_DIM))
    #####################################################################
    

    val_dataset_dir

    Path for storing the validation dataset, of the string type. This parameter is valid only when generate_val is set to True. The default value is ./validation/.

    generate_val

    Whether to generate the validation dataset, of the bool type. Set this parameter to True for the first training run. The default value is False.

    save_path

    (Mandatory) Path for saving models, of the string type.

    dim

    (Optional) Dataset dimension, of the int type. The supported values are 96, 128, 200, 256, 512, and 2048. The default value is 512.

    npu

    ID of the NPU device used for training, of the int type.

    Only single-device training is supported. If this parameter is not specified, training runs on the CPU.

    ratio

    (Optional) Dimension reduction ratio, of the int type. The supported values are 2, 4, 8, and 16. The default value is 8.

    metric

    Distance metric used during model training, of the string type. The value can be L2 or IP. The default value is L2.

    mode

    (Optional) The value can be train (default), infer, or test. Currently, only train is supported. Retain the default value.

    train_size

    Size of the training dataset, of the int type. When the dataset is read, this many samples are randomly selected for training, so the value is expected to be less than the number of samples in the entire dataset.

    If you load the dataset yourself, subsample it to train_size vectors as well. Otherwise, training may become too slow.

    The default value is 100000. The value must be greater than 0.

    epochs

    Number of training epochs. The value is of the int type. If the number of epochs is set to a large value, the training duration increases significantly. The default value is 30. The value must be greater than 0.

    train_batch_size

    Batch size during training. The default value is 8192 and the value is of the int type. The value must be greater than 0.

    infer_batch_size

    Batch size during inference. The default value is 128. The value is of the int type. The value must be greater than 0.

    learning_rate

    Learning rate. The default value is 0.0005. The value is of the float type. The value must be greater than 0.

    log_stride

    Training log printing interval (step). The default value is 500. The value is of the int type. The value must be greater than 0.

    construct_neighbors

    Nearest-neighbor range used when the training dataset is constructed, of the int type. It is used to build the special training dataset structure required for dimension reduction. The default value is 100. Adjust the value based on the number of vectors in the dataset. The value must be greater than 0.

    queries_validation

    Number of query vectors required for constructing a validation dataset. The value is of the int type. The default value is 1000. The value must be greater than 0.

    --help | -h

    Help information.
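
The train_size description asks you to subsample a dataset that you load yourself. A minimal sketch of one way to do this with NumPy is shown below. The helper names load_raw_float32 and sample_training_vectors and the fixed seed are hypothetical and are not part of call_train.py. The sketch assumes headerless float32 files, as in the modified code above.

```python
import numpy as np


def load_raw_float32(path, dim):
    """Read a headerless float32 file and reshape it to (num_vectors, dim)."""
    return np.fromfile(path, dtype=np.float32).reshape((-1, dim))


def sample_training_vectors(vectors, train_size, seed=0):
    """Randomly keep at most train_size rows, drawn without replacement."""
    if train_size <= 0:
        raise ValueError("train_size must be greater than 0")
    num_vectors = vectors.shape[0]
    if train_size >= num_vectors:
        # Nothing to drop: the dataset is already small enough.
        return vectors
    rng = np.random.default_rng(seed)
    idx = rng.choice(num_vectors, size=train_size, replace=False)
    # Sorting keeps the sampled rows in their original order.
    return vectors[np.sort(idx)]


if __name__ == "__main__":
    # Synthetic stand-in for a real dataset: 5000 vectors of dimension 8.
    data = np.random.default_rng(1).random((5000, 8), dtype=np.float32)
    learn = sample_training_vectors(data, train_size=1000)
    print(learn.shape)  # (1000, 8)
```

In the modified loading code, learn could be passed through such a helper with args.train_size before training starts.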

  2. Generate an OM model.
    Before running the training script, run the following commands to set environment variables (change the path based on the actual installation path of the CANN package):
    source /usr/local/Ascend/ascend-toolkit/set_env.sh
    export LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:$LD_LIBRARY_PATH
    
    1. Generate an OM model with the precision of FP32.
      bash atc.sh {save_path} {om_name} {input_shape}
    2. Generate an OM model with the precision of FP16.
      bash atc_16.sh {save_path} {om_name} {input_shape}
    • {save_path}: path of the model file. This parameter is mandatory. The file name in the path must end with .onnx or .pb. Otherwise, the script falls back to reading environment variables such as framework and input_format, which causes execution errors.
    • {om_name}: name of the generated OM model. By default, the name is the same as that of the ONNX model. This parameter is optional.
    • {input_shape}: input shape of the ONNX model in the format of actual_input_1:infer_batch_size,dim. You are advised to keep the default value. This parameter is optional.
    • bash atc.sh and bash atc_16.sh can be used only on Atlas inference products.
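
The argument rules above can be summarized in a small helper that assembles the conversion command. This is an illustrative sketch only: build_input_shape and build_atc_command are hypothetical names, the model file name in the demo is made up, and the sketch assumes that the three arguments are purely positional, so {input_shape} can be passed only together with {om_name}.

```python
def build_input_shape(infer_batch_size, dim, input_name="actual_input_1"):
    """Format the shape string as input_name:infer_batch_size,dim."""
    return f"{input_name}:{infer_batch_size},{dim}"


def build_atc_command(model_path, om_name=None, input_shape=None, fp16=False):
    """Return the argument list for atc.sh (FP32) or atc_16.sh (FP16)."""
    if not model_path.endswith((".onnx", ".pb")):
        raise ValueError("model file name must end with .onnx or .pb")
    if input_shape is not None and om_name is None:
        # Assumption: arguments are positional, so om_name cannot be skipped.
        raise ValueError("input_shape requires om_name to be given as well")
    cmd = ["bash", "atc_16.sh" if fp16 else "atc.sh", model_path]
    if om_name is not None:
        cmd.append(om_name)
    if input_shape is not None:
        cmd.append(input_shape)
    return cmd


if __name__ == "__main__":
    # Hypothetical model file name; use the file written to your save_path.
    shape = build_input_shape(infer_batch_size=128, dim=512)
    print(" ".join(build_atc_command("./modelsDr/model.onnx", "model", shape)))
```

The resulting list can be passed to subprocess.run after the environment variables described above have been set.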