Model Adaptation

To run your model with Rec SDK TensorFlow, you need to adapt it, and during the adaptation you can add Rec SDK TensorFlow functions to the model. This section describes the key steps in model adaptation and how to add the desired functions.

To combine multiple functions, modify the corresponding key steps for each of them. For the invocation process of a single function, see Function Training Process.

Feature eviction and dynamic capacity expansion of the NPU's on-chip memory cannot be enabled at the same time.
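This constraint can be checked before training starts. The following is a minimal sketch and not a Rec SDK TensorFlow API. check_feature_compatibility is a hypothetical helper. It assumes that eviction counts as enabled when eviction_threshold is 0 or greater, and that -1 leaves eviction disabled (see step 2).

```python
def check_feature_compatibility(use_dynamic_expansion: bool,
                                eviction_threshold: int = -1) -> None:
    """Hypothetical pre-flight check (not a Rec SDK TensorFlow API).

    Assumption: eviction is enabled when eviction_threshold >= 0,
    and -1 leaves it disabled.
    """
    eviction_enabled = eviction_threshold >= 0
    if use_dynamic_expansion and eviction_enabled:
        raise ValueError(
            "Feature eviction and dynamic capacity expansion of the NPU's "
            "on-chip memory cannot be enabled at the same time."
        )


# Accepted: only one of the two functions is enabled.
check_feature_compatibility(use_dynamic_expansion=True)
check_feature_compatibility(use_dynamic_expansion=False, eviction_threshold=86400)
```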

The key steps are as follows:

  1. Initialize the framework.

    Initialize the Rec SDK TensorFlow model training framework by calling init.

    To add a function, configure it in this step as described in Table 1.

    Table 1 Features

    • Dynamic capacity expansion: Set use_dynamic_expansion to True to enable dynamic capacity expansion of the NPU's on-chip memory. The default value is False. In DDR and SSD modes, only dynamic capacity expansion of the memory or drive, respectively, is supported.
    • Dynamic shape: Set use_dynamic to True in the init API. Before enabling dynamic shape, install the Kernels operator package. For details, see "Installing Kernels" in "Installing CANN" in CANN Software Installation Guide.
    • Automatic graph modification: No configuration is required in this step.
    • Feature access and eviction: No configuration is required in this step.

  2. Define features or enable automatic graph modification.
    • Defining the feature list and model

      Use FeatureSpec to define the feature list and configure the corresponding model.

      To add a function, configure it in this step as described in Table 2.

      Table 2 Features

      • Dynamic capacity expansion: No configuration is required in this step.
      • Dynamic shape: No configuration is required in this step.
      • Feature access and eviction:
        1. To enable the access function, set access_threshold to a value greater than or equal to 0 (unit: count). If access_threshold is set to a value less than -1, a parameter error is reported.
        2. To enable feature eviction, perform the following steps:
          1. Set eviction_threshold to a value greater than or equal to 0 (unit: second). If the threshold is less than -1, a parameter error is reported.
          2. In the FeatureSpec of the timestamp, set index_key and pass is_timestamp=True to indicate that the dataset contains a timestamp.
          3. Use the EvictHook API to configure the hook that triggers eviction. This API takes three parameters, for example evict_enable=True, evict_time_interval=24 * 60 * 60, and evict_step_interval=10000, which specify the eviction switch, the eviction triggering interval (unit: second), and the global step interval, respectively. You can set either evict_time_interval or evict_step_interval.
        3. The feature eviction hook is used only in training mode.
    • Automatic graph modification

      In NPUEstimator mode, add GraphModifierHook, the hook of the automatic graph modification function, to the hooks of each NPUEstimator mode you use (train, predict, or train_and_evaluate). For example, in train mode, add GraphModifierHook to the training hooks to run training in automatic graph modification mode.

      To add a function, configure it as described in Table 3.

      Table 3 Features

      • Dynamic capacity expansion: No configuration is required in this step.
      • Dynamic shape: No configuration is required in this step.
      • Feature access and eviction: When using sparse_lookup, set access_and_evict_config. This parameter is a dict with two key-value pairs: the keys are access_threshold and eviction_threshold, and each value is the corresponding threshold.
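The threshold rules and the eviction trigger described in this step can be summarized in a small, framework-free sketch. validate_threshold, make_access_and_evict_config, and EvictTrigger are hypothetical names written for illustration. Only the dict keys access_threshold and eviction_threshold and the EvictHook parameter names come from the text above. The sketch assumes that -1 disables a function and that eviction fires when whichever configured interval (time or global step) elapses first. The actual EvictHook behavior may differ.

```python
import time
from typing import Dict, Optional


def validate_threshold(name: str, value: int) -> bool:
    """Return True if the threshold enables its function.

    >= 0 enables the function, -1 is assumed to leave it disabled,
    and anything below -1 is a parameter error.
    """
    if value < -1:
        raise ValueError(f"{name} must be >= -1, got {value}")
    return value >= 0


def make_access_and_evict_config(access_threshold: int,
                                 eviction_threshold: int) -> Dict[str, int]:
    """Build the two-entry dict passed as access_and_evict_config."""
    validate_threshold("access_threshold", access_threshold)        # unit: count
    validate_threshold("eviction_threshold", eviction_threshold)    # unit: second
    return {"access_threshold": access_threshold,
            "eviction_threshold": eviction_threshold}


class EvictTrigger:
    """Hypothetical stand-in for the triggering logic that EvictHook configures.

    Fires when either configured interval has elapsed since the last eviction
    (an assumed "whichever comes first" rule).
    """

    def __init__(self, evict_enable: bool = True,
                 evict_time_interval: Optional[float] = 24 * 60 * 60,
                 evict_step_interval: Optional[int] = 10000):
        self.evict_enable = evict_enable
        self.evict_time_interval = evict_time_interval
        self.evict_step_interval = evict_step_interval
        self.last_time = time.monotonic()
        self.last_step = 0

    def should_evict(self, global_step: int, now: Optional[float] = None) -> bool:
        if not self.evict_enable:
            return False
        now = time.monotonic() if now is None else now
        time_due = (self.evict_time_interval is not None
                    and now - self.last_time >= self.evict_time_interval)
        step_due = (self.evict_step_interval is not None
                    and global_step - self.last_step >= self.evict_step_interval)
        if time_due or step_due:
            # Restart both intervals after an eviction.
            self.last_time, self.last_step = now, global_step
            return True
        return False


config = make_access_and_evict_config(access_threshold=5, eviction_threshold=24 * 60 * 60)
trigger = EvictTrigger(evict_time_interval=None, evict_step_interval=10000)
print(config)                       # {'access_threshold': 5, 'eviction_threshold': 86400}
print(trigger.should_evict(9999))   # False
print(trigger.should_evict(10000))  # True
```

In this sketch, setting only one interval (the other being None) corresponds to configuring either evict_time_interval or evict_step_interval alone.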

  3. Define a dataset. Skip this step if automatic graph modification is enabled.

    Use FeatureSpec to define a feature list, create a dataset based on the feature list, and preprocess the dataset: call the get_asc_insert_func API to obtain the Rec SDK TensorFlow data preprocessing function, and apply it to the dataset.

  4. Define an optimizer.

    Select an optimizer under mx_rec.optimizers and call its API to obtain the optimizer object for the sparse network layer. For details about the available optimizers, see Optimizers. The dense network layer can use a built-in TensorFlow optimizer.

    To add a function, configure it in this step as described in Table 4.

    Table 4 Features

    • Dynamic capacity expansion: Call the create_hash_optimizer_by_address API of the corresponding optimizer in the mx_rec.optimizers package to create the sparse optimizer (sparse_optimizer), which enables dynamic capacity expansion of the NPU's on-chip memory. For the available optimizers, see Optimizers.
    • Dynamic shape: No configuration is required in this step.
    • Automatic graph modification: No configuration is required in this step.
    • Feature access and eviction: No configuration is required in this step.

  5. Create a sparse table.

    Create a sparse network layer by calling the create_table API. A sparse network layer can be created for each sparse feature.

    In Estimator mode, call the create_table API inside the model_fn passed to Estimator. When Estimator calls model_fn, it creates a new graph instance that is different from the default graph in which the entry main function runs, so a table created outside model_fn does not belong to the graph that Estimator builds.

  6. Import the sparse network layer and feature list, build the model computational graph, and in that graph call the sparse_lookup API to query features and calculate the loss.

    Table 5 Features

    • Dynamic capacity expansion: No configuration is required in this step.
    • Dynamic shape: No configuration is required in this step.
    • Automatic graph modification: To query the sparse feature table, call sparse_lookup with modify_graph set to True, which enables the automatic graph modification mode during the table query. The default value of this parameter is False.
    • Feature access and eviction: No configuration is required in this step.

  7. Define the gradient calculation and optimization processes.

    Call get_dense_and_sparse_variable to obtain the parameters of the dense and sparse network layers. Use the optimizers to calculate the gradients and perform optimization.

    To add a function, configure it in this step as described in Table 6.

    Table 6 Features

    • Dynamic capacity expansion: To enable dynamic capacity expansion of the on-chip memory, perform the following steps:
      1. Obtain the embedding representation result (emb) and the mapping address (addr).
        • Call tf.get_collection("ASCEND_SPARSE_LOOKUP_LOCAL_EMB") to obtain the embedding representation result for training.
        • Call tf.get_collection("ASCEND_SPARSE_LOOKUP_ID_OFFSET") to obtain the mapping address for training.
      2. Perform the backward gradient calculation. Call tf.gradients(loss, emb) to compute the gradient (grad) of the loss with respect to the embedding representation result obtained in the previous step.
      3. Perform the backward sparse table update. Call sparse_optimizer.apply_gradients([grad, addr]) on the sparse optimizer created in step 4 to update the sparse table entries at the mapping address.
    • Dynamic shape: No configuration is required in this step.
    • Automatic graph modification: No configuration is required in this step.
    • Feature access and eviction: No configuration is required in this step.
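The dynamic capacity expansion flow in step 7 can be traced in a toy, framework-free sketch. ExpandableTable, lookup, and apply_gradients here are hypothetical stand-ins. lookup plays the role of sparse_lookup together with the two collections because it returns emb and addr. For the toy loss 0.5 * sum(emb**2), the gradient with respect to emb is emb itself, which is what tf.gradients(loss, emb) would compute for that loss. apply_gradients performs a plain SGD scatter update by address. Doubling the capacity when the table is full is an assumed growth policy chosen for illustration. It is not the documented behavior of Rec SDK TensorFlow.

```python
import random
from typing import Dict, List, Tuple


class ExpandableTable:
    """Toy embedding table with dynamic capacity expansion (illustration only).

    Each feature ID gets a mapping address (row offset). When every
    preallocated row is in use, the capacity doubles (assumed policy).
    """

    def __init__(self, dim: int, capacity: int = 2, seed: int = 0):
        self.dim = dim
        self.rows: List[List[float]] = [[0.0] * dim for _ in range(capacity)]
        self.size = 0                        # rows currently in use
        self.addr_of: Dict[int, int] = {}    # feature ID -> mapping address
        self.rng = random.Random(seed)

    @property
    def capacity(self) -> int:
        return len(self.rows)

    def lookup(self, ids: List[int]) -> Tuple[List[List[float]], List[int]]:
        """Return (emb, addr) for a batch, allocating rows for unseen IDs."""
        addr = []
        for fid in ids:
            if fid not in self.addr_of:
                if self.size == self.capacity:
                    # Table is full: double the capacity.
                    new_rows = [[0.0] * self.dim for _ in range(self.capacity)]
                    self.rows.extend(new_rows)
                self.rows[self.size] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
                self.addr_of[fid] = self.size
                self.size += 1
            addr.append(self.addr_of[fid])
        emb = [list(self.rows[a]) for a in addr]   # copies, like a gathered tensor
        return emb, addr

    def apply_gradients(self, grad, addr, lr: float = 0.1) -> None:
        """Plain SGD scatter update: row[a] -= lr * g; duplicate addresses accumulate."""
        for g, a in zip(grad, addr):
            self.rows[a] = [w - lr * gi for w, gi in zip(self.rows[a], g)]


table = ExpandableTable(dim=2, capacity=2)
emb, addr = table.lookup([7, 3, 7, 9])          # ID 9 does not fit, so capacity doubles
loss = 0.5 * sum(x * x for row in emb for x in row)
grad = [list(row) for row in emb]               # d(loss)/d(emb) for this loss is emb
table.apply_gradients(grad, addr)
print(addr)            # [0, 1, 0, 2]
print(table.capacity)  # 4
```

Doubling keeps the amortized cost of admitting new IDs constant. A real implementation may grow differently.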

  8. Load and preprocess data. Skip this step if automatic graph modification is enabled.

    When FeatureSpec is used to define the feature list, call start_asc_pipeline to start the data pipeline.