Feature Access and Eviction
Usage Process
This section describes how to use feature access and eviction during training, in both FeatureSpec mode and automatic graph modification mode.
After the eviction function is enabled, dynamic capacity expansion on the on-chip memory side is not supported.
Figure 1 Process of feature access and eviction
Key Steps
- In FeatureSpec mode, perform the configuration by referring to FeatureSpec.
- In automatic graph modification mode, refer to Automatic Graph Modification.
- The environment variable USE_COMBINE_FAAE controls whether to combine tables for statistics collection.
- The CPU operator set_threshold is provided to change the access threshold during training.
If the first input parameter of set_threshold is 0, the corresponding embedding table stops accumulating feature counts. The access threshold remains unchanged, and the existing (historical) counts are used as they are.
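The rule for a set_threshold value of 0 can be illustrated with a small pure-Python model. This is only a sketch and not the library's implementation: the class AccessGate and its methods are hypothetical, and it assumes that a feature is admitted once its count is greater than or equal to the threshold and that a nonzero value resumes counting.

```python
from collections import defaultdict


class AccessGate:
    """Toy model of the access threshold for one embedding table (hypothetical)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = defaultdict(int)  # feature id -> number of times seen
        self.accumulate = True

    def set_threshold(self, value):
        # Documented rule: 0 stops counting and leaves the threshold unchanged.
        if value == 0:
            self.accumulate = False
        else:
            self.threshold = value
            self.accumulate = True  # assumption: a nonzero value resumes counting

    def observe(self, ids):
        """Count a batch of ids and return those that pass the threshold."""
        admitted = []
        for fid in ids:
            if self.accumulate:
                self.counts[fid] += 1
            if self.counts[fid] >= self.threshold:  # assumption: ">=" admits
                admitted.append(fid)
        return admitted


gate = AccessGate(threshold=2)
gate.observe([1, 2])   # both ids seen once, below the threshold
gate.observe([1])      # id 1 reaches the threshold
gate.set_threshold(0)  # counting stops, threshold stays at 2
gate.observe([2, 1])   # id 2 keeps its historical count of 1
```

In this model, only features that had already reached the threshold before the call with 0 continue to be admitted afterwards.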
Sample Code
- FeatureSpec mode:
    feature_spec_list = [
        FeatureSpec("user_ids", feat_count=cfg.user_feat_cnt, table_name="user_table",
                    access_threshold=access_threshold, eviction_threshold=eviction_threshold,
                    faae_coefficient=1),
        FeatureSpec("item_ids", feat_count=cfg.item_feat_cnt, table_name="item_table",
                    access_threshold=access_threshold, eviction_threshold=eviction_threshold,
                    faae_coefficient=4),
        FeatureSpec("timestamp", is_timestamp=True),
    ]
    hook_evict = EvictHook(evict_enable=True, evict_time_interval=24*60*60, evict_step_interval=10000)
- Automatic graph modification mode:
    config_for_user_table = dict(access_threshold=cfg.access_threshold,
                                 eviction_threshold=cfg.eviction_threshold,
                                 faae_coefficient=1)
    embedding = sparse_lookup(hash_table, feature, send_count, dim=None, is_train=is_train,
                              access_and_evict_config=config_for_user_table,
                              name=hash_table.table_name + "_lookup",
                              modify_graph=modify_graph)
    hook_evict = EvictHook(evict_enable=True, evict_time_interval=24*60*60, evict_step_interval=10000)
- Change the access threshold:
    import tensorflow as tf
    from mx_rec.util.ops import import_host_pipeline_ops

    # New access threshold, passed as the first input of set_threshold.
    thres_tensor = tf.constant(60, dtype=tf.int32)
    set_threshold_op = import_host_pipeline_ops().set_threshold(
        thres_tensor,
        emb_name=self.table_list[0].table_name,
        ids_name=self.table_list[0].table_name + "_lookup")
    with tf.Session() as sess:
        sess.run(set_threshold_op)
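The samples above pass both a time interval and a step interval to EvictHook, together with an eviction_threshold and a timestamp feature. The following pure-Python sketch shows one way such settings could interact. It is not the library's implementation: the functions should_evict and sweep are hypothetical, and it assumes that the time values are in seconds, that eviction runs when either interval is reached, and that a feature is evicted when it has not been seen for longer than eviction_threshold.

```python
def should_evict(elapsed_seconds, step, time_interval, step_interval):
    """Return True when either interval has been reached (assumed OR semantics)."""
    time_due = elapsed_seconds >= time_interval
    step_due = step > 0 and step % step_interval == 0
    return time_due or step_due


def sweep(last_seen, now, eviction_threshold):
    """Remove ids not seen within eviction_threshold seconds; return the evicted ids."""
    stale = sorted(fid for fid, ts in last_seen.items() if now - ts > eviction_threshold)
    for fid in stale:
        del last_seen[fid]
    return stale


# Hypothetical state: feature id -> timestamp of the most recent access.
last_seen = {1: 100, 2: 500}
if should_evict(elapsed_seconds=10, step=10000,
                time_interval=24 * 60 * 60, step_interval=10000):
    evicted = sweep(last_seen, now=600, eviction_threshold=300)
```

Under these assumptions the step interval triggers the sweep, and only the feature whose last access is older than the threshold is removed.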