Feature Overview

Rec SDK TensorFlow features include:

Core model training capabilities: It supports single-server single-device, single-server multi-device, and multi-server multi-device distributed training, as well as TensorFlow-based model development.
Recommendation-specific features: Based on the sparse table solution, the Rec SDK provides essential functions such as feature saving and loading, feature access and eviction, and non-affinity operator tiling.
Large-scale sparse table features: It supports multi-tier storage across accelerator card memory, host memory, and host drives, together with dynamic capacity expansion. Sparse tables can scale beyond 10 TB.
Customized training features: It supports custom WarmStart options, allowing parameters to be loaded from multiple source-domain models to facilitate continuous transfer learning.
Performance and accuracy diagnostic tools: The performance diagnostic tool supports host-side profiling, fusion of host and device-side profile data, latency ranking, and visualization to assist in performance bottleneck localization. The accuracy monitoring tool supports end-to-end, operator-level accuracy comparison, facilitating accuracy maintenance and troubleshooting.

Key Features

Rec SDK TensorFlow provides features such as dynamic capacity expansion, dynamic shapes, automatic graph modification, feature access and eviction, hot embedding, and custom WarmStart. You can integrate these features into your adapted models as needed.

Dynamic capacity expansion
TensorFlow implements embedding tables as variables. You must estimate the size of each table before creating its variable through the API, and that size is fixed at initialization and cannot be changed later, which can either waste device memory or leave too little space. In recommendation scenarios, the sizes of sparse tables are often unpredictable, so Rec SDK TensorFlow adds dynamic capacity expansion for sparse tables. The on-chip memory mode supports both dynamic expansion and a static (fixed) capacity; with dynamic expansion, device memory usage grows during training. DDR and SSD modes support only dynamic expansion, in which host memory or drive usage grows while device memory usage stays constant.
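The difference between a fixed-size table and a dynamically expanding one can be illustrated with a small, framework-free sketch. The class GrowableEmbeddingTable below is hypothetical and is not part of the Rec SDK API; it assumes a simple policy of allocating a row on a key's first lookup and doubling capacity when the table is full, whereas the real feature manages device, host, or drive memory behind the scenes.

```python
import random
from typing import Dict, List


class GrowableEmbeddingTable:
    """Hypothetical sketch of a sparse table whose capacity grows on demand
    instead of being fixed when the table is created."""

    def __init__(self, dim: int, initial_capacity: int = 4, seed: int = 0):
        self.dim = dim
        self.capacity = initial_capacity
        self._rng = random.Random(seed)
        self._key_to_row: Dict[int, int] = {}
        self._rows: List[List[float]] = []

    def _grow(self) -> None:
        # Assumed policy: double the capacity. A real system would reallocate
        # device memory or spill to host memory/drives at this point.
        self.capacity *= 2

    def lookup(self, keys: List[int]) -> List[List[float]]:
        out = []
        for key in keys:
            row = self._key_to_row.get(key)
            if row is None:
                # First time this key is seen: allocate a new row, growing if full.
                if len(self._rows) == self.capacity:
                    self._grow()
                row = len(self._rows)
                self._key_to_row[key] = row
                self._rows.append([self._rng.uniform(-0.01, 0.01) for _ in range(self.dim)])
            out.append(self._rows[row])
        return out

    def __len__(self) -> int:
        return len(self._rows)
```

For example, a table created with initial_capacity=4 that receives 10 distinct keys grows to a capacity of 16, and repeated keys reuse their existing rows rather than allocating new ones.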

For process details, see Dynamic Capacity Expansion Mode under on-chip memory.

Operator samples and README files for sparse tables are available through the link.

When dynamic capacity expansion is enabled, use compatible optimizers such as SGDByAddr, LazyAdamByAddress, or AdagradByAddress.
Dynamic shape
The Rec SDK TensorFlow framework supports dynamic shapes, in which tensor shapes are not fixed when the graph is built and are determined only when specific operations run. Both operator inputs and outputs are treated as having dynamic shapes.
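To see why shapes become dynamic in recommendation models, consider deduplicating a batch of feature keys before a table lookup: the batch size is fixed, but the number of unique keys, and therefore the shape of every tensor derived from them, depends on the data. The plain-Python sketch below mimics the semantics of TensorFlow's tf.unique (unique values in first-occurrence order plus an index for each input); it is illustrative only and does not show how Rec SDK implements dynamic shapes.

```python
from typing import Hashable, List, Sequence, Tuple


def unique_with_index(keys: Sequence[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Return the unique keys (first-occurrence order) and, for each input key,
    the position of its entry in the unique list."""
    uniq: List[Hashable] = []
    index: List[int] = []
    position = {}
    for key in keys:
        if key not in position:
            position[key] = len(uniq)
            uniq.append(key)
        index.append(position[key])
    return uniq, index


# Two batches with the same input shape (4 keys) yield outputs of different lengths.
batch_a = unique_with_index([5, 3, 5, 7])  # 3 unique keys
batch_b = unique_with_index([1, 1, 1, 1])  # 1 unique key
```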

For process details, see Dynamic Shape.
Automatic graph modification
Rec SDK TensorFlow supports two ways of defining features for training: creating FeatureSpec instances, or automatically modifying the TensorFlow computational graph.

Automatic graph modification rewrites the TensorFlow graph so that training scripts need neither FeatureSpec objects nor explicit calls to the operator functions that read embedding keys.

For process details, see Automatic Graph Modification.

Automatic graph modification only supports the default TensorFlow graph and does not currently support custom tf.Graph instances.
Feature access and eviction
Low-frequency features often contribute little to training while wasting memory and promoting overfitting. Feature access (admission) filters out these low-frequency features. Feature access and eviction can be used in both FeatureSpec and automatic graph modification modes.
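A common way to implement this kind of filtering is a count threshold: a key is admitted to the embedding table only after it has been seen a minimum number of times. The sketch below assumes that policy; AdmissionFilter and its threshold parameter are hypothetical names for illustration, not the Rec SDK configuration interface.

```python
from collections import Counter
from typing import Iterable, List


class AdmissionFilter:
    """Hypothetical count-based feature access (admission) filter."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.counts: Counter = Counter()
        self.admitted = set()

    def observe(self, keys: Iterable[int]) -> List[int]:
        """Count occurrences and return the keys admitted during this call."""
        newly_admitted = []
        for key in keys:
            if key in self.admitted:
                continue  # already has a row in the table
            self.counts[key] += 1
            if self.counts[key] >= self.threshold:
                self.admitted.add(key)
                newly_admitted.append(key)
        return newly_admitted
```

Until a key is admitted, a system would typically serve it a default embedding (an assumption here) rather than allocating a trainable row for it.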

Features that no longer contribute to training are evicted to maintain model quality and save memory. Rec SDK TensorFlow supports two eviction triggers: a global step interval or a time interval.
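The global step trigger can be sketched as follows, under the assumption that "no longer contributing" means "not seen within the last ttl steps". StepEvictor, interval, and ttl are hypothetical names chosen for this illustration. A time-interval trigger would follow the same pattern, using wall-clock timestamps instead of step numbers.

```python
from typing import Dict, Iterable, List


class StepEvictor:
    """Hypothetical sketch: every `interval` global steps, evict keys that
    have not been seen for more than `ttl` steps."""

    def __init__(self, interval: int, ttl: int):
        self.interval = interval
        self.ttl = ttl
        self.last_seen: Dict[int, int] = {}

    def on_step(self, global_step: int, keys: Iterable[int]) -> List[int]:
        # Record the most recent step at which each key appeared.
        for key in keys:
            self.last_seen[key] = global_step
        evicted: List[int] = []
        # Eviction check runs only when the step interval is reached.
        if global_step > 0 and global_step % self.interval == 0:
            evicted = [k for k, s in self.last_seen.items() if global_step - s > self.ttl]
            for key in evicted:
                del self.last_seen[key]  # a real system would also free the table row
        return evicted
```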

For process details, see Feature Access and Eviction.

Currently, you can enable access alone or both access and eviction together; enabling eviction alone is not supported.
Hot_Embedding
In recommendation scenarios with high key repetition rates, Hot_Embedding caches frequently accessed keys to accelerate table lookups.
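The idea behind a hot-key cache can be sketched with a frequency counter: the k most frequently looked-up keys are kept in a small fast cache, and all other keys fall back to the full table. HotEmbeddingCache is a hypothetical, read-only illustration; it does not reflect how Hot_Embedding selects keys or keeps the cache consistent with training updates.

```python
from collections import Counter
from typing import Dict, List


class HotEmbeddingCache:
    """Hypothetical sketch: serve the k most frequent keys from a small cache."""

    def __init__(self, table: Dict[int, List[float]], k: int):
        self.table = table  # stands in for the full (slower) embedding table
        self.k = k
        self.counts: Counter = Counter()
        self.hot: Dict[int, List[float]] = {}
        self.hits = 0
        self.misses = 0

    def refresh(self) -> None:
        # Rebuild the cache from the most frequently requested keys so far.
        self.hot = {key: self.table[key] for key, _ in self.counts.most_common(self.k)}

    def lookup(self, keys: List[int]) -> List[List[float]]:
        out = []
        for key in keys:
            self.counts[key] += 1
            if key in self.hot:
                self.hits += 1
                out.append(self.hot[key])
            else:
                self.misses += 1
                out.append(self.table[key])
        return out
```

The higher the key repetition rate, the larger the share of lookups served from the cache, which is why the feature targets such scenarios.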
Custom WarmStart
In TensorFlow Estimator mode, native WarmStart can load some or all model parameters from a single path when training a new model. This provides a flexible way to restore parameters and is commonly used in transfer learning to reuse layers or parameters trained on one task for another. However, native WarmStart does not support name mapping for embedding tables.

Custom WarmStart features:
- It is compatible with native TensorFlow WarmStart and supports WarmStart of specified sparse tables.
- It also supports loading full or partial parameters from multiple model paths for multi-model transfer learning.
Custom WarmStart is supported only in HBM and DDR modes with TensorFlow 1.15.0.
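Conceptually, multi-source WarmStart with name mapping copies selected parameters from several source models into the new model, renaming them along the way. The sketch below models checkpoints as plain dictionaries; warm_start and its argument layout are hypothetical and do not represent the Rec SDK interface or the TensorFlow checkpoint format.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a checkpoint: variable name -> values.
Checkpoint = Dict[str, list]


def warm_start(target: Checkpoint, sources: List[Tuple[Checkpoint, Dict[str, str]]]) -> List[str]:
    """For each (source checkpoint, name map), copy source[src] into target[dst].

    Only the mapped names are loaded (partial WarmStart); later sources
    override earlier ones. Returns the target names that were loaded.
    """
    loaded = []
    for ckpt, name_map in sources:
        for src_name, dst_name in name_map.items():
            if src_name not in ckpt:
                raise KeyError(f"{src_name!r} not found in source checkpoint")
            if dst_name not in target:
                raise KeyError(f"{dst_name!r} is not a variable of the new model")
            target[dst_name] = list(ckpt[src_name])  # copy, do not alias
            loaded.append(dst_name)
    return loaded


# Usage: user embeddings from one source model, item embeddings from another.
new_model = {"user_table": [0, 0], "item_table": [0, 0], "dense": [0]}
model_a = {"user_emb": [1, 1], "dense": [5]}
model_b = {"item_emb": [2, 2]}
loaded = warm_start(new_model, [(model_a, {"user_emb": "user_table"}),
                                (model_b, {"item_emb": "item_table"})])
```

Parameters that are not mapped, such as dense above, keep their freshly initialized values.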
Incremental model saving and loading
This feature supports streaming training. Recommendation systems continuously generate log data for click-through rate (CTR) models; the model is trained on the data as it arrives, and the full or incremental model is saved at a fixed interval.

Saving only incremental updates for sparse parameters significantly reduces the overhead of frequent model checkpoints. This allows a model to be restored using the latest full checkpoint combined with a series of incremental checkpoints, reducing redundant computation.
- Incremental model saving and loading are supported only in on-chip memory, DDR, and SSD modes, and only in Estimator mode for training and prediction; training with evaluation is not supported. Both capacity expansion and non-capacity expansion scenarios are supported.
- Incremental model saving and loading cannot be enabled together with feature access and eviction.
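The restore rule described above (latest full checkpoint plus the incremental checkpoints saved after it, applied in order) can be sketched by tracking which sparse keys changed since the last save. IncrementalSaver and restore are hypothetical names; the sketch handles only updated rows and has no representation for deleted keys.

```python
import copy
from typing import Dict, List, Set

Table = Dict[int, List[float]]


class IncrementalSaver:
    """Hypothetical sketch: write only the sparse rows changed since the last save."""

    def __init__(self) -> None:
        self.table: Table = {}
        self._dirty: Set[int] = set()

    def update(self, key: int, vector: List[float]) -> None:
        self.table[key] = list(vector)
        self._dirty.add(key)  # remember that this row changed

    def save_full(self) -> Table:
        self._dirty.clear()
        return copy.deepcopy(self.table)

    def save_delta(self) -> Table:
        delta = {k: list(self.table[k]) for k in self._dirty}
        self._dirty.clear()
        return delta


def restore(full: Table, deltas: List[Table]) -> Table:
    """Start from the full checkpoint and apply deltas in the order they were saved."""
    table = copy.deepcopy(full)
    for delta in deltas:
        table.update(copy.deepcopy(delta))
    return table
```

Each delta contains only the rows touched since the previous save, which is what keeps frequent checkpoints cheap when only a small fraction of a large sparse table changes between saves.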
Multi-lookup for single tables
It consolidates multiple queries to a single sparse table into a single lookup operation.
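A minimal sketch of the consolidation idea: gather the keys from every query against the same table, look them up in one call, and split the results back per query. The function fused_lookup and the deduplication step are assumptions made for illustration; the document does not state whether Rec SDK deduplicates keys across queries.

```python
from typing import Callable, List, Sequence


def fused_lookup(queries: Sequence[Sequence[int]],
                 lookup: Callable[[List[int]], List[list]]) -> List[List[list]]:
    """Serve several queries against one table with a single lookup call."""
    flat = [key for query in queries for key in query]
    uniq = list(dict.fromkeys(flat))  # assumed dedup, keeps first-seen order
    vectors = lookup(uniq)            # the only call into the table
    by_key = dict(zip(uniq, vectors))
    # Split the shared result back into one list of vectors per query.
    return [[by_key[key] for key in query] for query in queries]
```

Passing a lookup function that records its calls shows that two queries such as [1, 2] and [2, 3, 1] reach the table only once, with the keys [1, 2, 3].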
PCIe through
It uses pipelined, parallel swap-in and swap-out over PCIe-through, together with shared memory for data exchange, to increase throughput between the host and device.
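The throughput gain from pipelining comes from overlapping stages: while one batch is being consumed, the next one is already being prepared (for example, swapped in). The generic producer/consumer sketch below illustrates only that overlap with a bounded queue and a background thread; pipelined, prepare, and consume are hypothetical names, and the sketch does not model PCIe transfers or shared memory.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List


def pipelined(batches: Iterable[Any], prepare: Callable[[Any], Any],
              consume: Callable[[Any], Any], depth: int = 2) -> List[Any]:
    """Prepare batch i+1 on a background thread while batch i is consumed."""
    q: "queue.Queue[Any]" = queue.Queue(maxsize=depth)  # bounds in-flight batches
    sentinel = object()
    errors: List[BaseException] = []

    def producer() -> None:
        try:
            for batch in batches:
                q.put(prepare(batch))  # e.g. swap-in of the rows this batch needs
        except BaseException as exc:  # surface producer errors to the caller
            errors.append(exc)
        finally:
            q.put(sentinel)  # always unblock the consumer

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    results = []
    while True:
        item = q.get()
        if item is sentinel:
            break
        results.append(consume(item))
    worker.join()
    if errors:
        raise errors[0]
    return results
```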
