init

Function

Initializes the Rec SDK model training framework.

Prototype

def init(**kwargs)

**kwargs Parameters

Parameter	Type	Mandatory/Optional	Description
max_steps	int	Optional	Total number of training steps. The default value is -1, which means that training ends when the training data is used up. The value ranges from -1 to 2147483647.
train_steps	int	Optional	Number of training steps to run before each round of test prediction. The default value is -1, which means that prediction is performed only after the entire training dataset has been trained. The value ranges from -1 to 2147483647.
eval_steps	int	Optional	Number of test prediction steps per round. The default value is -1, which means that training continues only after the entire test dataset has been predicted. The value ranges from -1 to 2147483647.
if_load	bool	Optional	Whether to load a model. The default value is False. True: the model is loaded. False: the model is not loaded.
use_dynamic	bool	Optional	Whether to use the dynamic shape feature. The default value is True. True: dynamic shape is enabled. False: dynamic shape is disabled.
use_dynamic_expansion	bool	Optional	Whether to enable dynamic capacity expansion of the on-chip memory. The default value is False. True: dynamic capacity expansion is enabled. False: dynamic capacity expansion is disabled.
bind_cpu	bool	Optional	Whether to enable automatic CPU core binding. The default value is True. True: automatic CPU core binding is enabled. False: automatic CPU core binding is disabled.
save_steps	int	Optional	Number of training steps between saves: data is saved each time save_steps steps have been trained. The default value is -1, which means that data is saved only after all training data has been trained. The value ranges from -1 to 2147483647.
save_checkpoint_due_time	int	Optional	Interval for saving the full model, in seconds. The value ranges from 1 to 2147483647. Generally, the value of save_checkpoint_due_time is greater than that of save_delta_checkpoints_secs. This parameter is mandatory when is_incremental_checkpoint is set to True. NOTE: When both incremental saving and loading and the SSD mode are enabled, setting this parameter to a small value may cause a data race that results in a segmentation fault.
save_delta_checkpoints_secs	int	Optional	Interval for saving the incremental model, in seconds. The value ranges from 1 to 2147483647. Generally, the value of save_checkpoint_due_time is greater than that of save_delta_checkpoints_secs. This parameter is mandatory when is_incremental_checkpoint is set to True. NOTE: When both incremental saving and loading and the SSD mode are enabled, setting this parameter to a small value may cause a data race that results in a segmentation fault.
is_incremental_checkpoint	bool	Optional	Whether to save and load the incremental model. The default value is False. True: enabled. False: disabled.
restore_model_version	int	Optional	Step of the model to be loaded. If this parameter is not passed, the latest model is loaded. If it is set to a specific step, the model saved at that step is loaded. The value ranges from 0 to 2147483647.
recent_key_count_threshold	int	Optional	Minimum number of times a key must occur during the incremental saving period. This parameter is used for low-frequency filtering: when the incremental model is saved, keys that occur fewer times than this value are filtered out. The default value is 0. The value ranges from 0 to 2147483647.
use_lccl	bool	Optional	Whether to use the Low Latency Collective Communication Library (LCCL) to accelerate collective communication. This function is intended for multi-device jobs whose communication bandwidth usage is low. After it is enabled, the following LCCL operators are used in some scenarios: the All2All operator, the GatherAll operator (fused Gather&AllToAll operator), and the GatherUss operator (fused Gather&UnsortedSegmentSum operator). Only the non-scale-out mode of the single-server on-chip memory is supported. For details about how to use this function, see LCCL Communication Optimization Operators and Samples. The default value is False, indicating that this function is disabled.
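
The ranges and the mandatory-parameter rule in the table can be checked before init is called. The sketch below is a minimal illustration, not part of Rec SDK: check_init_kwargs and INT_RANGES are hypothetical names, and the limits are copied from the table above.

```python
INT32_MAX = 2147483647

# (min, max) ranges copied from the parameter table above.
INT_RANGES = {
    "max_steps": (-1, INT32_MAX),
    "train_steps": (-1, INT32_MAX),
    "eval_steps": (-1, INT32_MAX),
    "save_steps": (-1, INT32_MAX),
    "save_checkpoint_due_time": (1, INT32_MAX),
    "save_delta_checkpoints_secs": (1, INT32_MAX),
    "restore_model_version": (0, INT32_MAX),
    "recent_key_count_threshold": (0, INT32_MAX),
}


def check_init_kwargs(**kwargs):
    """Hypothetical pre-flight check; returns a list of problem descriptions."""
    problems = []
    for name, (low, high) in INT_RANGES.items():
        if name not in kwargs:
            continue  # every parameter is optional
        value = kwargs[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name} must be an int")
        elif not low <= value <= high:
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    # Both intervals become mandatory once incremental checkpoints are enabled.
    if kwargs.get("is_incremental_checkpoint", False):
        for name in ("save_checkpoint_due_time", "save_delta_checkpoints_secs"):
            if name not in kwargs:
                problems.append(
                    f"{name} is required when is_incremental_checkpoint=True"
                )
    return problems
```

An empty list means the arguments satisfy the documented ranges. The helper does not check anything that depends on the training script itself.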

When sess.run is used for training, the number of train/eval/save steps that the session actually performs must be the same as the values of train_steps/eval_steps/save_steps.
When Estimator is used for training:
- The value of save_steps must be the same as that of save_checkpoints_steps when the NPURunConfig object is defined, and cannot be set to -1 in TensorFlow.
- The value of max_steps must be the same as that of max_steps passed to est.train()/tf.estimator.TrainSpec(), and cannot be set to -1 in TensorFlow.
- In train_and_evaluate mode, the requirements for save_steps and max_steps are the same as those described above. The value of train_steps must be the same as that of save_steps. The value of eval_steps must be the same as that of steps passed to tf.estimator.EvalSpec(), and cannot be set to -1 in TensorFlow.
If kwargs is used to pass parameters that are not described above, Rec SDK ignores them.
Set max_steps, train_steps, and eval_steps to the values actually used in training. They cannot all be 0 at the same time.
If use_dynamic_expansion is set to True, select an optimizer of the ByAddr type, such as SGDByAddr and LazyAdamByAddress.
Multi-round evaluation is not supported in the train_and_evaluate scenario.
The values of max_steps, train_steps, eval_steps, and save_steps must be the same as those in the actual training process. If they are inconsistent, the training may fail or the training accuracy may be affected.
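
One way to keep these values consistent in a sess.run-style script is to drive the loop from the same dictionary that would be passed to init. The sketch below is a minimal illustration under stated assumptions: run_schedule, train_step, eval_step, and save are hypothetical stand-ins (no TensorFlow or Rec SDK calls are made), the interleaving of training and evaluation rounds is assumed rather than taken from the SDK, and only positive step counts are handled (not -1).

```python
# One shared set of step counts; the same dict would be passed as init(**STEPS).
STEPS = {"max_steps": 200, "train_steps": 100, "eval_steps": 10, "save_steps": 100}


def run_schedule(steps, train_step, eval_step, save):
    """Assumed interleaving: train_steps of training, then eval_steps of evaluation."""
    if any(value <= 0 for value in steps.values()):
        raise ValueError("this sketch only handles positive step counts")
    done = 0
    while done < steps["max_steps"]:
        for _ in range(steps["train_steps"]):
            train_step()                      # stand-in for sess.run(train_ops)
            done += 1
            if done % steps["save_steps"] == 0:
                save(done)                    # stand-in for saving a checkpoint
            if done == steps["max_steps"]:
                break
        for _ in range(steps["eval_steps"]):
            eval_step()                       # stand-in for sess.run(eval_ops)


# Count the calls instead of running a real model.
counts = {"train": 0, "eval": 0}
saved_at = []


def fake_train():
    counts["train"] += 1


def fake_eval():
    counts["eval"] += 1


run_schedule(STEPS, fake_train, fake_eval, saved_at.append)
```

Because the loop and init read the same dictionary, changing a step count in one place cannot leave the other out of date.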

Return Value

Success: None
Failure: An exception is thrown.

Example

from mx_rec.util.initialize import init
init(max_steps=200, train_steps=100, eval_steps=10, save_steps=100, use_dynamic=True, use_dynamic_expansion=False)
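
A second sketch, for incremental checkpoints. The parameter names come from the table above; the numeric values are illustrative assumptions only, and this combination has not been verified against a real Rec SDK installation. The import is guarded so that the snippet also runs where Rec SDK is not installed.

```python
# Illustrative values only; tune the intervals for the actual job.
incremental_kwargs = dict(
    max_steps=200,
    train_steps=100,
    eval_steps=10,
    save_steps=100,
    is_incremental_checkpoint=True,
    save_checkpoint_due_time=3600,     # full model every hour (seconds)
    save_delta_checkpoints_secs=600,   # incremental model every 10 minutes (seconds)
    recent_key_count_threshold=2,      # filter out keys seen fewer than 2 times
)

# Per the table, the full-model interval is generally the larger of the two.
assert (
    incremental_kwargs["save_checkpoint_due_time"]
    > incremental_kwargs["save_delta_checkpoints_secs"]
)

try:
    from mx_rec.util.initialize import init
except ImportError:
    init = None  # Rec SDK is not available in this environment

if init is not None:
    init(**incremental_kwargs)
```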
