Perform the following steps to run the expert system analysis:
. ${install_path}/ascend-toolkit/set_env.sh
${install_path} is the installation path you specified for the Ascend-cann-toolkit development kit package.
msadvisor -d ${data_path}/ -c ${install_path}/ascend-toolkit/latest/tools/msadvisor/conf/train.json -p "parallel_optimization_model.device=2048;parallel_optimization_model.mini_batch_size=2048;parallel_optimization_model.model_mode=PanGu_13B"
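Use this form when you only need to adjust the device, mini_batch_size, and model_mode parameters in train.json. The new values are passed on the command line with -p.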
msadvisor -d ${data_path}/ -c ${install_path}/ascend-toolkit/latest/tools/msadvisor/conf/train.json
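Use this form when you need to configure the model parameters in detail. Before running the command, edit the parameter section of train.json as needed.
The -d option sets the output path for the optimization recommendations and logs. For details on the parameters in train.json, see the train.json parameter description.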
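Both forms share the same -d and -c arguments and differ only in whether -p is supplied. The following Python sketch assembles either form as an argument list, which makes the "<model>.<key>=<value>;..." format of -p explicit. The helper names and example paths are hypothetical; it assumes the default train.json location used in the commands above and that set_env.sh has already been sourced in the shell that launches it.

```python
import shlex
from typing import Optional

MODEL_NAME = "parallel_optimization_model"  # model_name used in train.json


def build_param_overrides(model_name: str, overrides: dict) -> str:
    """Join overrides into the "<model>.<key>=<value>;..." string passed to -p."""
    return ";".join(f"{model_name}.{key}={value}" for key, value in overrides.items())


def build_msadvisor_command(install_path: str, data_path: str,
                            overrides: Optional[dict] = None) -> list:
    """Return the msadvisor argv list; -p is added only when overrides are given."""
    conf = f"{install_path}/ascend-toolkit/latest/tools/msadvisor/conf/train.json"
    cmd = ["msadvisor", "-d", f"{data_path}/", "-c", conf]
    if overrides:
        cmd += ["-p", build_param_overrides(MODEL_NAME, overrides)]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; substitute your own ${install_path} and ${data_path}.
    cmd = build_msadvisor_command(
        "/usr/local/Ascend", "/home/user/prof_data",
        {"device": 2048, "mini_batch_size": 2048, "model_mode": "PanGu_13B"},
    )
    # Only prints the command line. subprocess.run(cmd, check=True) would execute it,
    # provided set_env.sh was sourced in the parent shell first.
    print(shlex.join(cmd))
```

Building an argument list rather than one shell string avoids quoting problems with the semicolons in the -p value.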
When the analysis finishes, the results are printed to the screen and saved, as shown in Figure 1.
The results are recorded in ${data_path}/recommendation/{timestamp}_{pid}.json for later reference.
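A minimal sketch for reading a saved result afterwards, assuming only the path pattern given above: it picks the newest file in ${data_path}/recommendation/ by modification time (so the exact timestamp format does not matter) and loads it with the standard json module. The function name and example path are hypothetical, and because this section does not describe the file's fields, the sketch returns the content without interpreting it.

```python
import json
from pathlib import Path
from typing import Any, Optional


def latest_recommendation(data_path: str) -> Optional[Any]:
    """Load the newest {timestamp}_{pid}.json under <data_path>/recommendation/.

    Returns None if the directory or any result file does not exist yet.
    """
    result_dir = Path(data_path) / "recommendation"
    if not result_dir.is_dir():
        return None
    # Newest last, ordered by file modification time.
    files = sorted(result_dir.glob("*_*.json"), key=lambda p: p.stat().st_mtime)
    if not files:
        return None
    return json.loads(files[-1].read_text(encoding="utf-8"))


if __name__ == "__main__":
    result = latest_recommendation("/home/user/prof_data")  # hypothetical ${data_path}
    print(json.dumps(result, indent=2, ensure_ascii=False))
```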
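The expert system tool only identifies the items in a model or operator that can be optimized and gives optimization suggestions; developers must modify their code themselves to apply them.
The full content of train.json is shown below. All values are defaults; set each parameter to suit your environment.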
{
  "model_list": [
    {
      "model_name": "parallel_optimization_model",
      "session_list": [
        {
          # Path to the Python knowledge base (fixed path)
          "python_model_path": "../python/py_model/parallel_optimization_model/src",
          "parameter": {
            "device": 1024,               # Total number of cards in the cluster
            "mini_batch_size": 1024,      # Amount of data in a single epoch
            "model_mode": "PanGu_200B",   # Name of the large model used for training; must be a model name defined in "model_struc_configs"
            "profiling_para": {
              "CUBE_utilization_ratio": 0.85,       # Cube utilization
              "VECTOR_compute_bound_ratio": 0.5,    # Proportion of Vector operators that are compute-bound
              "VECTOR_utilization_ratio": 0.6,      # Vector utilization
              "bandwidth_utilization_ratio": 0.7,   # Bandwidth utilization
              "CUBE_OPs_ratio": 0.9868,             # Proportion of Cube operators
              "VECTOR_OPs_ratio": 0.0132            # Proportion of Vector operators
            },
            "hyperpara": {
              "fp_bp_recomputation": 4,       # Recomputation parameter
              "fp_bp_linearity": 3,           # Linearity parameter
              "AllReduce": "ringring",        # AllReduce algorithm
              "check_memory": false,          # Whether to check memory
              "mix_layer": false,             # Whether the model is a mixed dense/sparse large model
              "hbm": 30,                      # Device memory per card
              "CUBE_peak_performance": 262,   # Theoretical Cube peak performance
              "VECTOR_peak_performance": 4,   # Theoretical Vector peak performance
              "cluster": [0, 4, 8],           # Number of cards communicating during training
              "bandwidth": [30, 120, 12.5]    # Bandwidth for [within 4 cards, within 8 cards, between other cards] in cluster
            },
            # Large-model structure configurations. Every model uses the same parameter keys, with different values.
            # Default models: "PanGu_200B", "PanGu_13B", "PanGu_2_6B", "PanGu_1_3B", "GLAM_1_2T", "GLAM_143B", "Switch_C"
            "model_struc_configs": {
              "PanGu_200B": {
                "attention_head": 128,        # Model parameter
                "embedding_size": 16384,      # Model parameter
                "seq_len": 1024,              # Model parameter
                "num_layers": 64,             # Model parameter
                "capacity_factor": 2,         # Model parameter
                "num_experts": 1,             # Model parameter
                "expantion_ratio": 4,         # Model parameter
                "optimizer_parallel": true,   # Whether to enable optimizer parallelism
                "weight_byte": 2              # Data precision {1: fp16, 2: fp32}
              },
              "PanGu_13B": {
                "attention_head": 40,
                "embedding_size": 5120,
                "seq_len": 1024,
                "num_layers": 40,
                "capacity_factor": 2,
                "num_experts": 1,
                "expantion_ratio": 4,
                "optimizer_parallel": true,
                "weight_byte": 2
              },
              "PanGu_2_6B": {
                "attention_head": 32,
                "embedding_size": 2560,
                "seq_len": 1024,
                "num_layers": 32,
                "capacity_factor": 2,
                "num_experts": 1,
                "expantion_ratio": 4,
                "optimizer_parallel": true,
                "weight_byte": 2
              },
              "PanGu_1_3B": {
                "attention_head": 32,
                "embedding_size": 2560,
                "seq_len": 1024,
                "num_layers": 16,
                "capacity_factor": 2,
                "num_experts": 1,
                "expantion_ratio": 4,
                "optimizer_parallel": true,
                "weight_byte": 2
              },
              "GLAM_1_2T": {
                "attention_head": 128,
                "embedding_size": 8192,
                "seq_len": 1024,
                "num_layers": 64,
                "capacity_factor": 2,
                "num_experts": 64,
                "expantion_ratio": 4,
                "optimizer_parallel": true,
                "weight_byte": 2
              },
              "GLAM_143B": {
                "attention_head": 128,
                "embedding_size": 4096,
                "seq_len": 1024,
                "num_layers": 32,
                "capacity_factor": 2,
                "num_experts": 64,
                "expantion_ratio": 4,
                "optimizer_parallel": true,
                "weight_byte": 2
              },
              "Switch_C": {
                "attention_head": 32,
                "embedding_size": 2080,
                "seq_len": 1024,
                "num_layers": 30,
                "capacity_factor": 1,
                "num_experts": 2048,
                "hidden_size": 6144,
                "optimizer_parallel": true,
                "weight_byte": 2
              }
            }
          }
        }
      ]
    }
  ]
}
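The listing states that model_mode must name a model defined in model_struc_configs, and a mismatch there is easy to miss when editing train.json by hand. The sketch below checks that rule before msadvisor is run. It assumes the file on disk is plain JSON: the # annotations in the listing are explanatory and would have to be removed first, because Python's json module rejects them. The script and function names are hypothetical.

```python
import json
import sys
from pathlib import Path


def check_model_mode(conf: dict) -> list:
    """Return one message per session whose model_mode has no model_struc_configs entry."""
    problems = []
    for model in conf.get("model_list", []):
        for session in model.get("session_list", []):
            params = session.get("parameter", {})
            mode = params.get("model_mode")
            configs = params.get("model_struc_configs", {})
            if mode not in configs:
                defined = ", ".join(configs) or "none"
                problems.append(
                    f"{model.get('model_name')}: model_mode {mode!r} is not defined "
                    f"in model_struc_configs (defined: {defined})"
                )
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_train_json.py path/to/train.json
    found = check_model_mode(json.loads(Path(sys.argv[1]).read_text(encoding="utf-8")))
    print("\n".join(found) or "model_mode OK")
    sys.exit(1 if found else 0)
```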
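To add a custom model, add an entry with the same keys under model_struc_configs, replace each * with the model's own value, and set model_mode to the new model name:
"model_struc_configs": {
  "New_Model_Name": {
    "attention_head": *,       # Model parameter
    "embedding_size": *,       # Model parameter
    "seq_len": *,              # Model parameter
    "num_layers": *,           # Model parameter
    "capacity_factor": *,      # Model parameter
    "num_experts": *,          # Model parameter
    "expantion_ratio": *,      # Model parameter
    "optimizer_parallel": *,   # Whether to enable optimizer parallelism
    "weight_byte": *           # Data precision {1: fp16, 2: fp32}
  }
}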
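Registering a new model can also be scripted. A minimal sketch, assuming train.json is plain JSON with the nesting shown in the listing: it checks a new entry against the template keys, adds it to model_struc_configs, and by default points model_mode at it. The function names and any model name you pass are hypothetical.

```python
import json
from pathlib import Path

# Keys taken from the New_Model_Name template.
TEMPLATE_KEYS = {
    "attention_head", "embedding_size", "seq_len", "num_layers", "capacity_factor",
    "num_experts", "expantion_ratio", "optimizer_parallel", "weight_byte",
}


def add_model(conf: dict, name: str, struc: dict, select: bool = True) -> None:
    """Register `struc` as `name` in every session's model_struc_configs (in place)."""
    missing = TEMPLATE_KEYS - set(struc)
    if missing:
        raise ValueError(f"{name} is missing keys: {sorted(missing)}")
    for model in conf["model_list"]:
        for session in model["session_list"]:
            params = session["parameter"]
            params.setdefault("model_struc_configs", {})[name] = struc
            if select:
                # model_mode must name an entry in model_struc_configs.
                params["model_mode"] = name


def add_model_to_file(path: str, name: str, struc: dict) -> None:
    """Read train.json, add the model, and write the file back as plain JSON."""
    p = Path(path)
    conf = json.loads(p.read_text(encoding="utf-8"))
    add_model(conf, name, struc)
    p.write_text(json.dumps(conf, indent=4, ensure_ascii=False), encoding="utf-8")
```

The default Switch_C entry uses hidden_size in place of expantion_ratio, so relax TEMPLATE_KEYS if your model follows that layout. Running add_model_to_file on a copy of train.json and passing the copy with -c leaves the installed default file untouched.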