References
The precision-tuning guidance for large language models in this document draws primarily on the following sources:
GLM-130B: An Open Bilingual Pre-trained Model
OPT: Open Pre-trained Transformer Language Models
PaLM: Scaling Language Modeling with Pathways
OLMo: Accelerating the Science of Language Models
Train With Mixed Precision (NVIDIA Docs Hub)
OLMo-7B wandb training metrics
A Theory on Adam Instability in Large-Scale Machine Learning