Dion Optimizer: Transforming Distributed AI Training Efficiency

Optimizers such as Adam and AdamW have been essential to training large-scale neural networks. However, as model sizes soar into the trillions of parameters, the need for more efficient training metho...