How MXFP8, TorchAO, and TorchTitan Boost Large-Scale AI Training on Crusoe B200
Modern AI models are growing larger and more complex, demanding new solutions to speed up training without compromising accuracy. Recent experiments on the Crusoe B200 cluster, using 1,856 GPUs, show...
Tags: AI acceleration, Crusoe B200, float8, large-scale training, MXFP8, PyTorch, quantization, TorchAO
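For readers who want a feel for what torchao-driven low-precision training looks like in code, here is a minimal sketch using torchao's float8 training conversion. It is an illustration only, assuming the torchao.float8 API and a GPU with float8-capable tensor cores; the MXFP8 recipe the post describes for B200 may use a different configuration.

```python
# Minimal sketch: swapping a model's Linear layers to float8 training with torchao.
# Assumptions: torchao with the torchao.float8 API installed, and a GPU whose tensor
# cores support float8 matmuls. The MXFP8 recipe described in the post may differ.
import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training

model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.GELU(),
    nn.Linear(4096, 4096),
).to(torch.bfloat16).cuda()

# Replace eligible nn.Linear modules with float8 training variants.
# The filter skips layers that are too small to benefit from low-precision matmuls.
convert_to_float8_training(
    model,
    module_filter_fn=lambda mod, fqn: mod.in_features >= 1024,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)

loss = model(x).float().pow(2).mean()  # dummy loss, just to exercise the backward pass
loss.backward()
optimizer.step()
```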
Accelerating Transformers: GPT-OSS-Inspired Advances in Hugging Face
Transformers are evolving fast, and Hugging Face is leading the charge with new optimizations inspired by OpenAI's GPT-OSS models. If you're working with large language models, recent upgrades in the ...
Tags: GPT-OSS, Hugging Face, model optimization, NLP, parallelism, quantization, transformers
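As a baseline for what those upgrades build on, here is a hedged sketch of loading a gpt-oss checkpoint with the standard transformers conveniences the tags hint at (dtype selection and automatic device placement). The checkpoint name and prompt are assumptions for illustration; the post's specific optimizations go beyond this.

```python
# Sketch: loading a GPT-OSS checkpoint with standard transformers conveniences.
# Assumptions: transformers + accelerate installed and enough GPU memory for the
# 20B checkpoint; the optimizations the post describes may involve more than this.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed checkpoint name on the Hugging Face Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision where supported
    device_map="auto",    # shard layers across the available GPUs
)

prompt = "Explain MXFP4 quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```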
Boosting Low-Precision AI: Fine-Tuning GPT-OSS with Quantization-Aware Training
Deploying large language models requires balancing accuracy and efficiency, a challenge that intensifies as demand for high-throughput generative AI grows. The open-source gpt-oss model, featuring a ...
Tags: AI deployment, fine-tuning, gpt-oss, low precision, model optimization, NVIDIA, QAT, quantization
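The post covers NVIDIA's QAT workflow for gpt-oss; purely to illustrate the general prepare / fine-tune / convert pattern of quantization-aware training, here is a sketch using torchao's QAT quantizer. This is a stand-in under stated assumptions, not the toolchain the post uses.

```python
# Sketch of the generic QAT pattern: prepare (insert fake-quant), fine-tune, convert.
# Assumptions: torchao with torchao.quantization.qat available; this is NOT the
# NVIDIA toolchain the post describes, just an illustration of the same idea.
import torch
import torch.nn as nn
from torchao.quantization.qat import Int8DynActInt4WeightQATQuantizer

model = nn.Sequential(nn.Linear(1024, 1024), nn.SiLU(), nn.Linear(1024, 1024))

quantizer = Int8DynActInt4WeightQATQuantizer()
model = quantizer.prepare(model)   # insert fake-quantization so training "sees" low precision

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
for _ in range(10):                # stand-in for the real fine-tuning loop
    x = torch.randn(4, 1024)
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

model = quantizer.convert(model)   # lower fake-quant modules to real quantized kernels
```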
FP4 Quantization Meets NVIDIA HGX B200: A New Era of Efficient AI
AI technology is advancing at lightning speed, and the search for greater efficiency has led to a breakthrough: FP4 quantization. This 4-bit floating-point format, when combined with Lambda’s NVIDIA ...
Tags: AI acceleration, deep learning, FP4, Lambda Cloud, model optimization, NVIDIA B200, quantization, TensorRT
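To make the format concrete: FP4 in its E2M1 form can represent only sixteen values, ±{0, 0.5, 1, 1.5, 2, 3, 4, 6}, so practical schemes pair it with a per-block scale. The toy sketch below shows that idea in plain PyTorch; the block size and scale dtype are assumptions for readability, and production FP4 paths (such as those used with TensorRT on B200) differ in their scale encodings and kernels.

```python
# Toy illustration of block-scaled FP4 (E2M1) quantization in plain PyTorch.
# Assumptions: block size of 16 and float32 scales, chosen for readability;
# real FP4 kernels use hardware formats and different scale encodings.
import torch

# The 8 non-negative magnitudes representable in E2M1 (the sign bit gives the rest).
E2M1_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_blockwise(x: torch.Tensor, block: int = 16) -> torch.Tensor:
    x = x.reshape(-1, block)
    scale = x.abs().amax(dim=1, keepdim=True) / E2M1_GRID.max()   # per-block scale
    scaled = x / scale.clamp(min=1e-12)
    # Snap each scaled magnitude to the nearest E2M1 grid point, keeping the sign.
    idx = (scaled.abs().unsqueeze(-1) - E2M1_GRID).abs().argmin(dim=-1)
    q = E2M1_GRID[idx] * scaled.sign()
    return q * scale                                              # dequantized values

x = torch.randn(4, 16)
xq = quantize_fp4_blockwise(x).reshape(4, 16)
print("max abs error:", (x - xq).abs().max().item())
```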
AMD Ryzen AI Max+ Upgrade: Powering 128B-Parameter LLMs Locally on Windows PCs
With AMD's latest update, deploying massive language models, up to 128 billion parameters, directly on your Windows laptop is now possible. AMD’s Ryzen AI Max+ is a breakthrough that brings state-of-...
Tags: AMD, context window, large language models, LLM deployment, local AI, quantization, Ryzen AI, Windows AI
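The claim becomes plausible once you do the memory arithmetic: at 4-bit quantization, 128 billion weights occupy roughly 60 GiB before KV cache and runtime overhead, which is why quantization plus a large unified memory pool is the enabler. A back-of-the-envelope sketch (the precisions listed are illustrative; exact footprints depend on the quantization scheme's metadata):

```python
# Back-of-the-envelope weight-memory footprint for a 128B-parameter model at
# different precisions. KV cache and runtime overhead are extra and ignored here.
PARAMS = 128e9

for name, bits in [("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
    gib = PARAMS * bits / 8 / 1024**3
    print(f"{name:>5}: {gib:7.1f} GiB of weights")

# FP16 : ~238.4 GiB   -> far beyond any laptop
# INT8 : ~119.2 GiB
# 4-bit:  ~59.6 GiB   -> within reach of a large unified-memory pool
```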