Inside the Transformers v5 Release From HuggingFace
Hugging Face's Transformers library just reached a pivotal moment with the v5.0.0rc0 release, its first major version upgrade in five years. With over 800 commits, this release introduces sweeping changes...
Tags: api changes, huggingface, new models, quantization, release notes, tokenization, trainer, transformers
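The excerpt does not spell out the v5-specific API, tokenization, or Trainer changes, so the snippet below is only a minimal sketch of the library's long-standing load-and-generate flow (AutoTokenizer, AutoModelForCausalLM, generate), which is assumed to remain the entry point in v5; "gpt2" is just a small placeholder checkpoint.

```python
# Minimal sketch of the core Transformers flow (AutoTokenizer /
# AutoModelForCausalLM / generate). These entry points predate v5;
# the v5-specific changes covered by the article are not shown here.
# "gpt2" is only a small placeholder checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The Transformers v5 release", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```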
TorchAO: A PyTorch-Native Shortcut To Smaller, Faster Models
TorchAO is PyTorch's native toolkit for model efficiency: it unifies post-training quantization (PTQ), quantization-aware training (QAT), float8 (FP8) training, and structured sparsity in one coherent...
Tags: deep learning, FP8, model efficiency, open source, PyTorch, QAT, quantization, sparsity, TorchAO
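As a hedged illustration of the PTQ side of the toolkit, the sketch below assumes TorchAO's quantize_ API with an int8 weight-only recipe (both present in recent torchao releases, though exact names can differ between versions) and uses a toy model in place of a real network.

```python
# Hedged PTQ sketch with TorchAO: swap eligible Linear weights to int8
# weight-only quantization in place (assumed API: torchao.quantization.
# quantize_ with int8_weight_only; may require recent torchao/PyTorch).
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only

# Toy model standing in for a real network.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
).eval()

# Weights become int8; activations keep their original dtype.
quantize_(model, int8_weight_only())

with torch.no_grad():
    out = model(torch.randn(2, 1024))
print(out.shape)
```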
How MXFP8, TorchAO, and TorchTitan Boost Large-Scale AI Training on Crusoe B200
Modern AI models are growing larger and more complex, demanding new solutions to speed up training without compromising accuracy. Recent experiments on the Crusoe B200 cluster, using 1,856 GPUs, show...
Tags: AI acceleration, Crusoe B200, float8, large-scale training, MXFP8, PyTorch, quantization, TorchAO
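The article's MXFP8 recipes and TorchTitan integration are not reproduced here; purely as a rough sketch of the underlying torchao machinery, the example below converts a toy model's Linear layers for float8 training, assuming torchao.float8.convert_to_float8_training with a module_filter_fn hook. Actually running the float8 matmuls needs Hopper/Blackwell-class GPUs such as the B200.

```python
# Hedged sketch: convert nn.Linear layers to float8 training variants with
# TorchAO (assumed API: torchao.float8.convert_to_float8_training, where
# module_filter_fn returns True for layers to convert). MXFP8 and TorchTitan
# from the article are not shown; float8 matmuls need H100/B200-class GPUs.
import torch.nn as nn
from torchao.float8 import convert_to_float8_training

model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.GELU(),
    nn.Linear(4096, 4096),
)

def convert_all_but_last(module: nn.Module, fqn: str) -> bool:
    # Keep the final projection in high precision, a common float8 recipe.
    return fqn != "2"

convert_to_float8_training(model, module_filter_fn=convert_all_but_last)
print(model)  # the first Linear should now be a float8 training variant
```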
BitNet: 1-bit LLMs Land With Practical Inference on CPUs and GPUs
BitNet from Microsoft Research is the official C++ inference stack for native 1-bit large language models, centered on BitNet b1.58. The repo ships fast, lossless ternary kernels for CPUs, a CUDA W2A8...
Tags: 1-bit LLM, BitNet, CPU, GGUF, GPU, inference, llama.cpp, quantization, T-MAC
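BitNet's actual kernels are optimized C++/CUDA code in the repo; purely as a conceptual illustration of what "native 1-bit (ternary) weights" means in b1.58, here is an absmean-style round-and-clip in Python, loosely following the b1.58 formulation rather than the repo's implementation.

```python
# Conceptual illustration only (not the repo's ternary kernels): BitNet b1.58
# constrains weights to {-1, 0, +1} plus a per-tensor absmean scale, which is
# what the fast lossless CPU/GPU kernels exploit at inference time.
import torch

def absmean_ternary(w: torch.Tensor):
    """Round weights to {-1, 0, +1} using an absmean scale (b1.58 style)."""
    scale = w.abs().mean().clamp(min=1e-8)
    w_ternary = (w / scale).round().clamp_(-1, 1)
    return w_ternary, scale

w = torch.randn(512, 512)
w_t, scale = absmean_ternary(w)
print(w_t.unique().tolist(), round(float(scale), 4))  # [-1.0, 0.0, 1.0] and the scale
```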
AMD Ryzen AI Max+ Upgrade: Powering 128B-Parameter LLMs Locally on Windows PCs
With AMD's latest update, deploying massive language models, up to 128 billion parameters, directly on your Windows laptop is now possible. AMD's Ryzen AI Max+ is a breakthrough that brings state-of-the-art...
Tags: AMD, context window, large language models, LLM deployment, local AI, quantization, Ryzen AI, Windows AI