TorchTitan: Democratizing Large-Scale Distributed Training with PyTorch
A comprehensive look at PyTorch's native solution for production-ready LLM pre-training. Distributed training of large language m...
Tags: AI Infrastructure, Context Parallel, Distributed Training, Float8, FSDP2, Large Language Models, Open Source, Pipeline Parallel, PyTorch, Tensor Parallel, torch.compile, TorchTitan
Pruning LLMs With Regional Gradients: Inside Wanda++
Large language models are hard to deploy because memory and latency balloon with scale. In Findings of the Association for Computational Linguistics: ACL 2025, Yifan Yang and colleagues from the Unive...
Tags: AWQ Quantization, Fine Tuning, Large Language Models, LLaMA, Model Compression, Model Pruning, OpenLLaMA, Regional Gradients, Semi-Structured Sparsity, Sparsity, TensorRT, Wanda++