Pruning LLMs With Regional Gradients: Inside Wanda++

Large language models are hard to deploy because memory and latency balloon with scale. In Findings of the Association for Computational Linguistics: ACL 2025, Yifan Yang and colleagues from the Unive...

Tags: AWQ Quantization, Fine Tuning, Large Language Models, LLaMA, Model Compression, Model Pruning, OpenLLaMA, Regional Gradients, Semi-Structured Sparsity, Sparsity, TensorRT, Wanda++
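The excerpt is cut off, but the pruning family the post covers is well documented: the original Wanda method scores each weight by its magnitude times the ℓ2 norm of its input activations, and Wanda++ extends this with regional (decoder-block-level) gradient information. Below is a minimal sketch, assuming a PyTorch linear layer and a precomputed per-channel activation norm; the `alpha`-weighted gradient term stands in for Wanda++'s regional gradients and is an illustrative assumption, not the paper's exact recipe.

```python
# Illustrative sketch of Wanda-style pruning scores (not the exact Wanda++ algorithm).
# Assumes a weight matrix of shape (out_features, in_features) and a calibration
# activation norm per input channel; the gradient term is a hypothetical placeholder.
import torch

def wanda_score(weight, act_norm, grad=None, alpha=0.0):
    """Score each weight by |W| * ||X||_2 (Wanda); optionally mix in a gradient
    magnitude term as a stand-in for Wanda++'s regional gradients (assumption)."""
    score = weight.abs() * act_norm.unsqueeze(0)            # shape (out, in)
    if grad is not None:
        score = score + alpha * grad.abs() * weight.abs()   # hypothetical regional term
    return score

def semi_structured_2_4_mask(score):
    """Keep the 2 highest-scoring weights in every group of 4 along the input dim (2:4 sparsity)."""
    out, inp = score.shape
    groups = score.view(out, inp // 4, 4)
    topk = groups.topk(2, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool).scatter_(-1, topk, True)
    return mask.view(out, inp)

# Usage on a toy layer: prune to 2:4 semi-structured sparsity in place.
layer = torch.nn.Linear(8, 4, bias=False)
act_norm = torch.rand(8)   # ||X_j||_2 from a calibration pass (assumed to be given)
mask = semi_structured_2_4_mask(wanda_score(layer.weight.data, act_norm))
layer.weight.data *= mask
```

The 2:4 pattern matters because it matches the semi-structured sparsity that NVIDIA GPUs and TensorRT can accelerate directly, which is why it appears in the post's tag list.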
FP4 Quantization Meets NVIDIA HGX B200: A New Era of Efficient AI

AI technology is advancing at lightning speed, and the search for greater efficiency has led to a breakthrough: FP4 quantization. This 4-bit floating-point format, when combined with Lambda’s NVIDIA ...

Tags: AI acceleration, deep learning, FP4, Lambda Cloud, model optimization, NVIDIA B200, quantization, TensorRT
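This excerpt is also truncated, but the 4-bit floating-point format it refers to (E2M1) has a fixed grid of representable magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6}. A minimal sketch of fake-quantizing a tensor to that grid with a single per-tensor scale follows; real FP4 deployments (e.g. TensorRT on B200-class hardware) use finer-grained block scales and native kernels, which this does not reproduce.

```python
# Illustrative FP4 (E2M1) fake quantization: snap values to the nearest point on the
# 4-bit floating-point grid after a per-tensor scale. This is a sketch, not a
# production quantizer.
import torch

FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

def fake_quant_fp4(x):
    """Simulate FP4 by mapping |x| / scale to the nearest grid value, keeping the sign."""
    scale = x.abs().max() / FP4_GRID.max()       # per-tensor scale (assumption: max calibration)
    scaled = (x / scale).abs()
    idx = (scaled.unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)  # nearest representable magnitude
    return torch.sign(x) * FP4_GRID[idx] * scale

x = torch.randn(4, 4)
x_q = fake_quant_fp4(x)
print((x - x_q).abs().mean())   # average quantization error introduced by FP4
```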