NVIDIA Blackwell and Llama 4 Maverick: Ushering in a New Era of AI Inference Speed

An NVIDIA AI system achieved a record-breaking 1,000+ tokens per second per user from a 400-billion-parameter language model, all on a single machine. NVIDIA’s Blackwell architecture, paired with...

Tags: AI inference, Blackwell, GPU acceleration, Llama 4, NVIDIA, speculative decoding, TensorRT-LLM