AMD Ryzen AI Max+ Upgrade: Powering 128B-Parameter LLMs Locally on Windows PCs

With AMD's latest update, deploying massive language models of up to 128 billion parameters directly on a Windows laptop is now possible. AMD's Ryzen AI Max+ is a breakthrough that brings state-of-the-art AI capabilities to thin-and-light devices, eliminating the need for data center resources and offering unparalleled local processing power.
Deploying Massive Models Locally
At CES 2025, AMD introduced the first Windows AI PC processor that could run Meta’s Llama 70B model natively. With the new Ryzen AI Max+ 395 (128GB) and Adrenalin Edition™ 25.8.1 WHQL drivers, users now have access to 96GB of Variable Graphics Memory (VGM). This means you can run LLMs with up to 128 billion parameters locally using tools like LM Studio and llama.cpp.
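As a concrete starting point, here is a minimal sketch of loading a large quantized GGUF model through llama-cpp-python, the Python bindings for llama.cpp. The model filename and parameter values below are illustrative assumptions, not AMD-published settings.

```python
# Minimal sketch: load a large quantized GGUF model with llama-cpp-python.
# Install with: pip install llama-cpp-python
# The model path is a hypothetical example; substitute any GGUF file
# downloaded via LM Studio or Hugging Face.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.3-70B-Instruct-Q4_K_M.gguf",  # hypothetical filename
    n_gpu_layers=-1,  # offload all layers to the iGPU / Variable Graphics Memory
    n_ctx=8192,       # context window; raise this on 128GB configurations
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain Variable Graphics Memory in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```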
Supporting the Latest AI Models
The Ryzen AI Max+ platform is the first to locally support Meta's Llama 4 Scout 109B, with 17B parameters active at a time, as well as other cutting-edge models like Mistral Large. With robust vision and Model Context Protocol (MCP) support, capabilities once reserved for high-powered servers are now accessible on consumer laptops.
- Mixture-of-Experts (MoE) models, such as Llama 4 Scout, activate only a portion of their parameters (17B out of 109B), optimizing speed and memory use (see the routing sketch after this list).
- Dense models require all parameters loaded simultaneously, but Ryzen AI Max+ 395 (128GB) handles these with ease, supporting flexible quantization via the GGUF format.
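To make the MoE distinction concrete, here is a toy top-1 routing sketch. It is illustrative only: real models like Llama 4 Scout use learned routers across many transformer layers, and the dimensions below are invented for readability.

```python
# Toy mixture-of-experts routing: only the selected expert's weights run,
# which is why a 109B MoE model can be far cheaper per token than a dense one.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts = 64, 8  # invented toy dimensions
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts))

def moe_layer(x: np.ndarray) -> np.ndarray:
    logits = x @ router_w                  # router scores, one per expert
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                   # softmax over experts
    top = int(np.argmax(probs))            # top-1 routing: one expert active
    return probs[top] * (x @ experts[top]) # compute touches 1/8 of the weights

x = rng.standard_normal(d_model)
y = moe_layer(x)
print(f"routed to expert {int(np.argmax(x @ router_w))}, output shape {y.shape}")
```

A dense layer, by contrast, would multiply through every expert's worth of weights on every token, which is why dense models demand the full memory footprint the text describes.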
Quantization: Balancing Quality and Performance
Quantization is critical for managing model quality, memory usage, and performance. Ryzen AI Max+ supports models up to 16-bit precision, letting users fine-tune the balance between output quality and efficiency. While higher bit-depth can improve results, benefits tend to plateau beyond a certain threshold.
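As an illustration of that trade-off, here is a generic sketch of symmetric round-to-nearest quantization. This is not AMD's or GGUF's exact scheme, just a way to see why error falls as bit-depth rises and then plateaus.

```python
# Generic symmetric quantization sketch: lower bit-depth shrinks memory but
# raises rounding error, the quality/performance trade-off described above.
# This is not the exact GGUF algorithm.
import numpy as np

def quantize(w: np.ndarray, bits: int) -> np.ndarray:
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit, 127 for 8-bit
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax) # integer grid
    return q * scale                              # dequantized weights

rng = np.random.default_rng(0)
w = rng.standard_normal(1_000_000).astype(np.float32)
for bits in (4, 8, 16):
    err = np.abs(w - quantize(w, bits)).mean()
    print(f"{bits}-bit: mean abs error {err:.5f}, memory {bits / 32:.0%} of fp32")
```

Running this shows the error dropping sharply from 4-bit to 8-bit and then becoming negligible, matching the plateau the section describes.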
Expanding Context Windows for Advanced Workflows
A standout feature is the expanded context window. While LM Studio defaults to 4,096 tokens, the Ryzen AI Max+ 395 (128GB) with the latest drivers supports up to 256,000 tokens using Flash Attention and a Q8-quantized KV cache (a configuration sketch follows the list below). This is a game-changer for:
- Summarizing lengthy documents like SEC filings with up to 20,000 tokens in a single pass
- Processing and querying extensive research papers from arXiv in sessions exceeding 21,000 tokens
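Here is a minimal llama-cpp-python sketch of enabling those long-context features. The parameter names follow that library's API, but the context size, model file, document file, and the Q8_0 type constant are assumptions to verify against your installed version.

```python
# Sketch: enable Flash Attention and a quantized (Q8_0) KV cache so a very
# long context fits in memory. Parameter names per llama-cpp-python;
# verify them against your installed version.
from llama_cpp import Llama

GGML_TYPE_Q8_0 = 8  # ggml enum value for Q8_0 (assumption; check ggml headers)

llm = Llama(
    model_path="model.gguf",   # placeholder path
    n_gpu_layers=-1,           # offload everything to the GPU / VGM
    n_ctx=131072,              # large context; 256K needs ample free VGM
    flash_attn=True,           # Flash Attention, as the driver notes describe
    type_k=GGML_TYPE_Q8_0,     # quantize the K cache to Q8_0
    type_v=GGML_TYPE_Q8_0,     # quantize the V cache to Q8_0
)

with open("sec_filing.txt") as f:  # e.g. a lengthy 10-K filing (placeholder file)
    doc = f.read()

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": f"Summarize the key risks:\n\n{doc}"}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```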
Such generous context limits are essential for the Model Context Protocol (MCP) and emerging agentic AI applications, where local LLMs perform tool-calling and complex reasoning tasks.
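For a sense of what local tool-calling looks like, here is a sketch against LM Studio's OpenAI-compatible local server. The endpoint is LM Studio's default; the model identifier and the weather tool are hypothetical placeholders.

```python
# Sketch: tool-calling against LM Studio's OpenAI-compatible server
# (default endpoint http://localhost:1234/v1). The model id and the
# get_weather tool below are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="llama-4-scout",  # placeholder model id; use the name shown in LM Studio
    messages=[{"role": "user", "content": "What's the weather in Austin?"}],
    tools=tools,
)

# If the model decides to call the tool, the request arrives as structured JSON.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```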
Practical Use Cases and User Guidance
For most users, a 32,000-token context window with a compact model suffices for everyday MCP needs. However, advanced users running agentic workflows will benefit from the extended memory and context of the Ryzen AI Max+ 395 (128GB).
The MCP ecosystem is quickly evolving, with leading providers like Meta, Google, and Mistral developing LLMs optimized for tool integration and on-device inference, paving the way for personal, local AI assistants.
Device Availability and Security Considerations
The Ryzen AI Max+ 395 (128GB) is now shipping in devices from major manufacturers including ASUS, HP, Corsair, and Framework. As with any powerful AI tool, users should exercise caution: grant tool access only to trusted LLM implementations to protect security and privacy.
Readers can try out this capability today by downloading the preview driver and LM Studio:
Download the AMD Adrenalin Preview Driver
Takeaway: Local AI Enters a New Era
AMD's Ryzen AI Max+ upgrade marks a turning point, democratizing access to cutting-edge AI by enabling local execution of ultra-large models and sophisticated agentic workflows. This leap bridges cloud and client, empowering users to innovate and experiment with AI directly on their Windows PCs.