AMD Ryzen AI Max+ Upgrade: Powering 128B-Parameter LLMs Locally on Windows PCs

With AMD's latest update, deploying massive language models of up to 128 billion parameters directly on a Windows laptop is now possible. AMD's Ryzen AI Max+ is a breakthrough that brings state-of-the-art AI capabilities to thin-and-light devices, eliminating the need for data center resources and offering unparalleled local processing power.
Deploying Massive Models Locally
At CES 2025, AMD introduced the first Windows AI PC processor that could run Meta’s Llama 70B model natively. With the new Ryzen AI Max+ 395 (128GB) and Adrenalin Edition™ 25.8.1 WHQL drivers, users now have access to 96GB of Variable Graphics Memory (VGM). This means you can run LLMs with up to 128 billion parameters locally using tools like LM Studio and llama.cpp.
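As a concrete starting point, here is a minimal sketch of loading a large quantized GGUF model through llama-cpp-python, the Python bindings for llama.cpp. The model filename and parameter values below are illustrative assumptions, not AMD-published settings.

```python
# Minimal sketch: load a large quantized GGUF model with llama-cpp-python.
# Install with: pip install llama-cpp-python
# The model path is a hypothetical example; substitute any GGUF file
# downloaded via LM Studio or Hugging Face.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.3-70B-Instruct-Q4_K_M.gguf",  # hypothetical filename
    n_gpu_layers=-1,  # offload all layers to the iGPU / Variable Graphics Memory
    n_ctx=8192,       # context window; raise this on 128GB configurations
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain Variable Graphics Memory in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```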
Supporting the Latest AI Models
The Ryzen AI Max+ platform is the first to locally support Meta's Llama 4 Scout 109B, with 17B parameters active at a time, as well as other cutting-edge models like Mistral Large. With robust vision and Model Context Protocol (MCP) support, capabilities once reserved for high-powered servers are now accessible on consumer laptops.
- Mixture-of-Experts (MoE) models, such as Llama 4 Scout, activate only a portion of their parameters (17B out of 109B), optimizing speed and memory use (see the routing sketch after this list).
- Dense models require all parameters loaded simultaneously, but Ryzen AI Max+ 395 (128GB) handles these with ease, supporting flexible quantization via the GGUF format.
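To make the MoE distinction concrete, here is a toy top-1 routing sketch. It is illustrative only: real models like Llama 4 Scout use learned routers across many transformer layers, and the dimensions below are invented for readability.

```python
# Toy mixture-of-experts routing: only the selected expert's weights run,
# which is why a 109B MoE model can be far cheaper per token than a dense one.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts = 64, 8  # invented toy dimensions
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts))

def moe_layer(x: np.ndarray) -> np.ndarray:
    logits = x @ router_w                  # router scores, one per expert
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                   # softmax over experts
    top = int(np.argmax(probs))            # top-1 routing: one expert active
    return probs[top] * (x @ experts[top]) # compute touches 1/8 of the weights

x = rng.standard_normal(d_model)
y = moe_layer(x)
print(f"routed to expert {int(np.argmax(x @ router_w))}, output shape {y.shape}")
```

A dense layer, by contrast, would multiply through every expert's worth of weights on every token, which is why dense models demand the full memory footprint the text describes.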
Quantization: Balancing Quality and Performance
Quantization is critical for managing model quality, memory usage, and performance. Ryzen AI Max+ supports models up to 16-bit precision, letting users fine-tune the balance between output quality and efficiency. While higher bit-depth can improve results, benefits tend to plateau beyond a certain threshold.
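As an illustration of that trade-off, here is a generic sketch of symmetric round-to-nearest quantization. This is not AMD's or GGUF's exact scheme, just a way to see why error falls as bit-depth rises and then plateaus.

```python
# Generic symmetric quantization sketch: lower bit-depth shrinks memory but
# raises rounding error, the quality/performance trade-off described above.
# This is not the exact GGUF algorithm.
import numpy as np

def quantize(w: np.ndarray, bits: int) -> np.ndarray:
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit, 127 for 8-bit
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax) # integer grid
    return q * scale                              # dequantized weights

rng = np.random.default_rng(0)
w = rng.standard_normal(1_000_000).astype(np.float32)
for bits in (4, 8, 16):
    err = np.abs(w - quantize(w, bits)).mean()
    print(f"{bits}-bit: mean abs error {err:.5f}, memory {bits / 32:.0%} of fp32")
```

Running this shows the error dropping sharply from 4-bit to 8-bit and then becoming negligible, matching the plateau the section describes.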
Expanding Context Windows for Advanced Workflows
A standout feature is the expanded context window. While LM Studio defaults to 4,096 tokens, the Ryzen AI Max+ 395 (128GB) with the latest drivers supports up to 256,000 tokens using Flash Attention and a Q8-quantized KV cache (a configuration sketch follows the list below). This is a game-changer for:
- Summarizing lengthy documents like SEC filings with up to 20,000 tokens in a single pass
- Processing and querying extensive research papers from arXiv in sessions exceeding 21,000 tokens
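Here is a minimal llama-cpp-python sketch of enabling those long-context features. The parameter names follow that library's API, but the context size, model file, document file, and the Q8_0 type constant are assumptions to verify against your installed version.

```python
# Sketch: enable Flash Attention and a quantized (Q8_0) KV cache so a very
# long context fits in memory. Parameter names per llama-cpp-python;
# verify them against your installed version.
from llama_cpp import Llama

GGML_TYPE_Q8_0 = 8  # ggml enum value for Q8_0 (assumption; check ggml headers)

llm = Llama(
    model_path="model.gguf",   # placeholder path
    n_gpu_layers=-1,           # offload everything to the GPU / VGM
    n_ctx=131072,              # large context; 256K needs ample free VGM
    flash_attn=True,           # Flash Attention, as the driver notes describe
    type_k=GGML_TYPE_Q8_0,     # quantize the K cache to Q8_0
    type_v=GGML_TYPE_Q8_0,     # quantize the V cache to Q8_0
)

with open("sec_filing.txt") as f:  # e.g. a lengthy 10-K filing (placeholder file)
    doc = f.read()

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": f"Summarize the key risks:\n\n{doc}"}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```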
Such generous context limits are essential for the Model Context Protocol (MCP) and emerging agentic AI applications, where local LLMs perform tool-calling and complex reasoning tasks.
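For a sense of what local tool-calling looks like, here is a sketch against LM Studio's OpenAI-compatible local server. The endpoint is LM Studio's default; the model identifier and the weather tool are hypothetical placeholders.

```python
# Sketch: tool-calling against LM Studio's OpenAI-compatible server
# (default endpoint http://localhost:1234/v1). The model id and the
# get_weather tool below are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="llama-4-scout",  # placeholder model id; use the name shown in LM Studio
    messages=[{"role": "user", "content": "What's the weather in Austin?"}],
    tools=tools,
)

# If the model decides to call the tool, the request arrives as structured JSON.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```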
Practical Use Cases and User Guidance
For most users, a 32,000-token context window with a compact model suffices for everyday MCP needs. However, advanced users running agentic workflows will benefit from the extended memory and context of the Ryzen AI Max+ 395 (128GB).
The MCP ecosystem is quickly evolving, with leading providers like Meta, Google, and Mistral developing LLMs optimized for tool integration and on-device inference, paving the way for personal, local AI assistants.
Device Availability and Security Considerations
The Ryzen AI Max+ 395 (128GB) is now shipping in devices from major manufacturers including ASUS, HP, Corsair, and Framework. As with any powerful AI tool, users should exercise caution: grant tool access only to trusted LLM implementations to protect security and privacy.
Readers can try out this capability today by downloading the preview driver and LM Studio:
Download the AMD Adrenalin Preview Driver
Takeaway: Local AI Enters a New Era
AMD's Ryzen AI Max+ upgrade marks a turning point, democratizing access to cutting-edge AI by enabling local execution of ultra-large models and sophisticated agentic workflows. This leap bridges cloud and client, empowering users to innovate and experiment with AI directly on their Windows PCs.