Gemini 2.5 Computer Use Model: Ushering in the Next Phase of AI-Powered UI Automation

AI Agents That Interact Like Humans


Google DeepMind’s Gemini 2.5 Computer Use model is a breakthrough that is changing how digital workflows are automated across industries.

Gemini 2.5 Computer Use is now available in public preview through the Gemini API, Google AI Studio, and Vertex AI.

Built on Gemini 2.5 Pro’s advanced visual reasoning, this specialized model lets developers create agents that interact directly with user interfaces, so automation is no longer limited to tasks exposed through traditional APIs.

Unmatched Capabilities and Seamless Access

Its standout features include:

Superior benchmark performance for browser and mobile controls, beating leading alternatives in both accuracy and responsiveness
Low latency that supports real-time execution of complex UI tasks
Developer-friendly APIs and demo environments to streamline prototyping and deployment

Inside the Model: How Gemini 2.5 Computer Use Works

This model operates in a loop. Each cycle begins with three inputs: the user's request, a screenshot of the current UI, and a history of recent actions. Gemini analyzes these to suggest the next UI action, whether that is clicking, typing, or dragging. For sensitive actions, such as purchases, the model can request user confirmation. After the action is executed, a fresh screenshot and the current URL are sent back to the model as input for the next step, and the loop continues until the task is completed or halted.

Broad UI manipulation support, handling everything from form completion to moving sticky notes between categories
Optimized for web browsers, with strong performance on mobile; desktop OS-level control is not yet available
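
The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's client code: the names `Action`, `Observation`, `run_agent_loop`, `scripted_model`, and `fake_browser` are hypothetical, the "model" is a scripted stand-in, and the "browser" only returns an empty screenshot with a made-up URL. In a real agent, `propose_action` would call the Computer Use model and `execute_action` would drive an actual browser.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Action:
    """A proposed UI action (hypothetical shape, for illustration only)."""
    name: str                          # e.g. "click", "type", "drag", or "done"
    args: dict = field(default_factory=dict)
    needs_confirmation: bool = False   # set for sensitive steps such as purchases


@dataclass
class Observation:
    """What is sent back to the model after each executed action."""
    screenshot: bytes
    url: str


def run_agent_loop(goal, propose_action, execute_action, confirm,
                   observation, max_steps=20):
    """Repeat propose -> (confirm) -> execute until done, declined, or max_steps."""
    history: List[Tuple[Action, str]] = []
    for _ in range(max_steps):
        # Inputs to each cycle: user request, current screenshot/URL, action history.
        action = propose_action(goal, observation, history)
        if action.name == "done":
            break
        # Sensitive actions are only executed after the user approves them.
        if action.needs_confirmation and not confirm(action):
            history.append((action, "declined"))
            break
        # Executing the action yields a fresh screenshot and URL for the next cycle.
        observation = execute_action(action)
        history.append((action, "executed"))
    return history


# --- Stand-ins so the sketch runs without a model or a browser ---------------
SCRIPT = [
    Action("click", {"x": 120, "y": 300}),
    Action("type", {"text": "hello"}),
    Action("done"),
]


def scripted_model(goal, observation, history):
    """Pretend model: returns the next scripted action based on history length."""
    return SCRIPT[len(history)]


def fake_browser(action):
    """Pretend browser: returns an empty screenshot and a placeholder URL."""
    return Observation(screenshot=b"", url="https://example.test/after-" + action.name)


if __name__ == "__main__":
    start = Observation(screenshot=b"", url="https://example.test/")
    steps = run_agent_loop("demo task", scripted_model, fake_browser,
                           lambda action: True, start)
    for action, status in steps:
        print(action.name, status)
```

Keeping the model call, the executor, and the confirmation prompt as separate callables is one possible design; it makes each piece easy to replace or test in isolation.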

Leading the Pack in Benchmark Performance

Gemini 2.5 Computer Use sets the pace in industry benchmarks like Online-Mind2Web, WebVoyager, and AndroidWorld, excelling in both accuracy and speed. Tests by Browserbase and Google’s internal teams highlight its ability to deliver outstanding browser control with the lowest measured latency, making it suitable for robust automation and testing scenarios.

Over 70% accuracy on complex tasks, with the lowest latency among the models compared (around 225 seconds per task)
Superior results in real-world cases, including automated form submissions and digital organization

Prioritizing Safety and Responsible AI

Google DeepMind recognizes the unique risks of giving AI agents control over digital environments. As a result, Gemini 2.5 Computer Use incorporates a suite of safety features:

Per-step safety service that reviews each proposed action, reducing the risk of security issues or unintended consequences
Customizable system instructions to require user approval for high-stakes actions or block risky behaviors
Comprehensive documentation and best practices to equip developers in building safe, reliable interfaces
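
To make the approval and blocking ideas above concrete, here is a small client-side sketch. It is not Google's per-step safety service and does not use any Gemini API: the `Decision` enum, the keyword lists, and `review_action` are all hypothetical, and simple keyword matching stands in for whatever policy a developer would actually configure.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"      # run the action without asking
    CONFIRM = "confirm"  # pause and ask the user first
    BLOCK = "block"      # never run the action


# Hypothetical policy lists; a real deployment would define its own rules.
CONFIRM_KEYWORDS = ("buy", "purchase", "pay", "checkout")
BLOCK_KEYWORDS = ("delete account", "disable security")


def review_action(action_name: str, target_text: str) -> Decision:
    """Classify one proposed action before it is executed."""
    text = target_text.lower()
    # Blocked behaviors take priority over everything else.
    if any(keyword in text for keyword in BLOCK_KEYWORDS):
        return Decision.BLOCK
    # High-stakes clicks (for example, purchases) need explicit user approval.
    if action_name == "click" and any(keyword in text for keyword in CONFIRM_KEYWORDS):
        return Decision.CONFIRM
    return Decision.ALLOW


if __name__ == "__main__":
    for name, target in [("click", "Buy now"),
                         ("click", "Delete account"),
                         ("type", "Search box")]:
        print(name, repr(target), "->", review_action(name, target).value)
```

A check like this would run once per proposed action, before execution, which mirrors the per-step review described in the list above.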

Prompt: “My art club brainstormed tasks ahead of our fair. The board is chaotic and I need your help organizing the tasks into some categories I created. Go to sticky-note-jam.web.app and ensure notes are clearly in the right sections. Drag them there if not.”

Real-World Adoption and Use Cases

Several Google teams are already leveraging this technology for production needs, especially in UI testing and workflow automation. Notable applications include Project Mariner, the Firebase Testing Agent, and Search’s AI Mode. Early users report dramatic improvements in automating complex, multi-step tasks and developing smarter personal assistants.

How to Get Started

Developers can immediately experiment with Gemini 2.5 Computer Use via Google AI Studio, Vertex AI, or through Browserbase’s demo. Extensive documentation and code samples are available to accelerate custom agent development. Google encourages the developer community to contribute feedback and drive the evolution of this promising technology.

Takeaway

Gemini 2.5 Computer Use marks a pivotal advancement in the pursuit of general-purpose AI agents capable of sophisticated, human-like interaction with computers. Its combination of high performance, robust safety, and developer accessibility opens up new possibilities for intelligent automation in every sector.

Source: Google Blog – Introducing the Gemini 2.5 Computer Use model


Source: https://blog.google/technology/google-deepmind/gemini-computer-use-model/

Joshua Berkowitz October 7, 2025
