Google DeepMind’s Gemini 2.5 Computer Use model marks a breakthrough in how digital workflows can be automated across industries.
Gemini 2.5 Computer Use is now available in public preview through the Gemini API, Google AI Studio, and Vertex AI.
Built on Gemini 2.5 Pro’s visual understanding and reasoning, this specialized model lets developers build agents that interact directly with user interfaces, instead of being limited to tasks that expose a structured API.
Unmatched Capabilities and Seamless Access
Its standout features include:
- Strong benchmark performance on browser and mobile control tasks, outperforming leading alternatives in both accuracy and responsiveness
- Lower latency than competing agents, which makes complex, multi-step UI tasks practical to run
- Developer-friendly APIs and demo environments to streamline prototyping and deployment
Inside the Model: How Gemini 2.5 Computer Use Works
The model operates in a loop. Each cycle takes as input the user's request, a screenshot of the current UI, and a history of recent actions. Gemini analyzes these and proposes the next UI action, such as clicking, typing, or dragging. For sensitive actions, such as purchases, the model can ask the end user for confirmation. After the client application executes the action, it sends back a fresh screenshot and the current URL, and the loop repeats until the task is complete, an error occurs, or the loop is halted by a safety check or the user.
- Broad UI manipulation support, handling everything from form completion to moving sticky notes between categories
- Optimized for web browsers, with strong performance on mobile; it is not yet optimized for desktop OS-level control
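The loop described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Gemini API: `Action`, `Observation`, `run_agent`, and `demo` are hypothetical names, and the `propose` callback stands in for the model call, which in a real agent would send the request, screenshot, and action history to Gemini 2.5 Computer Use. The `execute` and `confirm` callbacks represent the client application that performs actions and asks the user about sensitive ones.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """A proposed UI action. Hypothetical schema, not the Gemini API's."""
    kind: str                                  # e.g. "click", "type", "drag", "done"
    args: dict = field(default_factory=dict)   # e.g. coordinates or text to type
    sensitive: bool = False                    # e.g. a purchase: ask the user first


@dataclass
class Observation:
    """What the client sends back after each step: a screenshot and the URL."""
    screenshot: bytes
    url: str


def run_agent(
    goal: str,
    propose: Callable[[str, Observation, List[Action]], Action],
    execute: Callable[[Action], Observation],
    confirm: Callable[[Action], bool],
    first_obs: Observation,
    max_steps: int = 20,
) -> List[Action]:
    """Observe, propose, (confirm), execute; stop when done, declined, or out of steps."""
    history: List[Action] = []
    obs = first_obs
    for _ in range(max_steps):
        action = propose(goal, obs, history)   # stand-in for the model call
        if action.kind == "done":
            break
        if action.sensitive and not confirm(action):
            history.append(Action("halted", {"reason": "user declined"}))
            break
        obs = execute(action)                  # fresh screenshot + URL for the next turn
        history.append(action)
    return history


def demo(user_approves: bool) -> List[str]:
    """Drive the loop with a scripted stand-in for the model (hypothetical)."""
    plan = iter([
        Action("click", {"x": 120, "y": 48}),
        Action("type", {"text": "art fair"}),
        Action("click", {"x": 300, "y": 200}, sensitive=True),  # e.g. a "Buy" button
        Action("done"),
    ])
    history = run_agent(
        goal="demo task",
        propose=lambda goal, obs, hist: next(plan),
        execute=lambda action: Observation(b"", "https://example.test/next"),
        confirm=lambda action: user_approves,
        first_obs=Observation(b"", "https://example.test/"),
    )
    return [a.kind for a in history]


if __name__ == "__main__":
    print(demo(user_approves=False))  # ['click', 'type', 'halted']
    print(demo(user_approves=True))   # ['click', 'type', 'click']
```

Keeping the model behind a plain callback makes the control flow easy to test offline: the scripted plan shows that a declined confirmation stops the loop before the sensitive action runs, while an approved one lets it proceed until the model reports the task as done.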
Leading the Pack in Benchmark Performance
Gemini 2.5 Computer Use leads on benchmarks such as Online-Mind2Web, WebVoyager, and AndroidWorld, in both accuracy and speed. Evaluations by Browserbase and Google’s internal teams show leading browser-control quality at the lowest latency measured, which makes it well suited to automation and UI testing.
- Over 70% accuracy on complex tasks at the lowest latency among the agents compared (around 225 seconds per task)
- Superior results in real-world cases, including automated form submissions and digital organization
Prioritizing Safety and Responsible AI
Google DeepMind recognizes the unique risks of giving AI agents control over digital environments. As a result, Gemini 2.5 Computer Use incorporates a suite of safety features:
- Per-step safety service that reviews each proposed action, reducing the risk of security issues or unintended consequences
- Customizable system instructions to require user approval for high-stakes actions or block risky behaviors
- Comprehensive documentation and best practices to help developers build safe, reliable agents
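To illustrate how developer-defined rules can sit alongside a per-step review, here is a minimal Python sketch of a client-side policy check that runs before each proposed action is executed. It is an assumption-laden illustration, not Google's safety service or its system-instruction format: `Verdict`, `Policy`, `review`, and the example action kinds and domains are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from urllib.parse import urlparse


class Verdict(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"   # pause and ask the end user before executing
    BLOCK = "block"       # refuse the action outright


@dataclass
class Policy:
    """Developer-defined guardrails. Hypothetical; not Google's safety service."""
    blocked_domains: set = field(default_factory=set)
    blocked_actions: set = field(default_factory=set)
    confirm_actions: set = field(
        default_factory=lambda: {"purchase", "submit_payment", "delete"}
    )


def review(action_kind: str, url: str, policy: Policy) -> Verdict:
    """Check one proposed action before it runs; the most restrictive rule wins."""
    host = urlparse(url).hostname or ""
    # A blocked domain also covers its subdomains (e.g. login.bank.example).
    if any(host == d or host.endswith("." + d) for d in policy.blocked_domains):
        return Verdict.BLOCK
    if action_kind in policy.blocked_actions:
        return Verdict.BLOCK
    if action_kind in policy.confirm_actions:
        return Verdict.CONFIRM
    return Verdict.ALLOW


if __name__ == "__main__":
    policy = Policy(blocked_domains={"bank.example"})
    print(review("click", "https://sticky-note-jam.web.app/", policy))  # Verdict.ALLOW
    print(review("purchase", "https://shop.example/cart", policy))      # Verdict.CONFIRM
    print(review("click", "https://login.bank.example/", policy))       # Verdict.BLOCK
```

Checking block rules before confirmation rules means a high-stakes action on a forbidden site is refused rather than merely paused, and matching on the parsed hostname (not a substring of the URL) keeps a domain such as `notbank.example` from being caught by a rule for `bank.example`.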
Example demo prompt: “My art club brainstormed tasks ahead of our fair. The board is chaotic and I need your help organizing the tasks into some categories I created. Go to sticky-note-jam.web.app and ensure notes are clearly in the right sections. Drag them there if not.”
Real-World Adoption and Use Cases
Several Google teams already use this technology in production, particularly for UI testing and workflow automation. Applications include Project Mariner, the Firebase Testing Agent, and Search’s AI Mode. Early users report substantial gains in automating complex, multi-step tasks and in building more capable personal assistants.
How to Get Started
Developers can start experimenting with Gemini 2.5 Computer Use today through Google AI Studio and Vertex AI, or try Browserbase’s demo. Documentation and code samples are available to speed up custom agent development, and Google invites developers to share feedback to help shape how the model evolves.
Takeaway
Gemini 2.5 Computer Use marks a pivotal advancement in the pursuit of general-purpose AI agents capable of sophisticated, human-like interaction with computers. Its combination of high performance, robust safety, and developer accessibility opens up new possibilities for intelligent automation in every sector.
Source: Google Blog – Introducing the Gemini 2.5 Computer Use model
Gemini 2.5 Computer Use Model: Ushering in the Next Phase of AI-Powered UI Automation