Google DeepMind has introduced a preview of its latest AI model, Gemini 2.5 Computer Use. Built on top of Gemini 2.5 Pro, this specialized system is designed to let AI agents interact directly with graphical user interfaces (GUIs) — essentially performing tasks the same way a human would on a computer. Developers can now access it through the Gemini API via Google AI Studio and Vertex AI.
How It Works
The model works by simulating user actions. It can:
- Click buttons
- Fill out forms
- Scroll through pages
- Navigate websites (even ones that require login)
It operates in a loop:
1. Takes a screenshot of the current screen.
2. Analyzes it and generates the next action (like a click or text input).
3. Repeats the process with an updated screenshot until the task is done.
This makes it capable of completing step-by-step workflows — much like a real user.
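To make that loop concrete, here is a minimal Python sketch of an agent driving a browser in this screenshot-act-repeat fashion. It is illustrative only: `capture_screenshot`, `request_next_action`, and `execute_action` are hypothetical helpers standing in for the screenshot capture, the Gemini API call, and the browser automation layer, and are not part of the published API.

```python
# Illustrative agent loop; the helper functions below are hypothetical
# stand-ins, not the actual Gemini API surface.

def capture_screenshot(page) -> bytes:
    """Grab the current state of the browser page as an image (hypothetical)."""
    ...

def request_next_action(goal: str, screenshot: bytes) -> dict:
    """Send the goal plus the latest screenshot to the model and receive a
    proposed UI action, e.g. {"type": "click", "x": 120, "y": 340} (hypothetical)."""
    ...

def execute_action(page, action: dict) -> None:
    """Perform the proposed click, text input, or scroll in the browser (hypothetical)."""
    ...

def run_task(page, goal: str, max_steps: int = 25) -> None:
    """Screenshot -> model -> action loop, repeated until the model signals completion."""
    for _ in range(max_steps):
        screenshot = capture_screenshot(page)          # 1. observe the current screen
        action = request_next_action(goal, screenshot)  # 2. ask the model for the next step
        if action["type"] == "done":                    # model reports the task is finished
            return
        execute_action(page, action)                    # 3. act, then loop with a fresh screenshot
    raise TimeoutError("Task did not finish within the step budget")
```

In practice the execution layer would be a browser automation tool (such as Playwright or Selenium), with each completed action feeding a new screenshot back into the model.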
Performance and Use Cases
For now, the model is optimized for web browsers, with future potential for mobile apps. It isn’t yet designed for full desktop OS control.
On benchmarks like WebVoyager and AndroidWorld, which test web and mobile task automation, Gemini 2.5 Computer Use has shown:
- Accuracy above 70%
- Average latency of ~225 seconds
These numbers highlight its promise for automating structured, UI-driven tasks.
Built-In Safety and Developer Controls
Because this kind of AI carries risks, DeepMind has added safety layers. Developers can:
- Restrict certain actions entirely
- Require user approval for sensitive operations
- Configure limits to reduce misuse or unexpected behavior
This way, the model balances automation power with safety and oversight.
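As a sketch of how such controls might sit inside the agent loop, the snippet below blocks disallowed actions outright and pauses for human approval before sensitive ones. The action names and the `ask_user_to_confirm` helper are assumptions for illustration, not the model's actual configuration options.

```python
# Hypothetical developer-side guardrails applied to a proposed action;
# the action names and helper are illustrative, not the actual API.

BLOCKED_ACTIONS = {"delete_account", "submit_payment"}          # restricted entirely
NEEDS_CONFIRMATION = {"send_email", "place_order", "log_in"}    # human-in-the-loop

def ask_user_to_confirm(action: dict) -> bool:
    """Prompt the operator before a sensitive step (hypothetical helper)."""
    answer = input(f"Allow action {action['type']}? [y/N] ")
    return answer.strip().lower() == "y"

def guard_action(action: dict) -> bool:
    """Return True only if the proposed action may be executed."""
    if action["type"] in BLOCKED_ACTIONS:
        return False                          # never execute blocked actions
    if action["type"] in NEEDS_CONFIRMATION:
        return ask_user_to_confirm(action)    # require explicit user approval
    return True                               # routine actions pass through
```

A guard like this would be called on every action the model proposes, before it is executed in the browser.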
Why It Matters
With Gemini 2.5 Computer Use, Google is bringing AI one step closer to acting as a true digital assistant — capable of navigating the web and apps in the same way humans do. If widely adopted, it could transform how developers build tools for task automation, productivity, and accessibility.