
UI-TARS Desktop: A Native GUI Agent That Operates Your Computer With Language

Turn Natural Language into Real Computer Actions


UI-TARS Desktop is an open source desktop application that lets you control a computer and browser using plain English instructions, powered by the UI-TARS vision-language model and the broader Agent TARS stack. 

It is designed to bridge the gap between large multimodal models and practical GUI automation, shipping local and remote operators with a focus on reliability, privacy, and developer ergonomics.

Setting the stage

Natural language computer use has moved from research demos to daily tooling. The challenge is less about model capability and more about wiring models to the real world: screen capture, element grounding, safe keyboard and mouse control, robust browser automation, and a developer surface to build on. 

UI-TARS Desktop tackles that problem with a thoughtfully engineered Electron app, first party operators, and a clear path from experiment to product.

Key features

  • Local and remote operators for computer and browser automation, with easy switching in the app UI.

  • Vision-first grounding plus DOM and hybrid browser control for speed and precision.

  • A native Electron UI with real time event streaming and status, suitable for demos and production pilots.

  • Privacy-first local mode; remote mode runs in sandboxed environments with time-boxed sessions.

  • Developer-facing SDKs and operators: nut.js for input control, a browser operator, Browserbase, and ADB for Android experiments.

  • MCP integration so agents can call out to external tools through a standard protocol.

The problem and ByteDance's solution

The real world is messy. GUI layouts vary, state changes unexpectedly, and browser DOMs do not always reflect what users see on screen. 

UI-TARS Desktop adopts a vision-first approach with optional DOM and hybrid strategies, so the agent can ground instructions in pixels when needed, or speed up with DOM calls when safe. 
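The trade-off between pixel-level grounding and DOM calls can be sketched as a small dispatcher. The names below are hypothetical illustrations of the idea, not the project's actual API:

```typescript
// Hypothetical sketch of a hybrid grounding strategy (not the project's real API).
type Grounding = "vision" | "dom";

interface TargetQuery {
  description: string;   // natural-language description of the target element
  domSelector?: string;  // present when a reliable selector is known
  domStable: boolean;    // whether the DOM is believed to match the rendered page
}

// Prefer fast DOM lookups when a trusted selector exists; fall back to
// pixel-level (vision) grounding when the DOM may not reflect the screen.
function chooseGrounding(q: TargetQuery): Grounding {
  return q.domSelector && q.domStable ? "dom" : "vision";
}
```

The point of the hybrid design is that the vision path is always available as a safe default, while the DOM path is an opt-in optimization.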

It ships both a local operator for on-device automation and a remote operator that runs tasks in managed sandboxes, giving teams a way to prototype locally and scale in the cloud when ready. 

See the project's Quick Start for the two modes.
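The local/remote split can be pictured as two implementations of one operator contract. This is an illustrative sketch under assumed names, not the repository's actual interfaces, and real operators would be asynchronous:

```typescript
// Hypothetical sketch of the local/remote operator split (illustrative only).
interface GUIAction {
  type: "click" | "type" | "scroll";
  payload: Record<string, unknown>;
}

interface Operator {
  execute(action: GUIAction): string; // returns a status message
}

// Local operator: drives the machine it runs on (e.g. via an input library).
class LocalOperator implements Operator {
  execute(action: GUIAction): string {
    return `local:${action.type}`;
  }
}

// Remote operator: forwards actions to a managed, time-boxed sandbox session.
class RemoteOperator implements Operator {
  constructor(private sessionId: string) {}
  execute(action: GUIAction): string {
    return `remote[${this.sessionId}]:${action.type}`;
  }
}
```

Because both sides satisfy the same contract, a workflow prototyped against the local operator can be pointed at a sandbox without rewriting the task logic.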

Why it stands out

Three things impressed me in practice. First, the repository is a real monorepo with a clean app layer in apps/ui-tars and reusable packages under packages/ui-tars, a structure that makes contributions straightforward. 

Second, the model-operator contract is explicit via the SDK and action parser, which keeps prompts maintainable as the model family evolves. 
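To make that contract concrete, here is a hypothetical sketch of what an action parser does: turn a model's textual action (such as "click(start_box='(279,81)')") into a structured command. The exact output grammar of UI-TARS may differ; the parser below is illustrative only:

```typescript
// Hypothetical action-parser sketch (the real grammar may differ).
interface ParsedAction {
  name: string;                  // e.g. "click"
  args: Record<string, string>;  // e.g. { start_box: "(279,81)" }
}

// Parse a single "name(key='value', ...)" action string into a structure.
function parseAction(raw: string): ParsedAction | null {
  const m = raw.trim().match(/^(\w+)\((.*)\)$/s);
  if (!m) return null;
  const args: Record<string, string> = {};
  for (const pair of m[2].matchAll(/(\w+)='([^']*)'/g)) {
    args[pair[1]] = pair[2];
  }
  return { name: m[1], args };
}
```

Keeping this translation in one place is what lets prompts and model versions evolve without touching every operator.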

Third, the project embraces the Model Context Protocol (MCP) for tool mounting, which is quickly becoming a standard way to connect agents to external capabilities (Anthropic, 2024).
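MCP rides on JSON-RPC 2.0, so mounting a tool comes down to exchanging well-typed request envelopes. The sketch below follows the spec's tools/call shape; the helper function is an illustration, not a client library:

```typescript
// Minimal sketch of an MCP-style tool invocation (JSON-RPC 2.0 envelope).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a tools/call request for a named tool with its arguments.
function makeToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}
```

Because the envelope is standard, any MCP server (a chart renderer, a file browser, an internal API) can be mounted without agent-specific glue.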

Under the hood

  • UI-TARS Desktop is TypeScript end to end. The Electron app lives in apps/ui-tars and is built with Vite and Electron Forge and tested with Vitest and Playwright. 

  • The operators and core libraries live under packages/ui-tars, including @ui-tars/sdk, @ui-tars/action-parser, @ui-tars/operator-nut-js, and the browser operator.

  • The project uses pnpm workspaces, Turbo for orchestration, and a conventional lint, prettier, and changeset pipeline.

  • On the automation layer, the project leans on the excellent nut.js for cross platform mouse and keyboard control (Gutek, 2020) and Playwright for reliable end to end tests in an Electron context (Microsoft, 2020). 

  • For packaging, Electron Forge and electron-updater handle multi-platform bundles and updates. 

  • The codebase is Apache 2.0 and mostly TypeScript. See apps/ui-tars/package.json for a full dependency list.

Getting started

Most users will install a prebuilt release from the project's Releases page. 

For model setup, you can point the app to a Hugging Face Inference Endpoint for UI-TARS 1.5 or VolcEngine's Doubao 1.5 UI TARS variant. 

The app's Settings view in docs/setting.md explains the fields. Here is the minimal configuration shared in the docs:

Language: en
VLM Provider: Hugging Face for UI-TARS-1.5
VLM Base URL: https://your-endpoint/v1/
VLM API KEY: hf_xxx
VLM Model Name: your-model-name

For command line oriented workflows, the companion Agent TARS CLI provides a headless or Web UI mode that speaks the same event stream protocol:

npx @agent-tars/cli@latest
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey YOUR_API_KEY

See the full quick start in docs/quick-start.md. Remote operators are available via the app for sandboxed runs; note the feature currently rolls out by region as documented.
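The shared event stream mentioned above can be imagined as a small discriminated union of status messages. The field names here are hypothetical and do not reproduce the actual Agent TARS protocol:

```typescript
// Hypothetical event-stream envelope for agent status updates (illustrative;
// the actual Agent TARS protocol fields are not reproduced here).
type AgentEvent =
  | { kind: "action"; name: string; at: number }
  | { kind: "status"; message: string; at: number }
  | { kind: "done"; success: boolean; at: number };

// Reduce a stream of events to a coarse run state for a UI badge.
function summarize(events: AgentEvent[]): string {
  for (const e of events) {
    if (e.kind === "done") return e.success ? "completed" : "failed";
  }
  return "running";
}
```

A typed stream like this is what lets the desktop app and the headless CLI render the same run with different front ends.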

Use cases

The repository's demos and issue tracker showcase practical tasks: flipping editor preferences, triaging GitHub issues, booking travel with browser automation, and generating charts with a tool mounted via MCP. 

Browse community examples in issue 842. The dual local and remote modes make it useful for prototyping internal workflows, QA automation with human in the loop, and research on grounding and recovery strategies.

Beyond the demos, users report applying UI-TARS Desktop to repetitive browser workflows, development-environment setup, and visual regression checks, as well as debugging complex GUI layouts and orchestrating multi-step browser actions. The remote operator has proven especially valuable for distributed teams that need secure, temporary access to shared environments.

Community and contributing

Contributions are welcome. Start with CONTRIBUTING.md and the monorepo structure. The team maintains active discussion channels linked from the README, including Discord. If you are integrating external tools, check the MCP guide and the CLI package for reference implementations.

License and usage

UI-TARS Desktop is licensed under Apache License 2.0. You can use, modify, and redistribute the software, including in commercial settings, provided you keep notices, include the license in redistributions, and note changes you have made. The license also grants a patent license from contributors and includes the standard warranty disclaimer and limitation of liability.

About the team

The project is maintained in ByteDance's open source organization and is part of a broader Agent TARS initiative focused on connecting cutting-edge multimodal models with production-grade agent infrastructure. Learn more at agent-tars.com and explore related repositories such as the UI-TARS model.

Impact and outlook

By packaging a capable model with a first class desktop runtime, operator suite, and a clean developer surface, UI-TARS Desktop lowers the barrier to shipping real agent experiences. It slots into existing stacks via MCP, scales from local to remote execution, and stays approachable for both researchers and product teams. 

The path forward is exciting: tighter recovery loops, richer perception, and deeper tool ecosystems. Because the stack is open, you can experiment, fork, and extend without friction.

Closing thoughts

If you are exploring computer use agents, this repository deserves a careful look. Download a release, run a few tasks, then dive into apps/ui-tars and packages/ui-tars to see how it is built. Stars and thoughtful issues help the maintainers prioritize. Most of all, if you build something with it, share a demo. Open source moves fastest when we build in the open.


Joshua Berkowitz August 13, 2025