ByteDance: UI-TARS 7B

Name: ByteDance: UI-TARS 7B
Price: 0.10 USD per 1M input tokens
Author: ByteDance

UI-TARS-1.5 is a multimodal vision-language agent from ByteDance, optimized for GUI-based environments including desktop interfaces, web browsers, mobile systems, and games. It extends the UI-TARS framework with reasoning trained through reinforcement learning, which supports action planning and execution across virtual interfaces. The model is reported to achieve state-of-the-art results on interactive and grounding benchmarks including OSWorld, WebVoyager, AndroidWorld, and ScreenSpot, to reach perfect task completion across a diverse set of Poki games, and to outperform prior models on Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and scales well across variants, with the 1.5 release exceeding the performance of the earlier 72B and 7B checkpoints.

Pricing

Input: $0.10 / 1M tokens
Output: $0.20 / 1M tokens
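As a rough illustration of how the listed rates translate into a per-request cost, the sketch below multiplies token counts by the input and output prices. The function name `estimate_cost_usd` is hypothetical, and the listing does not say how image inputs are converted to tokens, so the input count here is assumed to already include them.

```python
from decimal import Decimal

# Rates taken from the listing, in USD per 1M tokens.
INPUT_USD_PER_M = Decimal("0.10")
OUTPUT_USD_PER_M = Decimal("0.20")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request in USD (hypothetical helper).

    Assumes input_tokens already counts any image tokens; the listing
    does not state how images are tokenized.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # 50,000 input tokens and 1,000 output tokens
    print(estimate_cost_usd(50_000, 1_000))
```

Decimal arithmetic is used instead of floats so that very small per-request amounts do not pick up rounding noise when summed over many requests.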

Specifications

Context Window: 128K tokens
Max Output: 2K tokens
Modality: multimodal
Input Types: image, text
Output Types: text
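The limits above can be checked before sending a request. The sketch below is a minimal pre-flight check under several assumptions that the listing does not confirm: "128K" and "2K" are read as 128,000 and 2,000 tokens (a provider may mean 131,072 and 2,048), and the context window is assumed to cover input and output together. The name `fits_limits` is hypothetical.

```python
# Assumption: "128K" and "2K" are read as decimal thousands.
CONTEXT_WINDOW_TOKENS = 128_000
MAX_OUTPUT_TOKENS = 2_000


def fits_limits(
    input_tokens: int,
    requested_output_tokens: int,
    context_window: int = CONTEXT_WINDOW_TOKENS,
    max_output: int = MAX_OUTPUT_TOKENS,
) -> bool:
    """Return True if a request stays within the listed limits.

    Assumes the context window is shared by input and output tokens;
    the listing does not state this explicitly.
    """
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output_tokens > max_output:
        return False
    return input_tokens + requested_output_tokens <= context_window


if __name__ == "__main__":
    print(fits_limits(100_000, 2_000))
    print(fits_limits(127_000, 2_000))
```

Both limits are parameters, so the check can be rerun with different values if the provider defines them differently.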

Other ByteDance Models

ByteDance Seed: Seed 1.6 (by Bytedance-Seed), $0.25/1M
ByteDance Seed: Seed 1.6 Flash (by Bytedance-Seed), $0.07/1M
