Universal Deep Research (UDR) is a research prototype from NVIDIA Research that fundamentally rethinks how deep research agents work. Instead of hard-coding a fixed search-and-synthesis loop around one model, UDR allows users to describe their research strategy in plain English and automatically converts that strategy into executable code. The system then runs that code with live progress updates and produces a reproducible report.
Created by Peter Belcak and Pavlo Molchanov at NVIDIA, UDR treats the language model as a tool rather than the planner. The system is both model-agnostic and strategy-agnostic, meaning it can work with any LLM and execute any research workflow a user describes, from simple "search and synthesize" approaches to complex multi-iteration research plans.
Key Takeaways
- Users write research strategies in natural language; UDR compiles them into executable, auditable code with structured progress tracking.
- Model-agnostic design separates orchestration logic from LLM reasoning, allowing any compatible model to work with any strategy.
- Deterministic execution uses function calls and variable storage instead of growing context windows, enabling reproducible runs.
- Real-time transparency provides live notifications (like "search_started") that stream to the UI during execution.
- Prototype interface includes strategy libraries, in-place editing, stop controls, and Markdown report viewing.
Overview
UDR transforms deep research from a black box into a transparent, programmable process. Users provide two inputs: a research question and a research strategy written in natural language. The system compiles the strategy into a single generator function that produces structured notifications and makes tool calls through a constrained API. The design makes two crucial choices that distinguish UDR from typical agentic systems.
First, it compiles the entire strategy upfront rather than planning step-by-step, which prevents common problems like skipped steps or strategy drift.
Second, it requires explicit mapping from each strategy step to commented code blocks, ensuring every part of the plan becomes auditable code.
During execution, UDR stores all intermediate results as named variables rather than accumulating them in the LLM's context window. The language model only handles focused tasks like ranking search results, summarizing content, or extracting information within specific steps. This CPU-first orchestration approach reduces costs, improves speed, and makes the entire process auditable.
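To make the shape of this concrete, here is a minimal sketch of what a compiled strategy could look like. The function signature, tool names (`search`, `llm`), and notification fields are illustrative assumptions, not UDR's actual API.

```python
# Illustrative sketch of a compiled strategy (hypothetical names, not UDR's API).
# Each strategy step maps to a commented block; state lives in named variables,
# and progress is reported by yielding structured notifications.

def compiled_strategy(prompt, tools):
    # Step 1: search for sources relevant to the research question.
    yield {"type": "search_started", "query": prompt}
    results = tools.search(prompt, top_k=10)  # ordinary function call, not a model-mediated action
    yield {"type": "search_completed", "num_results": len(results)}

    # Step 2: summarize each result with a focused, single-purpose LLM call.
    summaries = []                 # intermediate state lives in named variables,
    for result in results:         # not in the model's context window
        summaries.append(tools.llm("Summarize for the report:\n" + result.text))
        yield {"type": "source_summarized", "url": result.url}

    # Step 3: synthesize the final report from the stored summaries.
    report = tools.llm("Write a cited report from these notes:\n\n" + "\n\n".join(summaries))
    yield {"type": "report_ready", "report": report}
```

Because the generator yields at each step, the host process controls pacing, can stream every notification to the UI, and can stop execution cleanly between steps.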
The authors demonstrate three example strategies: Minimal (search, gather, synthesize), Expansive (branch across topics with multiple search phrases), and Intensive (iteratively refine searches based on previous findings). Each strategy defines its own logic for when to search, what to store, when to summarize, and how to build the final report.
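For intuition, a Minimal-style strategy, paraphrased for illustration rather than quoted from the paper, might read:

```text
1. Notify the user that research has begun.
2. Search for the research question and collect the top ten results.
3. Summarize each result and store the summaries in a list.
4. Write a final report from the stored summaries, citing each source.
```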
Why This Matters
Current deep research tools typically fix the research methodology and only let users change their questions. This works well for general use but breaks down when domains require specific source preferences, validation rules, budget constraints, or compliance requirements. UDR flips this by making the research methodology itself programmable while keeping it expressed in natural language rather than code.
This programmability enables three major benefits. Teams can create standardized, auditable research playbooks that encode their preferred methods and risk tolerance. Enterprises can build compliance and domain logic directly into their research strategies while still choosing the best available model for each task.
The architecture also enables fair competition between models and research methods by decoupling them: you can mix the strongest models with the most effective research strategies.
Discussion
Methodology and Validation
The paper presents UDR as a systems report rather than a traditional benchmark study. Instead of quantitative performance metrics, the authors provide qualitative evidence across three key areas.
- First, they demonstrate that compiling entire strategies upfront improves reliability compared to step-by-step planning approaches that can drift or skip steps.
- Second, they show efficiency gains from separating orchestration logic (handled by CPU) from LLM reasoning (targeted function calls). This architectural choice reduces both cost and latency while maintaining auditability.
- Third, structured notifications provide real-time transparency that traditional black-box research tools lack.
The demonstration interface validates these claims by showing complete end-to-end workflows with live progress tracking, user stop controls, and comprehensive final report generation.
Figures and Interface Design
Figure 1 | A high-level diagram visualizing the components of a typical deep research tool. Unlike plain conversational LLMs, DRTs tend to continuously update the user on their progress before producing their report. Credit: Peter Belcak and Pavlo Molchanov
The paper's figures illustrate UDR's core innovations effectively. Figure 1 establishes the baseline that UDR aims to improve: typical deep research tools that accept prompts, iterate through search and browsing cycles, then produce reports with limited visibility into the process.
Figure 2 | A high-level diagram visualizing the components of UDR. Unlike a specialized DRT, UDR receives both a research prompt and a research strategy as input. Credit: Peter Belcak and Pavlo Molchanov
Figure 2 reveals UDR's key architectural departure. The system takes two distinct inputs, a research question and a research strategy, and compiles the strategy into constrained code with well-defined tool calls. This code executes in a secure sandbox while emitting structured notifications, creating a transparent and auditable research process.
Figures 3 and 4 showcase the practical interface that makes this transparency actionable. Users see a strategy editor for customizing research approaches, a live event feed displaying notifications like "prompt_received" and "search_started," stop controls with early report generation options, and a clean report viewer for final results.
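The event feed is easiest to picture as a stream of small structured records. The schema below is a guess for illustration; the article only confirms event names like "prompt_received" and "search_started".

```python
# Hypothetical notification records as the UI might receive them
# (field names beyond "type" are illustrative assumptions).
events = [
    {"type": "prompt_received"},
    {"type": "search_started", "query": "universal deep research"},
    {"type": "search_completed", "num_results": 10},
    {"type": "report_ready"},
]

for event in events:
    details = ", ".join(f"{k}={v}" for k, v in event.items() if k != "type")
    print(f"[{event['type']}] {details}")
```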
Technical Architecture Benefits
UDR's design choices solve several persistent problems in agentic systems. The variable-based state management approach keeps all intermediate results reliably accessible by name, eliminating the context window bloat that causes many research agents to lose track of earlier findings as sessions progress.
Tool calls function as ordinary programmatic operations rather than model-mediated actions, making the entire system deterministic and significantly easier to debug when issues arise. This architectural decision transforms research agents from unpredictable black boxes into transparent, auditable workflows.
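A minimal sketch of the pattern, with all names assumed for illustration: tools are plain functions, state is a named-variable store, and every call is logged so a run can be audited or replayed.

```python
# Sketch of deterministic orchestration (illustrative names, not UDR's code).
class ResearchState:
    def __init__(self):
        self.variables = {}   # intermediate results, accessible by name
        self.audit_log = []   # every tool call recorded in order

    def call(self, tool, *args, **kwargs):
        self.audit_log.append((tool.__name__, args, kwargs))
        return tool(*args, **kwargs)

def search(query, top_k=10):
    # A real implementation would hit a search API; stubbed for the sketch.
    return [f"result {i} for {query!r}" for i in range(top_k)]

state = ResearchState()
state.variables["sources"] = state.call(search, "UDR architecture", top_k=3)
print(state.audit_log)  # [('search', ('UDR architecture',), {'top_k': 3})]
```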
The authors emphasize that proper sandboxing becomes essential for production deployments. Since UDR generates and executes code dynamically, that code must run in complete isolation from the host system to prevent security vulnerabilities or unintended side effects.
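As a rough illustration of the principle only: generated code should never be exec()-ed in the host process. The minimal sketch below runs it in a separate, restricted interpreter; production deployments would rely on containers, VMs, or similar isolation.

```python
# Minimal illustration of executing generated code out-of-process.
# Production sandboxing needs far stronger isolation than this sketch shows.
import subprocess
import sys
import tempfile

generated_code = 'print("hello from the sandboxed strategy")'

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(generated_code)
    script_path = f.name

# -I runs Python in isolated mode (ignores environment variables and user
# site-packages); the timeout bounds runaway strategies.
completed = subprocess.run(
    [sys.executable, "-I", script_path],
    capture_output=True, text=True, timeout=30,
)
print(completed.stdout)
```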
Current Constraints and Trade-offs
UDR's effectiveness directly correlates with the quality and clarity of user-provided strategies. Ambiguous or poorly structured strategies tend to compile into brittle code that may fail or produce unexpected results. Similarly, vague research prompts can still cause errors during the LLM reasoning steps, even within UDR's constrained execution model.
The current prototype offers limited mid-execution interactivity. While users can stop execution and generate partial reports, they cannot dynamically adjust strategies or provide real-time guidance once a research workflow begins running. This represents a trade-off between deterministic reproducibility and adaptive flexibility.
Security depends entirely on robust sandboxing infrastructure. The system's ability to generate and execute arbitrary code makes isolation mandatory rather than optional, requiring careful deployment planning and ongoing security monitoring.
Positioning in the Research Tool Ecosystem
The authors strategically position UDR relative to existing research tools across two major categories. Consumer-focused tools like Perplexity and OpenAI's deep research typically employ predefined iterative browsing patterns that work well for general queries but lack customization for specialized domains or compliance requirements.
Enterprise solutions like NVIDIA's AI-Q Research Assistant and SambaNova's Deep Research favor more structured planning approaches over open-ended exploration, but they still operate within fixed architectural constraints that limit user control over research methodology.
UDR's fundamental innovation lies in making the research planning policy itself a first-class, user-authored artifact that compiles to executable code. This meta-level programmability distinguishes it from both consumer and enterprise alternatives by treating research methodology as malleable rather than fixed.
Similar Architectures and How They Differ
The landscape of research agents reveals several distinct architectural philosophies. STORM from Stanford uses a sophisticated two-phase approach: pre-writing research that simulates conversations between topic experts and moderators, followed by Wikipedia-style article generation. While powerful, STORM follows a fixed multi-perspective methodology rather than allowing custom strategies. The STORM paper (Shao et al., 2024) demonstrates this approach in detail, with the implementation available on GitHub. Co-STORM extends this with human-AI collaboration through turn management policies and shared mind maps.
GPT Researcher employs a classic planner-executor-publisher architecture. The system generates task-specific research questions, deploys crawler agents for parallel information gathering, then aggregates findings into comprehensive reports. However, it operates within a predefined plan-execute-publish pattern rather than supporting user-authored research workflows. The open-source project has gained significant traction with over 23,000 GitHub stars, demonstrating its effectiveness for general research automation.
Tavily takes a different approach entirely, positioning itself as infrastructure rather than an agent. It provides a web search API optimized specifically for LLM consumption, with reduced hallucinations and higher relevance than general search engines. While valuable, it focuses on the retrieval layer rather than research methodology. Tavily integrates well with LangChain and other AI frameworks, serving as a search backend for many research applications.
UDR's distinguishing characteristic is meta-level programmability. Unlike systems that provide powerful but fixed architectures, UDR treats the research methodology itself as a programmable artifact. Users don't simply configure parameters or choose between predefined strategies; they author entirely new research workflows in natural language that compile to deterministic code. This makes UDR less like a traditional research agent and more like a compiler for research workflows.
Conclusion
UDR demonstrates a practical approach to programmable research agents: restrict the model to focused reasoning tasks, express research plans in natural language, compile strategies to code once, and execute deterministically with real-time transparency.
For teams, this enables shareable research playbooks that are both reproducible and inspectable. For vendors, it decouples model capabilities from agent architecture.
The authors suggest several promising directions for future work: building comprehensive strategy libraries for common research domains, exploring direct user control over model reasoning processes, and investigating how to convert arbitrary prompts into deterministically controlled agents.
The approach appears particularly relevant for domains like finance, legal research, healthcare, and public administration where auditability, source governance, and cost control are paramount.
If you're evaluating agentic systems for production use, the UDR paper and codebase offer valuable insights into structuring tool calls, managing state, and implementing transparent progress tracking. The design principles translate well beyond research to any domain requiring auditable, programmable agent workflows.
Definitions
Deep Research Tool (DRT): An agent that converts user prompts into research plans, browses and retrieves information across multiple sources, and synthesizes structured reports with citations.
Strategy Compilation: UDR's core process of converting natural-language research plans into executable generator functions that yield structured notifications and call tools through a constrained API.
Expansive vs. Intensive Strategies: Research approaches that either branch broadly across topics and search phrases (expansive) or iterate deeply through multiple rounds, refining searches based on accumulated context (intensive).
Deterministic Orchestration: An architectural pattern where tool calls are implemented as ordinary functions and state is maintained in variables rather than LLM context, ensuring repeatable and auditable behavior.
Additional References:
- Belcak & Molchanov (NVIDIA Research, 2025), "Universal Deep Research: Bring Your Own Model and Strategy."
- Code repository: NVlabs, "Universal Deep Research: A User-Programmable Deep Research Agent."