Introducing the Test-Time Diffusion Deep Researcher (TTD-DR) framework: an AI assistant that doesn't just gather information but actively thinks, revises, and refines its work, much like a skilled human researcher. TTD-DR reimagines AI-powered research, aiming to make it more sophisticated, accurate, and efficient.
TTD-DR addresses the limitations of existing deep research agents by drawing inspiration from the iterative nature of human research and writing. Human writers typically establish a high-level plan, draft a report, and then engage in multiple revision cycles, often seeking external information to refine their arguments during revisions.
TTD-DR conceptualizes research report generation as a diffusion process, akin to how diffusion models iteratively refine a noisy input into a high-quality output. It starts with a preliminary, "noisy" draft (an updatable skeleton) and iteratively refines it through a "denoising" process to produce an accurate, human-like report.
What Makes TTD-DR Different?
- Human-Inspired Iteration: TTD-DR treats research as a cycle, mirroring how people search, reason, and improve their drafts step by step.
- Draft-Centric Workflows: The process starts with a rough draft that evolves over several refinements, ensuring ideas are integrated smoothly and in context.
- Denoising via Retrieval: At each stage, TTD-DR brings in relevant external information to clarify and strengthen the draft.
- Self-Evolution Algorithms: Every part of the workflow (planning, query generation, and synthesis) continuously self-improves for higher-quality output.
- Proven Performance: TTD-DR outperforms current AI research agents, especially in tasks requiring deep, multi-step reasoning.
- Flexible Applications: Its design adapts seamlessly to fields like biomedicine, finance, and technology.
- Efficient Scaling: The framework delivers big performance gains with modest increases in processing time, making it practical for real-world use.
Inside the TTD-DR Framework
Traditional research agents rely on linear or parallel workflows, often struggling with complex, long-form content. TTD-DR changes the game by emulating human research habits: planning, drafting, seeking feedback, and iterating.
Figure 2 | Illustration of Test-Time Diffusion Deep Researcher (TTD-DR) framework, designed to mimic the iterative nature of human research through a draft. A user query initiates both a preliminary draft and a research plan. This evolving draft, along with the research plan, dynamically informs the generation of search questions and subsequent information retrieval to be timely and coherent, while reducing information loss. The retrieved information is then leveraged to denoise and refine the initial draft in a continuous feedback loop. The entire workflow is further optimized by a self-evolutionary algorithm to enhance the quality of the research plan, generated questions, answers, and the final report, demonstrating the synergistic power of diffusion and self-evolution in achieving superior research outcomes. Credit: Paper
Here’s how the process unfolds:
- Stage 1 – Planning: The agent lays out a structured research plan, setting the stage for targeted information gathering.
- Stage 2 – Iterative Search and Synthesis:
- 2a: Question Generation: The system creates search queries based on the evolving draft and initial plan.
- 2b: Answer Retrieval and Revision: Using Retrieval-Augmented Generation (RAG), TTD-DR pulls in external sources, revises its draft, and repeats, improving with every cycle.
- Stage 3 – Final Synthesis: All findings are woven into a coherent, comprehensive report.
This iterative, feedback-driven method lets TTD-DR maintain context, reduce information loss, and deliver increasingly refined results.
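The three stages above can be sketched as a small control loop. This is a minimal illustration, not the paper's implementation: `ResearchState`, `run_research`, and every callable passed in are hypothetical names, and the toy stand-ins replace what would be LLM and search calls in a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ResearchState:
    """Everything the loop carries between steps (hypothetical structure)."""
    query: str
    plan: str
    draft: str
    qa_history: List[Tuple[str, str]] = field(default_factory=list)


def run_research(
    query: str,
    make_plan: Callable[[str], str],
    make_draft: Callable[[str], str],
    next_question: Callable[[ResearchState], Optional[str]],
    answer_with_rag: Callable[[str], str],
    revise_draft: Callable[[ResearchState, str, str], str],
    write_report: Callable[[ResearchState], str],
    max_steps: int = 5,
) -> str:
    # Stage 1: build the research plan, plus the preliminary "noisy" draft.
    state = ResearchState(query, make_plan(query), make_draft(query))
    for _ in range(max_steps):
        # Stage 2a: the next question depends on the plan AND the evolving draft.
        question = next_question(state)
        if question is None:  # nothing left to ask, stop early
            break
        # Stage 2b: retrieve an answer, then use it to "denoise" the draft.
        answer = answer_with_rag(question)
        state.qa_history.append((question, answer))
        state.draft = revise_draft(state, question, answer)
    # Stage 3: synthesize the final report from the accumulated state.
    return write_report(state)


# Toy stand-ins so the sketch runs without any model or search backend.
TOY_CORPUS = {
    "What is the draft?": "The draft is an updatable skeleton.",
    "Why revise it?": "Revision folds in retrieved evidence.",
}


def toy_next_question(state: ResearchState) -> Optional[str]:
    asked = {q for q, _ in state.qa_history}
    for q in TOY_CORPUS:
        if q not in asked:
            return q
    return None


def toy_revise(state: ResearchState, question: str, answer: str) -> str:
    return state.draft + " " + answer


report = run_research(
    query="How does TTD-DR work?",
    make_plan=lambda q: "1. define the draft 2. explain revision",
    make_draft=lambda q: "Preliminary draft.",
    next_question=toy_next_question,
    answer_with_rag=lambda q: TOY_CORPUS[q],
    revise_draft=toy_revise,
    write_report=lambda s: s.draft,
)
print(report)
```

The point of the design is that the draft lives in the loop state, so each new question is generated with the whole report in view rather than one section at a time.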
The Power of Denoising and Self-Evolution
- Report-Level Denoising: The draft is constantly clarified and enriched with new, relevant information.
- Component-Wise Self-Evolution: Each workflow element generates multiple output candidates, selects the best, and merges them, mimicking a peer review process.
These mechanisms fuel higher-quality, more diverse research outputs, with every component improving through iteration.
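The generate, select, and merge pattern described above might look like the following in outline. This is a sketch under stated assumptions: `self_evolve` and the toy helpers are hypothetical names, and the word-count scorer stands in for whatever judge a real system would use (most likely an LLM).

```python
from typing import Callable, List


def self_evolve(
    generate: Callable[[int], str],
    score: Callable[[str], float],
    merge: Callable[[List[str]], str],
    n_candidates: int = 3,
    top_k: int = 2,
) -> str:
    # Produce several candidate outputs for one workflow component.
    candidates = [generate(i) for i in range(n_candidates)]
    # Rank candidates by the (assumed) quality score, best first.
    ranked = sorted(candidates, key=score, reverse=True)
    # Keep the best few and merge them into a single output.
    return merge(ranked[:top_k])


# Toy stand-ins: candidates are short strings, quality is vocabulary size.
def toy_generate(i: int) -> str:
    pool = ["plan search", "plan search draft", "plan"]
    return pool[i % len(pool)]


def toy_score(text: str) -> float:
    return float(len(set(text.split())))


def toy_merge(texts: List[str]) -> str:
    merged: List[str] = []
    for text in texts:
        for word in text.split():
            if word not in merged:  # keep first occurrence, preserve order
                merged.append(word)
    return " ".join(merged)


best = self_evolve(toy_generate, toy_score, toy_merge)
print(best)
```

In this toy run the two highest-scoring candidates are kept and merged; the same wrapper could be applied to the plan, the search questions, the answers, or the final report.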
Figure 3 | A comparison of the TTD method with other open-source deep researchers. (a) Huggingface Open DR (Roucher et al., 2025) utilizes a lightweight planner to determine subsequent actions, such as calling search or browse tools, and repeats these actions until an answer is found. (b) GPT Researcher (Researcher, 2025) also employs a lightweight planner to generate and execute multiple search queries in parallel before a generator synthesizes the retrieved documents into a report. (c) Open Deep Research (Research, 2025) uses a planner to outline the final report’s structure and then conducts iterative research for each section individually before combining them. (d) TTD-DR introduces a draft denoising mechanism. Unlike Open Deep Research, TTD-DR avoids separate searches for each section to maintain global context and uses a RAG-based answer generator to process retrieved documents before saving them for the final report generation. Credit: Paper
Why TTD-DR is So Powerful
By addressing the limitations of earlier agents, especially their lack of human-like drafting and revision, TTD-DR is set to transform how AI assists with research. Its iterative, context-aware workflow is critical for handling complex, multi-hop reasoning and long-form synthesis. This makes TTD-DR indispensable for professionals in data-rich, demanding fields like medicine, technology, and finance.
Moreover, TTD-DR’s self-improving processes ensure it not only solves problems, but also gets better at doing so over time. The result? A trustworthy, ever-advancing partner for challenging research tasks.
Table 1 | TTD-DR’s performance against different baseline systems on the LongForm Research, DeepConsult, HLE, and GAIA datasets. Win rates (%) are computed against OpenAI Deep Research. Correctness is computed as matching between system-predicted and reference answers. For Grok DeeperSearch on HLE-full, there is no public number provided, and we are not able to scrape the full 2K queries due to research budget and Grok DeeperSearch’s daily scrape limits. Credit: Paper
Benchmark Results: Outperforming the Field
Extensive testing shows TTD-DR excelling on benchmarks such as LongForm Research, DeepConsult, Humanity’s Last Exam (HLE), and GAIA. It consistently outperforms leading agents, including those from OpenAI and Perplexity, with win rates as high as 74.5% on select tasks.
Crucially, ablation studies confirm that self-evolution and denoising with retrieval are at the heart of these gains. TTD-DR generates more novel, complex queries and integrates information more efficiently than traditional approaches, making it both powerful and practical for deployment.
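To make the two reported metrics concrete, here is a minimal sketch of how a win rate against a reference system and a correctness score could be computed. The function names are hypothetical, and two choices are assumptions rather than the paper's protocol: ties count as non-wins, and correctness uses case-insensitive exact match in place of whatever matching procedure the authors applied.

```python
from typing import List


def win_rate(judgments: List[str]) -> float:
    """Percent of pairwise comparisons won against the reference system.

    Each judgment is "win", "loss", or "tie". Ties count as non-wins here,
    which is an assumption of this sketch.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    wins = sum(1 for j in judgments if j == "win")
    return 100.0 * wins / len(judgments)


def correctness(predictions: List[str], references: List[str]) -> float:
    """Percent of predictions matching their reference answer.

    Uses case-insensitive exact match after trimming whitespace (assumed).
    """
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")
    matches = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return 100.0 * matches / len(references)


print(win_rate(["win", "win", "loss", "tie"]))
print(correctness(["Paris", "4"], ["paris", "5"]))
```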
Setting a New Standard for AI Research
Test-Time Diffusion Deep Researcher marks a turning point for AI-powered research. By blending human-like iteration with advanced draft refinement and self-improving algorithms, it consistently achieves top-tier results on complex challenges. TTD-DR’s adaptable, robust architecture is already raising the bar for research automation across industries, and future enhancements promise even broader impact. For professionals seeking a reliable, high-performance research collaborator, TTD-DR is the new gold standard.
Test-Time Diffusion Deep Researcher: Ushering in a Human-Like AI Research Paradigm
Deep Researcher with Test-Time Diffusion