DeepCode: An Open Agentic Coding System That Turns Papers Into Code

Inside HKU’s multi-agent engine for Paper2Code, Text2Web, and Text2Backend

HKUDS


DeepCode, from the Data Intelligence Lab at The University of Hong Kong (HKUDS), is an open agentic coding system that turns research papers and natural language requirements into working code.

It aims to reduce the friction between ideas and production by coordinating multiple specialized agents to analyze documents, plan implementations, and generate runnable projects with tests and documentation.

At its core, the project addresses a familiar pain point: reproducing complex algorithms and scaffolding applications is slow, repetitive, and error-prone.


The Problem and the Solution

Researchers and engineers routinely spend days or weeks translating papers into code, wiring front ends, or shaping back ends that simply mirror requirements. DeepCode proposes a pragmatic solution: a multi-agent pipeline that ingests PDFs, URLs, or plain text and then orchestrates a sequence of steps spanning intent understanding, document parsing, architectural planning, reference mining, and code synthesis.

The result is not a single giant prompt but a set of coordinated capabilities, each agent optimized for a specific phase, with quality checks along the way. The repository documents this flow clearly and builds on a modern integration layer using the Model Context Protocol (MCP), making tools discoverable and swappable.
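The staged flow described above can be sketched as a chain of small functions that share one state object. This is a minimal illustration, not DeepCode's actual implementation: `PipelineState`, `make_stage`, `run_pipeline`, and the placeholder stage bodies are all hypothetical names, and each lambda stands in for what would really be an LLM-backed agent.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PipelineState:
    """Shared state handed from one stage to the next (hypothetical)."""
    source: str
    artifacts: dict = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


Stage = Callable[[PipelineState], PipelineState]


def make_stage(name: str, produce: Callable[[PipelineState], object]) -> Stage:
    """Wrap a producer so its output is checked and recorded under `name`."""
    def stage(state: PipelineState) -> PipelineState:
        result = produce(state)
        # A very simple quality gate: a stage must return something.
        if result is None:
            raise ValueError(f"stage {name!r} produced no output")
        state.artifacts[name] = result
        state.log.append(name)
        return state
    return stage


def run_pipeline(source: str, stages: List[Stage]) -> PipelineState:
    """Run the stages in order, threading the state through each one."""
    state = PipelineState(source=source)
    for stage in stages:
        state = stage(state)
    return state


# Placeholder stages mirroring the phases named in the article.
stages = [
    make_stage("intent", lambda s: {"goal": "implement", "input": s.source}),
    make_stage("parsed", lambda s: s.source.split(". ")),
    make_stage("plan", lambda s: [f"module_{i}" for i, _ in enumerate(s.artifacts["parsed"])]),
    make_stage("references", lambda s: []),
    make_stage("code", lambda s: {m: f"# TODO: implement {m}\n" for m in s.artifacts["plan"]}),
]

state = run_pipeline("Train a model. Evaluate it.", stages)
print(state.log)
```

Keeping each phase behind the same small interface is what lets one stage be replaced or re-run without touching the others.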

Key Features

Paper2Code: automatically extracts algorithmic logic from research papers and converts it into structured implementations, with an emphasis on reproducibility (see tools/code_implementation_server.py and prompts/code_prompts.py).

Text2Web: builds front-end code from plain descriptions, surfaced through a lightweight web app in ui/streamlit_app.py with components in ui/components.py.

Text2Backend: scaffolds back-end services and APIs from requirements, with orchestration and validation handled by the agent pipeline and MCP tools.

Smart Document Segmentation: optional segmentation for large documents to mitigate token limits and preserve semantics (see tools/document_segmentation_server.py and the toggle in mcp_agent.config.yaml).
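To make the segmentation idea concrete, here is a minimal sketch that packs whole paragraphs into segments under a token budget. It is not the logic in tools/document_segmentation_server.py, which may differ substantially. The function names are hypothetical, and the token estimate (one token per whitespace-delimited word) is a crude assumption standing in for a real tokenizer.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough token estimate: one token per whitespace-delimited word (assumption)."""
    return len(text.split())


def segment_document(text: str, max_tokens: int) -> List[str]:
    """Greedily pack whole paragraphs into segments of at most `max_tokens`.

    Paragraphs are never split, so a single paragraph larger than the budget
    ends up alone in an oversized segment.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    segments: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for para in paragraphs:
        n = estimate_tokens(para)
        # Close the current segment if adding this paragraph would exceed the budget.
        if current and current_tokens + n > max_tokens:
            segments.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += n
    if current:
        segments.append("\n\n".join(current))
    return segments


doc = "a b c\n\nd e\n\nf g h i\n\nj"
print(segment_document(doc, max_tokens=5))
```

Splitting only at paragraph boundaries is one simple way to stay under a limit while keeping related sentences together.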

Why It Stands Out

Two details make DeepCode especially practical. First, it treats tool integration as a first-class concern via MCP, so adding a search engine, filesystem access, or a domain-specific code indexer is a configuration exercise, not a fork.

Second, the project offers both a terminal-first workflow and a Streamlit web UI, lowering the barrier for different user types. The entry point in deepcode.py launches a polished Streamlit experience, while the CLI in cli/main_cli.py supports advanced terminal use and CI/CD automation.

Under the Hood

DeepCode is written in Python and packaged as deepcode-hku. Packaging is defined in setup.py, with dependencies listed in requirements.txt (notably streamlit, PyPDF2, docling, anthropic, mcp-agent, and mcp-server-git).

The web interface is a Streamlit app (Streamlit, 2025), launched by deepcode.py, which also performs basic environment checks. On the integration side, the repository embraces the Model Context Protocol for tool discovery and interoperability (MCP, 2025), configured via mcp_agent.config.yaml.

The MCP toolchain here is comprehensive: web search (Brave or an alternative server), filesystem access, URL fetch, a GitHub downloader, a document downloader and converter, a command executor, a paper-to-code implementation server, a code reference indexer, and a segmentation server for long documents.
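The benefit of discoverable, swappable tools can be illustrated with a small capability registry. This is a plain-Python analogy only: it does not use the MCP SDK or the mcp-agent API, and `ToolRegistry`, `public_search`, and `internal_search` are hypothetical names invented for the example.

```python
from typing import Callable, Dict, List


class ToolRegistry:
    """Maps a capability name to whichever tool currently provides it."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., object]] = {}

    def register(self, capability: str, tool: Callable[..., object]) -> None:
        # Registering again under the same capability swaps the provider.
        self._tools[capability] = tool

    def call(self, capability: str, *args, **kwargs) -> object:
        if capability not in self._tools:
            raise KeyError(f"no tool registered for {capability!r}")
        return self._tools[capability](*args, **kwargs)

    def capabilities(self) -> List[str]:
        """List what is available, so callers can discover tools by name."""
        return sorted(self._tools)


def public_search(query: str) -> List[str]:
    return [f"public result for {query}"]


def internal_search(query: str) -> List[str]:
    return [f"internal doc for {query}"]


registry = ToolRegistry()
registry.register("search", public_search)
before = registry.call("search", "attention")

# Swapping the provider changes no calling code, only the registration.
registry.register("search", internal_search)
after = registry.call("search", "attention")

print(registry.capabilities(), before, after)
```

Agents that ask for a capability by name, rather than importing a specific provider, keep working when the provider behind that name changes.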

This makes workflows like research reproduction or rapid prototyping feel cohesive. The prompt library in prompts/code_prompts.py provides detailed, structured guidance for multi-step coding tasks, and the indexing utilities in tools/code_indexer.py and tools/code_reference_indexer.py support CodeRAG patterns. If you are new to RAG, see the foundational paper for context (Lewis et al., 2020).
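As a rough picture of the retrieval half of a CodeRAG pattern, the sketch below indexes code snippets by their identifier tokens and ranks them by overlap with a query. It is not the approach taken in tools/code_indexer.py or tools/code_reference_indexer.py; the function names, file paths, and snippets are hypothetical, and simple token overlap stands in for whatever scoring a real indexer would use.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> Counter:
    """Lowercase and split on non-alphanumerics, so snake_case names break apart."""
    return Counter(t for t in re.split(r"[^A-Za-z0-9]+", text.lower()) if t)


def build_index(snippets: Dict[str, str]) -> Dict[str, Counter]:
    """Map each path to a bag of tokens drawn from its code."""
    return {path: tokenize(code) for path, code in snippets.items()}


def retrieve(index: Dict[str, Counter], query: str, k: int = 2) -> List[str]:
    """Return up to k paths ranked by token overlap with the query."""
    q = tokenize(query)
    scored = []
    for path, tokens in index.items():
        score = sum(min(count, tokens[term]) for term, count in q.items())
        if score > 0:
            scored.append((score, path))
    # Highest score first; ties broken alphabetically by path.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [path for _, path in scored[:k]]


# Hypothetical repository contents.
snippets = {
    "optim/adam.py": "def adam_step(params, grads, lr): ...",
    "data/loader.py": "def load_batches(path, batch_size): ...",
    "models/attention.py": "def scaled_dot_product_attention(q, k, v): ...",
}
index = build_index(snippets)
hits = retrieve(index, "scaled dot product attention", k=1)
print(hits)
```

The retrieved snippets would then be placed in the generation prompt, which is how an index built from your own repository can steer output toward existing conventions.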

Architecture-wise, the project exposes two faces: a GUI for fast iteration and demonstrations, and a CLI for scripted runs. The CLI implementation lives in cli/main_cli.py, with helper modules like cli/cli_interface.py and cli/cli_launcher.py.

The Streamlit layout code is organized in ui/layout.py, ui/components.py, and ui/styles.py. Together, they present a clear separation between orchestration, UI, and tool servers.

Use Cases

DeepCode is designed for reproducibility and rapid prototyping. Research labs can point it at a new paper to bootstrap an implementation and a minimal test suite, then refine the code offline.

Product teams can translate text requirements into a front-end and back-end skeleton to validate ideas quickly, while educators can use the web interface to demonstrate algorithm translation from PDF to working code.

Because MCP servers can be swapped, organizations can route search to their preferred provider or attach internal document stores. The code indexers enable a bring-your-own-repo approach to CodeRAG, letting you bias the agents toward house style and libraries. This flexibility is valuable in regulated settings where the provenance of code and references matters.

Community and Contribution

The repository is under active development in the HKUDS organization, which also maintains related projects like LightRAG and RAG-Anything. Community links (Discord and WeChat) are surfaced in the README.md, and GitHub Issues and Pull Requests show ongoing iteration.

While a dedicated CONTRIBUTING.md is not present at the time of writing, the codebase is cleanly structured and easy to navigate, making external contributions feasible. The presence of a PyPI package simplifies installation for testers and early adopters.

Usage and License Terms

DeepCode is released under the MIT License. In short: you can use, copy, modify, merge, publish, distribute, sublicense, and sell copies of the software, provided you include the copyright and license notice in copies or substantial portions. There is no warranty; liability is disclaimed. See LICENSE for the exact terms.

Impact and Future Potential

Agentic coding has matured from experiments into pragmatic tooling. DeepCode sits in a compelling spot: practical integrations, a real UI, and a modular agent toolchain that maps well to everyday workflows.

Expect gains in turnaround time on paper implementations, more consistent scaffolds for features, and better reuse of known-good patterns via CodeRAG. The roadmap hints at stronger test generation and validation. Combined with MCP, this should make the system more portable across environments.

About the Lab

The HKUDS Data Intelligence Lab focuses on AI systems and tooling, with a portfolio that includes retrieval, agents, and applied ML infrastructure. Explore the HKUDS organization page for people, research themes, and related repositories (HKUDS, 2025).

Conclusion

DeepCode is a focused, well-engineered attempt to turn research artifacts and natural language into real software. If you want to accelerate algorithm reproduction or quickly stand up working prototypes, this repository is worth your time. Start with the README, install the PyPI package, and try the web UI. For deeper integration, review the MCP configuration and connect the tools you already use.


# agents code-generation HKU MCP Paper2Code Python RAG research-reproduction Streamlit

Authors:

HKUDS

Joshua Berkowitz September 3, 2025
