DeepCode, from the Data Intelligence Lab at The University of Hong Kong (HKUDS), is an open-source agentic coding system that turns research papers and natural-language requirements into working code.
It aims to reduce the friction between ideas and production by coordinating multiple specialized agents to analyze documents, plan implementations, and generate runnable projects with tests and documentation.
At its core, the project addresses a familiar pain point: reproducing complex algorithms and scaffolding applications is slow, repetitive, and error-prone.
The Problem and the Solution
Researchers and engineers routinely spend days or weeks translating papers into code, wiring up front ends, or building back ends that largely restate written requirements. DeepCode proposes a pragmatic alternative: a multi-agent pipeline that ingests PDFs, URLs, or plain text and then orchestrates a sequence of steps spanning intent understanding, document parsing, architectural planning, reference mining, and code synthesis.
The result is not a single giant prompt but a set of coordinated capabilities, each agent optimized for a specific phase, with quality checks along the way. The repository documents this flow clearly and builds on a modern integration layer using the Model Context Protocol (MCP), making tools discoverable and swappable.
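The staged flow described above can be sketched as a chain of stage functions with a quality gate between them. This is a minimal illustration, not DeepCode's orchestration code: the stage names mirror the steps listed in the text, while `PipelineState`, `run_pipeline`, and `make_stub` are hypothetical, and each stub agent only records a placeholder note where a real agent would call a model or an MCP tool.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineState:
    """Hypothetical state handed from one agent to the next."""
    source: str  # PDF path, URL, or plain-text requirement
    notes: dict[str, str] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)


# An "agent" here is any function that takes the state and returns it enriched.
Agent = Callable[[PipelineState], PipelineState]


def run_pipeline(state: PipelineState, stages: list[tuple[str, Agent]]) -> PipelineState:
    """Run stages in order, stopping as soon as a stage leaves no output behind."""
    for name, agent in stages:
        state = agent(state)
        # Quality gate: each stage must record a non-empty note under its own name.
        if not state.notes.get(name):
            raise RuntimeError(f"stage {name!r} produced no output")
        state.log.append(name)
    return state


def make_stub(name: str) -> Agent:
    """Build a placeholder agent; a real one would call a model or an MCP tool."""
    def agent(state: PipelineState) -> PipelineState:
        state.notes[name] = f"{name} output for {state.source}"
        return state
    return agent


# Stage names mirror the steps described in the text (hypothetical labels).
STAGES = [
    (name, make_stub(name))
    for name in ("intent", "parsing", "planning", "references", "synthesis")
]

if __name__ == "__main__":
    result = run_pipeline(PipelineState("paper.pdf"), STAGES)
    print(result.log)
```

Keeping the gate in the runner rather than in each agent means a failed stage stops the chain before later stages build on empty output, which is one simple way to realize "quality checks along the way".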
Key Features
- Paper2Code: automatically extracts algorithmic logic from research papers and converts it into structured implementations, with an emphasis on reproducibility (see tools/code_implementation_server.py and prompts/code_prompts.py).
- Text2Web: builds front-end code from plain-text descriptions, surfaced through a lightweight web app in ui/streamlit_app.py with components in ui/components.py.
- Text2Backend: scaffolds back-end services and APIs from requirements, with orchestration and validation handled by the agent pipeline and MCP tools.
- Smart Document Segmentation: optional segmentation for large documents to mitigate token limits and preserve semantics (see tools/document_segmentation_server.py and the toggle in mcp_agent.config.yaml).
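To make the segmentation idea concrete, here is a sketch of one common strategy: greedily packing whole paragraphs into chunks under a size budget, so semantic units are not cut mid-thought. It is a stand-in for tools/document_segmentation_server.py, not its actual logic. `segment_document` is a hypothetical name, and whitespace-separated words approximate tokens; a real implementation would count with the model's tokenizer.

```python
def segment_document(text: str, max_tokens: int = 500) -> list[str]:
    """Greedily pack whole paragraphs into segments of at most max_tokens words.

    Assumption: words stand in for tokens. Paragraphs are separated by blank lines.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    segments: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in paragraphs:
        words = para.split()
        if len(words) > max_tokens:
            # Last resort: flush what we have, then split the oversized paragraph on word boundaries.
            if current:
                segments.append("\n\n".join(current))
                current, current_len = [], 0
            for i in range(0, len(words), max_tokens):
                segments.append(" ".join(words[i:i + max_tokens]))
            continue
        if current and current_len + len(words) > max_tokens:
            # Adding this paragraph would exceed the budget, so close the current segment.
            segments.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += len(words)
    if current:
        segments.append("\n\n".join(current))
    return segments
```

Preferring paragraph boundaries over fixed-size windows trades slightly uneven segment sizes for chunks that each carry a complete idea, which matters when an agent must reason about an algorithm described across several sentences.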
Why It Stands Out
Two details make DeepCode especially practical. First, it treats tool integration as a first-class concern via MCP, so adding a search engine, filesystem access, or a domain-specific code indexer is a configuration exercise, not a fork.
Second, the project offers both a terminal-first workflow and a Streamlit web UI, lowering the barrier for different user types. The entry point in deepcode.py launches a polished Streamlit experience, while the CLI in cli/main_cli.py supports advanced terminal use and CI/CD automation.
Under the Hood
DeepCode is written in Python and distributed on PyPI as deepcode-hku. Packaging is defined in setup.py, and the dependencies in requirements.txt notably include streamlit, PyPDF2, docling, anthropic, mcp-agent, and mcp-server-git.
The web interface is a Streamlit app (Streamlit, 2025), launched by deepcode.py, which also performs basic environment checks. On the integration side, the repository embraces the Model Context Protocol for tool discovery and interoperability (MCP, 2025), configured via mcp_agent.config.yaml.
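The specific environment checks in deepcode.py are not spelled out here, but a check of this kind typically verifies the interpreter version and that key dependencies are importable before the UI starts. The sketch below is hypothetical: `check_environment`, the minimum Python version, and the module list (assumed import names for some of the dependencies named above) are assumptions, not DeepCode's actual checks.

```python
import importlib.util
import sys

# Assumed import names for some of the dependencies listed in requirements.txt.
REQUIRED_MODULES = ("streamlit", "PyPDF2", "docling", "anthropic", "mcp_agent")
# Assumed minimum interpreter version; not taken from DeepCode's packaging.
MIN_PYTHON = (3, 10)


def check_environment(
    min_python: tuple[int, int] = MIN_PYTHON,
    modules: tuple[str, ...] = REQUIRED_MODULES,
) -> list[str]:
    """Return human-readable problems; an empty list means ready to launch."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    for module in modules:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(module) is None:
            problems.append(f"missing package: {module}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)
```

Using `find_spec` instead of importing each package keeps the check fast and avoids side effects from heavy imports before the app is actually launched.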
The MCP toolchain here is comprehensive: web search (Brave or an alternative server), filesystem access, URL fetch, a GitHub downloader, a document downloader and converter, a command executor, a paper-to-code implementation server, a code reference indexer, and a segmentation server for long documents.
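Because tools are declared in configuration rather than code, swapping one server for another amounts to editing an entry. The sketch below models a parsed config as a Python dict, loosely following the `mcp: servers:` layout used by mcp-agent configuration files. The server names, commands, and the `with_server` and `launch_commands` helpers are illustrative assumptions, not DeepCode's actual mcp_agent.config.yaml.

```python
import copy
import shlex

# Illustrative parsed config; names and commands are assumptions, not DeepCode's file.
BASE_CONFIG = {
    "mcp": {
        "servers": {
            "search": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-brave-search"]},
            "filesystem": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]},
            "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]},
        }
    }
}


def with_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of config with one server added or replaced; the input is untouched."""
    updated = copy.deepcopy(config)
    updated["mcp"]["servers"][name] = {"command": command, "args": list(args)}
    return updated


def launch_commands(config: dict) -> dict[str, str]:
    """Render each server entry as the shell command a client would spawn."""
    return {
        name: shlex.join([spec["command"], *spec.get("args", [])])
        for name, spec in config["mcp"]["servers"].items()
    }


if __name__ == "__main__":
    # Route search to a hypothetical internal server without touching any agent code.
    internal = with_server(BASE_CONFIG, "search", "python", ["-m", "internal_search_server"])
    for name, cmd in launch_commands(internal).items():
        print(f"{name}: {cmd}")
```

The agents only ever refer to a server by name, so replacing the `search` entry changes which process backs that name while the rest of the workflow stays the same.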
This makes workflows like research reproduction or rapid prototyping feel cohesive. The prompt library in prompts/code_prompts.py provides detailed, structured guidance for multi-step coding tasks, and the indexing utilities in tools/code_indexer.py and tools/code_reference_indexer.py support CodeRAG patterns. If you are new to RAG, see the foundational paper for context (Lewis et al., 2020).
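CodeRAG pairs generation with retrieval over a code corpus. The sketch below shows only the retrieval half, using TF-IDF-weighted identifier tokens and cosine similarity as a simple stand-in. DeepCode's tools/code_indexer.py and tools/code_reference_indexer.py may index and rank code quite differently, and `CodeIndex` and `tokenize` are hypothetical names.

```python
import math
import re
from collections import Counter

IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(code: str) -> list[str]:
    """Split identifiers, including snake_case and camelCase parts, into lowercase tokens."""
    tokens = []
    for ident in IDENT_RE.findall(code):
        for part in re.split(r"_|(?<=[a-z])(?=[A-Z])", ident):
            if part:
                tokens.append(part.lower())
    return tokens


class CodeIndex:
    """Hypothetical in-memory index over named code snippets."""

    def __init__(self, snippets: dict[str, str]):
        self.vectors = {name: Counter(tokenize(src)) for name, src in snippets.items()}
        # Document frequency per token, then smoothed IDF (always positive).
        df = Counter()
        for vec in self.vectors.values():
            df.update(vec.keys())
        n = len(self.vectors)
        self.idf = {tok: math.log((1 + n) / (1 + count)) + 1 for tok, count in df.items()}

    def _weight(self, counts: Counter) -> dict[str, float]:
        # Tokens never seen in the corpus get zero weight.
        return {tok: c * self.idf.get(tok, 0.0) for tok, c in counts.items()}

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        """Return up to k (snippet name, cosine similarity) pairs with a positive score."""
        q = self._weight(Counter(tokenize(query)))
        q_norm = math.sqrt(sum(w * w for w in q.values()))
        results = []
        for name, counts in self.vectors.items():
            d = self._weight(counts)
            d_norm = math.sqrt(sum(w * w for w in d.values()))
            if q_norm == 0 or d_norm == 0:
                continue
            dot = sum(w * d.get(tok, 0.0) for tok, w in q.items())
            if dot > 0:
                results.append((name, dot / (q_norm * d_norm)))
        results.sort(key=lambda item: item[1], reverse=True)
        return results[:k]
```

In a full CodeRAG loop, the top-ranked snippets would be placed in the generation prompt as references, which is how an indexed in-house repository can bias generated code toward familiar libraries and conventions.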
Architecturally, the project exposes two faces: a GUI for fast iteration and demonstrations, and a CLI for scripted runs. The CLI implementation lives in cli/main_cli.py, with helper modules such as cli/cli_interface.py and cli/cli_launcher.py.
The Streamlit layout code is organized in ui/layout.py, ui/components.py, and ui/styles.py. Overall, the repository keeps orchestration, UI, and tool servers clearly separated.
Use Cases
DeepCode is designed for reproducibility and rapid prototyping. Research labs can point it at a new paper to bootstrap an implementation and a minimal test suite, then refine the code offline.
Product teams can translate text requirements into a front-end and back-end skeleton to validate ideas quickly, while educators can use the web interface to demonstrate how an algorithm goes from a PDF to working code.
Because MCP servers can be swapped, organizations can route search to their preferred provider or attach internal document stores. The code indexers enable a bring-your-own-repo approach to CodeRAG, letting you bias the agents toward house style and libraries. This flexibility is valuable in regulated settings where the provenance of code and references matters.
Community and Contribution
The repository is under active development in the HKUDS organization, which also maintains related projects like LightRAG and RAG-Anything. Community links (Discord and WeChat) are surfaced in the README.md, and GitHub Issues and Pull Requests show ongoing iteration.
While a dedicated CONTRIBUTING.md is not present at the time of writing, the codebase is cleanly structured and easy to navigate, making external contributions feasible. The presence of a PyPI package simplifies installation for testers and early adopters.
Usage and License Terms
DeepCode is released under the MIT License. In short, you may use, copy, modify, merge, publish, distribute, sublicense, and sell copies of the software, provided the copyright notice and permission notice are included in all copies or substantial portions of it. The software is provided "as is", without warranty, and the authors disclaim liability. See LICENSE for the exact terms.
Impact and Future Potential
Agentic coding has matured from experiments into pragmatic tooling. DeepCode sits in a compelling spot: practical integrations, a real UI, and a modular agent toolchain that maps well to everyday workflows.
Expect gains in turnaround time on paper implementations, more consistent scaffolds for features, and better reuse of known-good patterns via CodeRAG. The roadmap hints at stronger test generation and validation. Combined with MCP, this should make the system more portable across environments.
About the Lab
The HKUDS Data Intelligence Lab focuses on AI systems and tooling, with a portfolio that includes retrieval, agents, and applied ML infrastructure. Explore the organization page for people, research themes, and related repositories: HKUDS (HKUDS, 2025).
Conclusion
DeepCode is a focused, well-engineered attempt to turn research artifacts and natural language into real software. If you want to accelerate algorithm reproduction or quickly stand up working prototypes, this repository is worth your time. Start with the README, install the PyPI package, and try the web UI. For deeper integration, review the MCP configuration and connect the tools you already use.
DeepCode: An Open Agentic Coding System That Turns Papers Into Code