Introducing NLWeb: Natural Language Interfaces for the AI Web

The web is evolving, and with it, the way we interact with information. Imagine a future where websites and applications understand and respond to your natural language, just like a conversation. This future is closer than you think, thanks to NLWeb, an open-source project from Microsoft. Much like HTML revolutionized document sharing and browsing, NLWeb aims to be the foundational layer for the emerging "AI Web": a new era in which websites don't just display content but can converse and interact intelligently with both humans and AI agents.
NLWeb is designed to simplify the creation of conversational, natural language interfaces for any website or application. By embracing open protocols like MCP (Model Context Protocol) and leveraging the widely adopted Schema.org vocabulary, NLWeb is building the foundational layers for this "AI Web" where chatbots, AI assistants, and web content seamlessly interact through natural language.
Why NLWeb is Essential for the Future Web
NLWeb addresses key challenges in building intelligent web interfaces, offering compelling advantages:
- Effortless Conversational Interfaces: Easily integrate natural language capabilities into any website, leveraging familiar web standards. This democratizes access to advanced chatbot technology for businesses of all sizes.
- Open and Collaborative Ecosystem: Built on open protocols and tools, NLWeb fosters community collaboration and extensibility, ensuring it grows with the web. This also empowers web publishers to participate in the agentic web on their own terms, ensuring their content is ready to interact and be discovered by other AI agents.
- Intelligent Schema.org Integration: Leverage Schema.org, the structured-data vocabulary already in use on more than 100 million websites, to extract structured data for richer, more accurate, and contextual conversations. NLWeb enhances this structured data by incorporating external knowledge from the underlying LLMs for an even richer user experience.
- Platform-Agnostic and Highly Extensible: Enjoy unparalleled flexibility. NLWeb supports multiple operating systems (Windows, macOS, Linux), various Large Language Models (LLMs) like OpenAI, DeepSeek, Gemini, Anthropic, and Inception, and popular vector databases such as Qdrant, Snowflake, Milvus, and Azure AI Search. Run it anywhere, connect it to anything, and scale it from a developer's laptop to enterprise-level data centers.
- Publisher Control and Cost Efficiency: Unlike external AI platforms that may scrape and centralize data, NLWeb enables publishers to create proprietary AI knowledge bases, retaining full control over their content and user data. It offers a cost-effective alternative to traditional search methods by leveraging existing data feeds (like RSS) and reducing the need for extensive programming.
How NLWeb Works: The Core Components
At its heart, NLWeb provides a robust framework through two primary components:
- A Simple, Open Protocol (MCP): Interact with any website or service using natural language via a straightforward REST API. Responses are structured using Schema.org vocabulary in JSON format. This enables AI assistants to query and interact with structured data through a uniform, open interface. Every NLWeb instance functions as both a conversational endpoint and an MCP server.
- A Comprehensive Reference Implementation: This includes powerful tools for ingesting structured web data, seamless connectivity to various LLMs and vector databases, and a fully functional web server with an intuitive conversational user interface.
Quick Start: Your First Steps with NLWeb
Ready to experience NLWeb? Here's how you can get a "Hello World" example running locally in minutes:
1. Clone the Repository
Begin by cloning the NLWeb repository from GitHub:
git clone https://github.com/microsoft/NLWeb.git
cd NLWeb
2. Set Up Your Python Environment
NLWeb is built with Python. Navigate to the code directory and install the necessary dependencies:
cd code
python -m venv myenv
source myenv/bin/activate # On Windows: myenv\Scripts\activate
pip install -r requirements.txt
3. Load Some Data into the Vector Store
Load sample data into the vector database using the db_load tool. The following command ingests the RSS feed of the "Behind the Tech" podcast:
python -m tools.db_load https://feeds.libsyn.com/121695/rss Behind-the-Tech
4. Start the NLWeb server
python app-file.py
5. Open http://localhost:8000/ in your browser (verify the exact port in your terminal output)
You now have working natural language search! Watch as NLWeb processes your request, leverages Schema.org data as context, and returns structured, intelligent results.
Demo: Crawling and Chatting with Your Own Website
Beyond the quick start, NLWeb offers a powerful incremental website crawler. This tool can extract Schema.org markup from your site, generate semantic embeddings, and load this data into a vector database, preparing your content for sophisticated conversational AI interactions.
Example: Crawl a Website and Prepare for Chat
To demonstrate, from the code directory, run the following command:
python -m scraping.incrementalCrawlAndLoad example.com --max-pages 50
This command will execute the following steps:
- Crawl up to 50 pages of example.com.
- Extract all available Schema.org data.
- Generate high-quality vector embeddings from the extracted data.
- Store this processed data, making it ready for conversational queries.
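To make the extraction step concrete, here is a minimal sketch (not the crawler's actual code) of pulling Schema.org JSON-LD blocks out of a page using only the Python standard library. Most sites embed their Schema.org markup in script tags of type application/ld+json:

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> tags,
    the most common way Schema.org markup is embedded in pages."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.items: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            try:
                self.items.append(json.loads(data))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than failing the crawl

def extract_schema_org(html: str) -> list[dict]:
    """Return all Schema.org JSON-LD objects found in an HTML document."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.items
```

The real crawler does considerably more (incremental fetching, embedding generation, vector-store writes), but this captures the core idea of the markup-extraction stage.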
Explore Your Data
Once the crawling is complete, you can easily configure the NLWeb service to point to your newly generated data directory. This allows you to immediately start chatting with and exploring the content of your own website through natural language.
The Power of Protocol and Integration
NLWeb's implementation of the Model Context Protocol (MCP) is fundamental to its interoperability, defining a universal way for AI assistants to query and interact with structured data.
REST API Example: Asking a Query
POST /ask
Content-Type: application/json

{
  "query": "What are your most popular blog posts?",
  "context": {}
}
Example Response: Structured Schema.org Data
{
  "@type": "ItemList",
  "itemListElement": [
    {
      "@type": "BlogPosting",
      "headline": "AI and the Future of the Web",
      "url": "https://example.com/ai-future"
    }
    // ... further results ...
  ]
}
This structured response empowers AI systems to accurately understand and utilize the information.
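Because the shape is standard Schema.org vocabulary, a client can walk such a response generically. The snippet below is a sketch of pulling headline/URL pairs out of an ItemList like the one above; the field names follow the Schema.org types shown in the example response:

```python
def list_results(response: dict) -> list[tuple[str, str]]:
    """Extract (headline, url) pairs from a Schema.org ItemList response."""
    if response.get("@type") != "ItemList":
        return []
    return [
        (item.get("headline", ""), item.get("url", ""))
        for item in response.get("itemListElement", [])
    ]

sample = {
    "@type": "ItemList",
    "itemListElement": [
        {"@type": "BlogPosting",
         "headline": "AI and the Future of the Web",
         "url": "https://example.com/ai-future"},
    ],
}
print(list_results(sample))  # [('AI and the Future of the Web', 'https://example.com/ai-future')]
```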
Customizing, Scaling, and Extending NLWeb
NLWeb is built for flexibility and growth:
- UI Customization: While a sample UI is provided, you have complete freedom to build your own custom user interface or seamlessly integrate NLWeb's conversational APIs into your existing applications.
- Robust Backend Integration: Connect NLWeb to live databases to keep your content fresh and dynamic. Its architecture allows for scaling from a developer's laptop to enterprise-level data centers.
- Enhanced Memory and Context: Configure NLWeb via straightforward YAML files to support extended conversation memory and adapt it for a wide range of use cases, from simple Q&A to complex interactive experiences.
- Flexible Cloud Deployment: Deploy NLWeb with ease across various cloud platforms, including Azure, Docker containers, and more. Detailed deployment guides are available in the /docs directory.
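As one illustration of the YAML-driven configuration mentioned above, a fragment enabling longer conversation memory might look like the following. The key names here are hypothetical, shown only to convey the style; consult the configuration files in the repository for the actual schema.

```yaml
# Hypothetical fragment -- actual key names live in the repo's config files.
conversation:
  memory:
    enabled: true
    max_turns: 20   # how many prior exchanges to keep as context
```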
Additional Resources
To dive deeper into NLWeb, explore these valuable resources:
- Hello world on your laptop
- REST API Reference
- Crawling and Ingesting Data
- Contribution Guide
- GitHub Repository
NLWeb is an open-source project released under the permissive MIT License, encouraging widespread adoption and modification.
Get Involved!
Join us on the exciting journey to build the connected, conversational AI Web. Your contributions are welcome and vital. Visit the NLWeb repository on GitHub: microsoft/NLWeb. Explore the issues, engage in discussions, and review the contribution guidelines to become a part of this transformative project!