
Pathway: Real-Time Data Processing with Python and Rust

How a French Startup Is Redefining Stream Processing for the AI Era
Zuzanna Stamirowska Jan Chorowski Adrian Kosowski

Businesses increasingly demand real-time insights from streaming data, and a framework emerging from France promises to change how that data is processed. Pathway represents a shift in how we approach stream processing, combining the simplicity of Python with the raw performance of Rust to deliver what its creators call "LiveAI™" systems.

Repository: pathwaycom/pathway. Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

Pathway was founded by CEO Zuzanna Stamirowska, CTO Jan Chorowski, and CSO Adrian Kosowski, a team of scientists and researchers with deep expertise in AI and data processing. It addresses one of the most pressing challenges in modern data engineering: processing streaming data in real time without sacrificing the familiar Python ecosystem that data scientists and engineers love.

The Problem: When Real-Time Isn't Really Real-Time

Traditional data processing frameworks force organizations into an uncomfortable choice. They can either work with familiar tools like Python and pandas for batch processing, accepting delays and missing real-time insights, or they can invest heavily in complex streaming platforms like Apache Flink or Kafka Streams that require specialized knowledge and often sacrifice the rich ecosystem of Python libraries that data teams depend on.

This dichotomy becomes particularly painful when building state-of-the-art AI applications. Modern businesses need AI systems that react to changing data in real time: recommendation engines that update immediately when user preferences shift, fraud detection systems that catch suspicious activity as it happens, or supply chain optimization that responds to disruptions within seconds.

Yet most existing solutions require teams to maintain separate codebases for development and production, batch and streaming, or force compromises between performance and developer productivity.

The Solution: Unifying Batch and Stream with Differential Dataflow

Pathway elegantly solves this problem by providing a unified framework where the same Python code works seamlessly for both batch and streaming data processing. At its heart lies a powerful insight: instead of treating batch and streaming as fundamentally different paradigms, Pathway uses differential dataflow principles to maintain a consistent computation model regardless of whether data arrives all at once or flows continuously.

The framework's architecture is built on a scalable Rust engine that handles the heavy computational lifting while exposing a clean Python API that feels natural to data practitioners. This means developers can prototype with small datasets locally, test with batch data in CI/CD pipelines, and deploy the exact same code to handle live data streams in production, all without changing a single line of application logic.

Why I Find Pathway Compelling

What immediately struck me about Pathway is how it removes the artificial barriers that have plagued the data processing world for years. The framework's approach to incremental computation is particularly interesting: when new data arrives, Pathway automatically determines what computations need to be updated rather than reprocessing everything from scratch. This differential approach not only improves performance but also ensures consistency when dealing with out-of-order or late-arriving data points.
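To make the idea of incremental computation concrete, here is a toy sketch in plain Python. It is not Pathway's engine or API; the `IncrementalSum` class and the `(key, value, diff)` update format are assumptions made for illustration, loosely modeled on how differential approaches represent changes as insertions (`+1`) and retractions (`-1`).

```python
from collections import defaultdict


class IncrementalSum:
    """Maintains per-key sums from batches of (key, value, diff) updates.

    diff is +1 for an inserted row and -1 for a retracted row.
    """

    def __init__(self):
        self.sums = defaultdict(int)
        self.counts = defaultdict(int)

    def apply(self, updates):
        """Apply one batch and return the new sum for each key it touched."""
        touched = set()
        for key, value, diff in updates:
            self.sums[key] += value * diff
            self.counts[key] += diff
            touched.add(key)

        changed = {}
        for key in touched:
            if self.counts[key] == 0:
                # No rows remain for this key, so drop it from the state.
                del self.sums[key]
                del self.counts[key]
                changed[key] = None
            else:
                changed[key] = self.sums[key]
        return changed


def batch_sum(rows):
    """Reference implementation: recompute every sum from scratch."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


agg = IncrementalSum()
first = agg.apply([("a", 5, +1), ("b", 3, +1)])
# A late correction replaces the row ("a", 5) with ("a", 7).
second = agg.apply([("a", 5, -1), ("a", 7, +1)])
```

Only key `"a"` is recomputed by the second batch, yet the final state matches what a full batch recomputation over the corrected rows would produce. That equivalence is the property the paragraph above describes.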

The team's background also instills confidence. With Jan Chorowski's expertise in machine learning (he's worked extensively on neural networks and sequence modeling) and Adrian Kosowski's algorithmic research background, combined with Zuzanna Stamirowska's leadership in bringing academic research to production systems, Pathway feels like a rare combination of deep technical insight and practical business understanding.

Key Features That Set Pathway Apart

Pathway's feature set reveals a framework designed for the modern data stack. The wide range of connectors is impressive, from traditional sources like Kafka and PostgreSQL to modern cloud services like Google Drive and SharePoint. The Airbyte integration alone provides access to over 300 different data sources, making it easy to pull data from virtually any system.

The framework's handling of stateful operations is also noteworthy. Complex operations like temporal joins, windowing, and iterative algorithms that are challenging to implement correctly in traditional streaming systems become straightforward in Pathway. The framework automatically manages state persistence, allowing pipelines to resume exactly where they left off after updates or unexpected failures.
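The "resume exactly where they left off" behavior can be pictured with a small checkpointing sketch. This is a generic illustration in plain Python, not Pathway's persistence mechanism; the `run` function, the JSON checkpoint file, and the `stop_after` parameter (used here to simulate a crash) are all hypothetical.

```python
import json
import os
import tempfile


def run(events, checkpoint_path, stop_after=None):
    """Sum events, saving (offset, total) to disk after each one."""
    state = {"offset": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        # Resume from the last saved state instead of starting over.
        with open(checkpoint_path) as f:
            state = json.load(f)

    processed = 0
    for event in events[state["offset"]:]:
        if stop_after is not None and processed >= stop_after:
            break  # simulate an unexpected stop
        state["total"] += event
        state["offset"] += 1
        processed += 1

        # Write to a temporary file, then swap it in atomically.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state


events = [4, 8, 15, 16, 23, 42]
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "checkpoint.json")
    partial = run(events, path, stop_after=3)  # stops after three events
    resumed = run(events, path)                # continues from the checkpoint
```

The second call skips the three events that were already counted, so no event is processed twice and none is lost.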

For AI applications, Pathway includes dedicated LLM tooling that makes building RAG (Retrieval-Augmented Generation) pipelines remarkably simple. The framework provides an in-memory vector index that updates in real-time as documents change, along with integrations for popular LLM frameworks like LangChain and LlamaIndex. This means AI applications can work with truly fresh data rather than the stale snapshots that plague many production AI systems.
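As a rough picture of what an in-memory vector index that follows document changes involves, consider the minimal sketch below. It is a brute-force toy, not Pathway's index: the `LiveVectorIndex` class, its method names, and the hand-written two-dimensional "embeddings" are assumptions for illustration only.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot index a zero vector")
    return [x / norm for x in vector]


class LiveVectorIndex:
    """Brute-force cosine-similarity index with in-place updates."""

    def __init__(self):
        self._vectors = {}

    def upsert(self, doc_id, vector):
        """Insert a document, or replace its vector if it already exists."""
        self._vectors[doc_id] = _normalize(vector)

    def delete(self, doc_id):
        """Remove a document; unknown ids are ignored."""
        self._vectors.pop(doc_id, None)

    def query(self, vector, k=1):
        """Return the ids of the k most similar documents."""
        q = _normalize(vector)
        scored = [
            (sum(a * b for a, b in zip(q, v)), doc_id)
            for doc_id, v in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc_id for _, doc_id in scored[:k]]


index = LiveVectorIndex()
index.upsert("refund-policy-v1", [1.0, 0.0])
index.upsert("shipping-faq", [0.0, 1.0])
before = index.query([0.9, 0.1])

# The policy document is replaced; the next query sees the change immediately.
index.delete("refund-policy-v1")
index.upsert("refund-policy-v2", [0.8, 0.2])
after = index.query([0.9, 0.1])
```

Because updates are applied to the index directly, there is no separate re-indexing job between a document changing and a query reflecting it.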

Under the Hood: A Rust Engine with Python Ergonomics

The technical architecture of Pathway is where the framework truly shines. The core engine is implemented in Rust and builds on Differential Dataflow, which in turn runs on Timely Dataflow; both are Rust libraries that grew out of Frank McSherry's work on efficient incremental computation over streaming data. This Rust foundation provides memory safety, fearless concurrency, and performance that can outpace traditional JVM-based streaming platforms.

Yet developers never need to write Rust code. The Python API is comprehensive and intuitive, supporting familiar patterns from pandas and NumPy while adding streaming-specific operations. Here's a simple example that demonstrates the framework's approach:

import pathway as pw

# Define the schema of your data
class InputSchema(pw.Schema):
    value: int
    timestamp: int  # event time, assumed to be in milliseconds

# Kafka connection settings (placeholder values; adjust for your cluster)
rdkafka_settings = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "pathway-example",
    "auto.offset.reset": "earliest",
}

# Connect to your data stream
input_table = pw.io.kafka.read(
    rdkafka_settings,
    topic="sensor_data",
    format="json",
    schema=InputSchema,
)

# Define transformations that work on both batch and streaming data
filtered_table = input_table.filter(input_table.value >= 0)
windowed_sums = filtered_table.windowby(
    filtered_table.timestamp,
    # 1-minute windows; the 10-second hop is an illustrative choice
    window=pw.temporal.sliding(hop=10_000, duration=60_000),
).reduce(
    sum_value=pw.reducers.sum(filtered_table.value),
    count=pw.reducers.count(),
)

# Output results
pw.io.jsonlines.write(windowed_sums, "output.jsonl")

# Run the computation
pw.run()

This code will work identically whether processing historical batch data or real-time streams. The pw.run() command starts the computation engine, which automatically handles threading, memory management, and incremental updates as data flows through the system.
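The sliding-window step in the example can be pictured with a small standalone sketch. It shows, in plain Python, how a single event lands in several overlapping windows; the `hop` parameter, the convention that windows start at integer multiples of `hop`, and the helper names are assumptions for illustration rather than a description of Pathway's internals.

```python
from collections import defaultdict


def sliding_windows(timestamp, hop, duration):
    """Return every [start, end) window that contains `timestamp`.

    Windows are assumed to start at integer multiples of `hop`.
    """
    windows = []
    start = (timestamp // hop) * hop  # latest window start <= timestamp
    while start + duration > timestamp:
        windows.append((start, start + duration))
        start -= hop
    return sorted(windows)


def window_sums(events, hop, duration):
    """Sum the values of (timestamp, value) events per sliding window."""
    sums = defaultdict(int)
    for timestamp, value in events:
        for window in sliding_windows(timestamp, hop, duration):
            sums[window] += value
    return dict(sums)


# A 60-second window that advances every 10 seconds (times in milliseconds).
windows = sliding_windows(65_000, hop=10_000, duration=60_000)
sums = window_sums([(5_000, 2), (25_000, 3)], hop=10_000, duration=60_000)
```

With these settings each event belongs to `duration / hop = 6` windows, which is why sliding windows cost more state than non-overlapping (tumbling) ones.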

The Rust engine source code reveals sophisticated optimizations for memory usage and computational efficiency. The framework uses techniques like operator fusion and vectorized operations to minimize overhead, while the differential dataflow model ensures that only necessary computations are performed when data changes.

Real-World Use Cases: From Logistics to AI

Pathway's versatility becomes obvious when examining its deployment across different industries. La Poste, the French postal service, uses Pathway to process IoT data from delivery vehicles and containers, reducing their IoT deployment costs by 50% while gaining real-time analytics capabilities. The framework processes location data, sensor readings, and operational metrics to optimize delivery routes and predict maintenance needs.

In the transportation sector, Transdev leverages Pathway to provide accurate real-time passenger information. The system processes GPS data from buses, traffic conditions, and passenger load information to calculate precise ETAs and detect route deviations. This real-time processing enables better passenger experiences and more efficient fleet management.

For AI applications, Pathway excels in scenarios requiring fresh, up-to-date knowledge. A customer support RAG system can instantly incorporate new documentation or policy changes, ensuring customer service representatives always have access to the latest information. Similarly, financial institutions use Pathway for real-time fraud detection, where models need to adapt immediately to new fraud patterns and account behaviors.

The framework's ability to handle complex event processing also makes it valuable for IoT applications. Manufacturing companies use Pathway to monitor equipment sensor data, detecting anomalies and predicting failures before they occur. The framework's temporal join capabilities allow correlating data from multiple sensors and external systems to build comprehensive views of operational health.
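One common kind of temporal join is the as-of join, which pairs each reading with the most recent value from another stream. The sketch below shows the idea in plain Python over two small in-memory lists; the `asof_join` helper and the sensor readings are invented for illustration, and a streaming engine would maintain this result incrementally rather than sorting the full inputs.

```python
from bisect import bisect_right


def asof_join(left, right):
    """Attach to each (time, value) in `left` the latest `right` value
    whose time is less than or equal to the left row's time."""
    right = sorted(right)
    right_times = [t for t, _ in right]
    joined = []
    for t, value in sorted(left):
        i = bisect_right(right_times, t)  # number of right rows with time <= t
        match = right[i - 1][1] if i > 0 else None
        joined.append((t, value, match))
    return joined


# Hypothetical readings: (time in seconds, measurement)
vibration = [(5, 0.30), (12, 0.31), (27, 0.35), (41, 0.90)]
temperature = [(10, 70.0), (25, 71.5), (40, 95.2)]

rows = asof_join(vibration, temperature)
```

The first vibration reading has no earlier temperature to pair with, so its match is `None`; every later reading picks up the latest temperature available at that moment.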

Community and Ecosystem: Building for Developers

Pathway's approach to community building reflects its commitment to developer experience. The extensive examples repository provides ready-to-run implementations for common use cases, from real-time ETL pipelines to log monitoring systems. These examples demonstrate best practices for building production-ready streaming applications.

The framework's template system deserves special mention. These templates provide complete, deployable applications for common scenarios like multimodal RAG with GPT-4o or private RAG with Ollama and Mistral. Each template includes Docker configurations, making it possible to deploy sophisticated real-time AI applications with just a few commands.

The contribution guidelines encourage community involvement while maintaining code quality. The project's structure, with clear separation between the Rust engine and Python bindings, makes it possible for contributors to focus on areas matching their expertise. The active Discord community provides a space for users to share experiences and get help from both other users and the core development team.

License and Usage Rights: BSL with Production Freedom

Pathway is distributed under the Business Source License 1.1 (BSL), a licensing model that balances open development with sustainable business practices. Under this license, Pathway is free for unlimited non-commercial use and for most commercial purposes, including production deployments on a single machine or VM, as long as resource limits are not circumvented.

The license includes specific restrictions around offering Pathway as a "Stream Data Processing Service" to third parties, which protects the company's ability to monetize enterprise offerings while keeping the framework accessible for most business use cases. Importantly, the code automatically converts to Apache 2.0 (fully open source) after four years, ensuring long-term availability regardless of the company's future.

For organizations requiring enterprise features like distributed computing, advanced monitoring, or "exactly once" processing guarantees, Pathway offers commercial licenses with additional capabilities. This dual licensing approach allows the framework to remain accessible to startups and individual developers while providing a path for enterprise adoption.

About NavAlgo: The Science Behind the Software

Pathway is developed by NavAlgo SAS, a company that embodies the transition from academic research to practical applications. The founding team's academic backgrounds are evident throughout the framework's design, from the principled approach to differential dataflow to the careful attention to correctness in concurrent systems.

The company has secured backing from notable investors including Lukasz Kaiser, co-inventor of the Transformer architecture, along with venture firms like TQ Ventures and Kadmos Capital. This combination of technical and financial backing provides confidence in the framework's long-term development and support.

NavAlgo's vision extends beyond just providing a data processing framework. The company is positioning itself at the forefront of what they call "LiveAI™", AI systems that operate on continuously fresh data. This vision is reflected in partnerships with major organizations like NATO, Intel, and various Fortune 500 companies that are using Pathway to build next-generation real-time AI applications.

Impact and Future Potential: Democratizing Real-Time AI

Pathway's potential impact on the data processing landscape is significant. By removing the traditional barriers between batch and streaming processing, the framework could democratize real-time analytics in the same way that frameworks like pandas democratized data analysis. Organizations that previously couldn't justify the complexity of real-time systems can now build streaming applications with familiar Python tools.

The framework's timing is particularly fortuitous. As AI applications become more prevalent, the need for systems that can work with fresh, streaming data becomes critical. Traditional approaches that periodically retrain models on batch data are inadequate for applications that need to respond to changing conditions in real-time. Pathway's approach of maintaining incrementally updated vector indices and enabling real-time model inference positions it well for this emerging market.

Looking ahead, the framework's modular architecture and strong foundation suggest promising directions for future development. The integration with emerging AI frameworks, support for more complex temporal reasoning, and potential for edge computing deployment all represent areas where Pathway could extend its impact.

Conclusion: A New Paradigm for Data Processing

Pathway embodies a fundamental shift toward treating real-time processing as the default rather than an exception. By combining the accessibility of Python with the performance of Rust and the theoretical foundations of differential dataflow, Pathway offers a compelling vision for the future of data processing.

For organizations struggling with the complexity of traditional streaming platforms or the limitations of batch processing, Pathway provides a third option that doesn't require compromising on either developer productivity or system performance. The framework's success with major enterprises like La Poste and Transdev demonstrates its readiness for production use, while its growing ecosystem of templates and examples makes it accessible to teams just starting their real-time data journey.

Whether you're building the next generation of AI applications, modernizing existing data pipelines, or simply curious about the future of stream processing, Pathway deserves your attention. In a world where data freshness increasingly determines competitive advantage, frameworks like Pathway aren't just nice to have—they're becoming essential.

Explore the repository, try one of the templates, or join the community to see how Pathway might transform your approach to real-time data processing.


Authors:
Zuzanna Stamirowska Jan Chorowski Adrian Kosowski
Joshua Berkowitz October 3, 2025