Organizations increasingly rely on real-time insights to stay competitive. Apache Spark Structured Streaming has introduced a real-time mode that lets data teams process streaming events in milliseconds. This supports a new class of ultra-low-latency applications without requiring code rewrites or platform migrations.
Real-Time Mode: A Paradigm Shift
Real-time mode fundamentally changes how Spark Structured Streaming operates. Traditional micro-batch triggers process data at fixed intervals, so an event waits for the next batch before it is handled. In contrast, real-time mode processes events and emits results continuously as new events arrive. This enables p99 latencies as low as single-digit milliseconds, which suits time-sensitive use cases.
- Continuous execution: Data is processed instantly, removing micro-batch wait times.
- Effortless adoption: Enable real-time mode with a configuration setting and a trigger change, with no rewrite of query logic and no replatforming.
- Open source and live: Real-time mode is open source in Apache Spark and in Public Preview on Databricks, with broad streaming support.
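To make the micro-batch wait concrete, here is a toy model in plain Python (not Spark code). It assumes a fixed trigger interval and a constant per-event processing time, and it ignores scheduling overhead. The arrival times, the 500 ms interval, and the 5 ms processing cost are made-up illustration values, not measurements.

```python
def micro_batch_latency_ms(arrival_ms, interval_ms, processing_ms):
    """Latency when an event must wait for the next fixed trigger boundary."""
    # First trigger boundary at or after the arrival time (integer ceiling).
    next_boundary = ((arrival_ms + interval_ms - 1) // interval_ms) * interval_ms
    return (next_boundary - arrival_ms) + processing_ms


def continuous_latency_ms(processing_ms):
    """Latency when the event is processed as soon as it arrives."""
    return processing_ms


arrivals = [10, 250, 499, 500, 730]  # hypothetical arrival times (ms)
interval = 500                        # hypothetical micro-batch interval (ms)
work = 5                              # hypothetical processing time (ms)

micro = [micro_batch_latency_ms(t, interval, work) for t in arrivals]
continuous = [continuous_latency_ms(work) for _ in arrivals]

print(micro)       # [495, 255, 6, 5, 275]
print(continuous)  # [5, 5, 5, 5, 5]
```

In this simplified model, micro-batch latency depends mostly on where an event lands relative to the next trigger boundary, while continuous execution leaves only the processing time.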
How It Works
Real-time mode initiates long-lived streaming jobs with concurrent stage scheduling and in-memory data shuffling using a specialized streaming shuffle. This approach:
- Minimizes coordination overhead between Spark tasks
- Eliminates delays from micro-batch scheduling
- Achieves consistent millisecond performance
Internal benchmarks from Databricks show p99 latencies from a few milliseconds to around 300 ms, depending on the complexity of data transformations.
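For readers less familiar with the metric, p99 is the latency that 99% of events meet or beat. The sketch below computes it with the nearest-rank method, which is one common definition among several. The sample latencies are invented for illustration and are not the Databricks benchmark data.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies (ms): mostly fast, a few slower, one outlier.
latencies_ms = [4] * 95 + [12] * 4 + [300]

print(percentile(latencies_ms, 50))   # 4
print(percentile(latencies_ms, 99))   # 12
print(percentile(latencies_ms, 100))  # 300
```

The gap between the median and the tail is why low-latency systems usually report p99 rather than an average.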
Transformative Real-Time Use Cases
Real-time mode is designed for scenarios where response speed is critical to business outcomes. Early adopters are already realizing its impact:
- Fraud Detection: Major banks analyze credit card transactions from Kafka and flag suspicious activity within 200 ms, reducing risk without an infrastructure overhaul.
- Live Personalization: Streaming and e-commerce platforms deliver instant recommendations and offers, boosting user engagement with real-time feedback.
- Session State and Search: Travel websites present up-to-date search histories and tailored results across devices, enhancing user experience and retention.
- ML Feature Serving: Food delivery apps update driver locations in milliseconds, feeding machine learning models for precise ETAs.
Other promising applications include IoT sensor monitoring, supply chain visibility, live gaming telemetry, and in-app personalization. In short, real-time mode fits any setting where immediate data-driven action creates business value.
Simple Onboarding for Spark Users
If you already use Spark Structured Streaming, activating real-time mode is straightforward: update your cluster configuration and your query trigger, with no need to refactor the rest of your code. On Databricks (DBR 16.4+):
- Launch a cluster (preferably Dedicated Mode) with Public Preview access
- Set the appropriate Spark configuration for real-time mode
- Apply the new RealTimeTrigger to your streaming queries
Checkpoint intervals are tunable (default: five minutes), though more frequent checkpoints may slightly increase latency. Most major streaming sources (Kafka, Kinesis) and sinks are already supported, with more integrations coming soon.
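The configuration and trigger steps above might look roughly like the following PySpark sketch. Treat the specifics as assumptions to verify against the official documentation: the configuration key `spark.databricks.streaming.realTimeMode.enabled`, the `realTime=` trigger keyword (assumed to be the Python spelling of RealTimeTrigger, taking the checkpoint interval), and the `update` output mode are recalled from the preview documentation and are not stated in this article. The broker address and topic name are hypothetical placeholders.

```python
# Assumption: configuration key that enables real-time mode on the cluster.
REAL_TIME_CONF_KEY = "spark.databricks.streaming.realTimeMode.enabled"


def cluster_spark_conf():
    """Spark configuration entries to set on the cluster."""
    return {REAL_TIME_CONF_KEY: "true"}


def start_real_time_query(events, checkpoint_location, checkpoint_interval="5 minutes"):
    """Start a streaming query with a real-time trigger.

    `events` is an existing streaming DataFrame (for example, built with
    spark.readStream). The Kafka sink expects it to contain a `value` column.
    """
    return (
        events.writeStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("topic", "scored-events")                  # hypothetical topic
        .option("checkpointLocation", checkpoint_location)
        .outputMode("update")                    # assumption: update output mode
        .trigger(realTime=checkpoint_interval)   # assumption: real-time trigger keyword
        .start()
    )
```

The only part of the query that differs from a micro-batch job is the trigger line, which is why the rest of an existing pipeline can usually stay as it is. Passing a shorter `checkpoint_interval` trades a little latency for more frequent checkpoints.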
Is Real-Time Mode Right for You?
Deploy real-time mode for pipelines where low latency directly translates to business value. For less time-sensitive analytics, traditional micro-batching remains a cost-effective alternative. As with any cutting-edge low-latency technology, real-time mode introduces some extra overhead, so reserve it for your most latency-critical workloads.
The Road Ahead
Databricks continues to expand compatibility and push performance boundaries. For technical details, implementation guidance, and the latest updates, see the official documentation. To learn more about Spark 4.0’s direction, watch Michael Armbrust’s keynote and technical deep dive from DAIS 2025.
Key Takeaway
Apache Spark’s real-time mode empowers organizations to build applications that demand immediate data-to-decision pipelines. With seamless configuration, broad support, and proven millisecond latency, it’s set to be a cornerstone for the next generation of real-time analytics and AI-powered products.
Source: Databricks Blog

How Apache Spark’s Real-Time Mode Delivers Millisecond Latency for Streaming Applications