Databricks is lowering the barriers to powerful machine learning (ML) by introducing distributed ML on both serverless and standard clusters, now in public preview. This advancement streamlines the scaling of ML workloads and reinforces security, offering a unified environment for experimentation, development, and production. Teams no longer need to be limited by dedicated clusters or grapple with fragmented tools, paving the way for broader innovation.
From Dedicated Clusters to Flexible Compute
Historically, distributed ML tasks, such as model training with Apache Spark MLlib or large-scale hyperparameter tuning with Optuna, were restricted to dedicated clusters. These environments posed collaboration challenges and lacked the access controls needed for secure multi-user operation. The latest update from Databricks changes this paradigm by extending distributed ML capabilities to both serverless and standard clusters. Users now benefit from:
- Seamless scaling of ML workloads without manual infrastructure management
- Unified support for both single-node and distributed ML libraries
- Enhanced security and governance for team-based projects
Expanded Machine Learning Capabilities
This release unlocks a spectrum of distributed ML workloads, empowering teams to:
- Train distributed models with Apache Spark MLlib (Python)
- Conduct large-scale hyperparameter tuning with Optuna
- Manage experiments via MLflow Spark
- Run distributed Scikit-learn, LightGBM, and XGBoost using Joblib Spark
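As a minimal sketch of the last pattern: scikit-learn parallelizes through joblib, so the same code can fan out across a cluster once a Spark backend is registered. The snippet below uses joblib's local "loky" backend so it runs anywhere; on Databricks you would register the Spark backend (for example, `from joblibspark import register_spark; register_spark()`) and switch `parallel_backend("loky", ...)` to `parallel_backend("spark", ...)`.

```python
# Sketch: scikit-learn work routed through a pluggable joblib backend.
# Locally this uses joblib's built-in "loky" process backend; on a
# Databricks cluster, registering the Spark backend and selecting
# parallel_backend("spark") distributes the same folds across executors.
from joblib import parallel_backend
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Small synthetic dataset so the example is self-contained.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# Each of the 5 cross-validation fits can run on a separate worker.
with parallel_backend("loky", n_jobs=2):
    scores = cross_val_score(model, X, y, cv=5)

print(scores.mean())
```

The appeal of this design is that the parallelism strategy lives in the backend, not the modeling code, which is what lets the same notebook scale from a laptop to a cluster.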
By integrating these tools, Databricks offers a cohesive ML experience that supports everything from local prototyping to robust, production-scale deployments—all within a single compute environment.
Unified Compute and Robust Governance
Security and governance are foundational to this update. Powered by Lakeguard and Spark Connect, both serverless and standard clusters feature:
- Unified compute experience: run ML, analytics, and ETL jobs together
- Secure collaboration: multi-user isolation for concurrent workflows
- Fine-grained access control (FGAC): user-level permissions, row filters, and column masking
These features, aligned with Spark 4 innovations, are deeply integrated into Databricks, ensuring that modern data teams can confidently scale and collaborate while maintaining strict governance standards.
Powering Innovation Through Open Source Collaboration
This achievement is built on extensive open source collaboration, especially with NVIDIA. By working together, Databricks and NVIDIA have expanded Spark ML capabilities via Spark Connect. GPU acceleration, available without code changes, can deliver up to 9x performance gains and reduce costs by up to 80%. These improvements set a new benchmark for scalable AI and ML workflows, making high-performance distributed ML more accessible than ever.
With these enhancements, enterprises of all sizes can now tap into efficient, cost-effective distributed ML, fostering greater insight and innovation from data at scale.
Getting Started with Distributed ML on Databricks
To leverage these new capabilities:
- Serverless compute: Attach ML workloads to serverless compute (environment version 4 or higher); workloads run on CPU, with GPU support in beta.
- Standard clusters: Use Databricks Runtime 17.0 or higher.
Comprehensive resources are available for deeper exploration of Spark MLlib, Optuna, and best practices for secure governance using Unity Catalog across AWS, Azure, and GCP.
Conclusion
The public preview of distributed ML on Databricks represents a major leap forward, offering flexibility, security, and performance across all compute environments. By unifying the ML journey and eliminating infrastructure constraints, Databricks empowers teams to collaborate securely and accelerate innovation.
Source: Databricks Blog, "Databricks Unveils Distributed Machine Learning on Serverless and Standard Clusters"