
Automated Prompt Optimization: Efficient Performance at a Fraction of the Cost

AI Performance Meets Practicality


Enterprises striving to leverage AI for complex tasks often face a trade-off: high accuracy usually comes at a high cost, especially with leading proprietary models. Recent Databricks research reveals that automated prompt optimization can break this trade-off, helping organizations achieve top-tier accuracy with open-source models while dramatically reducing operating expenses.

Benchmarking with Real-World Complexity

Extracting structured information from unstructured documents is a persistent challenge in enterprise AI: it requires handling diverse schemas and specialized terminology while maintaining reliability. To address this, Databricks introduced IE Bench, a benchmark suite designed for tough, domain-specific extraction tasks in finance, legal, healthcare, and more. IE Bench tests models in scenarios that mirror real business needs, providing a truer measure of operational readiness.
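To make the task concrete, here is a minimal sketch of the kind of schema-driven extraction prompt an IE Bench-style task might use. The schema, field names, and prompt wording below are illustrative assumptions, not taken from the benchmark itself:

```python
# Hypothetical schema for a finance-style extraction task (illustrative only).
INVOICE_SCHEMA = {
    "vendor_name": "string",
    "invoice_date": "ISO-8601 date",
    "total_amount": "decimal, in the document's currency",
}

def build_extraction_prompt(document: str, schema: dict) -> str:
    """Render a structured-extraction prompt asking for JSON matching `schema`."""
    field_lines = "\n".join(f"- {name}: {desc}" for name, desc in schema.items())
    return (
        "Extract the following fields from the document below and "
        "return them as a single JSON object.\n\n"
        f"Fields:\n{field_lines}\n\n"
        f"Document:\n{document}"
    )

prompt = build_extraction_prompt("Invoice #42 from Acme Corp...", INVOICE_SCHEMA)
```

The instruction text at the top of such a prompt is exactly the part that automated prompt optimization rewrites and refines.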

The Power of Automated Prompt Optimization

Manual prompt engineering is time-consuming and doesn't scale for enterprise workloads. Automated prompt optimization replaces guesswork with algorithmic rigor, improving prompts through iterative, feedback-driven processes. Notable techniques like GEPA, SIMBA, and MIPROv2 systematically search for the best instructions or examples to maximize model accuracy, with no supervised fine-tuning required.

  • GEPA combines language reflection and evolutionary search, leading to the largest accuracy gains among optimizers tested.

  • These optimizations are pipeline-agnostic, supporting the multi-stage workflows common in enterprise AI.

  • Utilizing stronger optimizer models, such as Claude Sonnet 4, can further enhance the performance of open-source models like gpt-oss-120b.
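The feedback-driven loop these optimizers share can be sketched as a search over candidate instructions scored on a small evaluation set. This is a toy greedy illustration of the general idea, not the GEPA, SIMBA, or MIPROv2 algorithms themselves; the mutation and scoring functions are stand-in assumptions:

```python
import random

def optimize_prompt(base_instruction, mutate, score, dev_set, rounds=10, seed=0):
    """Greedy prompt search: keep a mutation only if it improves the dev-set score."""
    rng = random.Random(seed)
    best, best_score = base_instruction, score(base_instruction, dev_set)
    for _ in range(rounds):
        candidate = mutate(best, rng)
        s = score(candidate, dev_set)
        if s > best_score:  # accept only strict improvements
            best, best_score = candidate, s
    return best, best_score

# Toy setting: the "model" scores better when the instruction mentions JSON.
HINTS = ["Return valid JSON.", "Think step by step.", "Answer in one word."]

def mutate(instr, rng):
    return instr + " " + rng.choice(HINTS)

def score(instr, dev_set):
    return sum(1 for _ in dev_set if "JSON" in instr) / len(dev_set)

best, best_score = optimize_prompt("Extract the fields.", mutate, score, dev_set=[1, 2, 3])
```

Real optimizers replace the random mutation with language-model reflection on failures (GEPA) or structured search over instructions and few-shot examples (MIPROv2), but the accept-if-better feedback loop is the common core.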

Outperforming Proprietary Solutions at Scale

Applying GEPA optimization to gpt-oss-120b enabled it to surpass industry leaders like Claude Opus 4.1 and Claude Sonnet 4 on IE Bench, all while reducing serving costs by up to 90x. Even proprietary models benefit: GEPA optimization, for example, pushed Claude Opus 4.1 to its best-ever performance, showing that automated prompt optimization enhances all model types.

  • GEPA-optimized gpt-oss-120b: Outperforms Claude Opus 4.1 by approximately 2.2 points at a fraction of the cost.

  • GEPA-optimized Claude Opus 4.1: Sets a new benchmark for IE Bench performance.

  • The quality-cost ratio improves markedly for all models post-optimization.
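The quality-cost framing above can be made explicit with a simple ratio. The prices and scores below are hypothetical placeholders chosen only to illustrate the shape of the comparison, not Databricks' measured numbers:

```python
def quality_per_dollar(score: float, cost_per_million_tokens: float) -> float:
    """Benchmark points earned per dollar of serving spend."""
    return score / cost_per_million_tokens

# Hypothetical serving prices and benchmark scores (illustrative only).
open_source = quality_per_dollar(score=72.0, cost_per_million_tokens=0.5)
proprietary = quality_per_dollar(score=69.8, cost_per_million_tokens=15.0)

# A cheaper model with a slightly higher score dominates on this ratio,
# which is the pattern the research reports for optimized open-source models.
```

On these placeholder numbers the open-source ratio is 144 points per dollar versus roughly 4.7 for the proprietary model, mirroring the article's claim that a ~2-point accuracy edge combined with far cheaper serving produces a markedly better quality-cost balance.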

Prompt Optimization versus Supervised Fine-Tuning

While supervised fine-tuning (SFT) has long been the standard for boosting model quality, prompt optimization offers a cost-effective alternative. GEPA optimization alone matches or slightly exceeds SFT’s performance, cutting serving costs by about 20%. When the two approaches are combined, enterprises see even greater accuracy gains, though with a modest uptick in cost.

  • Prompt optimization delivers a superior quality-cost balance compared to SFT alone.

  • Combining GEPA and SFT maximizes accuracy, yet open-source models with prompt optimization still offer the lowest overall cost.

Scaling Up: Lifetime Cost Matters

For organizations running millions of AI-driven transactions, ongoing serving costs quickly outweigh the initial investment in optimization. GEPA-optimized gpt-oss-120b stands out, maintaining significantly lower total costs over time versus proprietary alternatives. This cost advantage persists even at massive scale, making it practical for high-volume, production-grade deployments.

  • The upfront optimization expense is rapidly recouped as usage grows.

  • Open-source models with automated prompt optimization deliver lasting savings at scale.
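The amortization argument can be made concrete with a simple break-even model. The one-time optimization cost and per-request prices here are assumed placeholders; only the roughly 90x per-request gap echoes a figure from the research:

```python
import math

def lifetime_cost(upfront: float, per_request: float, n_requests: int) -> float:
    """Total cost of ownership: one-time optimization spend plus serving."""
    return upfront + per_request * n_requests

# Hypothetical figures: the optimized open-source model pays a one-time GEPA
# optimization cost but serves ~90x cheaper per request (illustrative only).
OPT_UPFRONT, OPT_PER_REQ = 500.0, 0.0002
PROP_UPFRONT, PROP_PER_REQ = 0.0, 0.0180

def break_even_requests() -> int:
    """Smallest request volume at which the optimized model becomes cheaper."""
    return math.ceil(OPT_UPFRONT / (PROP_PER_REQ - OPT_PER_REQ))

n = break_even_requests()
```

Under these assumptions the upfront cost is recouped after a few tens of thousands of requests, after which the per-request gap compounds: at millions of transactions, the optimized open-source deployment costs a small fraction of the proprietary one.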

Key Takeaway

Databricks’ research underscores a pivotal shift: automated prompt optimization empowers enterprises to deploy high-performing, cost-efficient AI agents tailored to their real-world needs. Open-source models can now rival or outperform closed-source giants at a fraction of the price, and even proprietary solutions benefit from optimization. With these innovations integrated into Databricks Agent Bricks, organizations are equipped to quickly build, test, and optimize agents, unlocking unprecedented quality and efficiency for enterprise AI.

Source: Databricks Blog

Joshua Berkowitz December 6, 2025