Blog Posts | Joshua Berkowitz

6 Articles

Filters: 2025, benchmarking

Unsloth Dynamic GGUFs: How Extreme Model Compression Outperforms AI Giants

Compressing a large language model by 75% and still outperforming the latest releases from OpenAI and Anthropic is the promise of Unsloth Dynamic GGUFs. Their integration with the Aider Polyglot bench...

Aider Polyglot, benchmarking, DeepSeek, LLMs, model compression, open-source AI, quantization, Unsloth

Dec 6, 2025

Unleashing On-Device Agentic Power: How Fara-7B Transforms Human-Computer Interaction

Microsoft Research’s Fara-7B is a small, open-weight agentic model that interacts with your device in a human-like way. It looks to fulfil the promise of having a digital assistant that doesn’t just un...

agentic AI, AI safety, benchmarking, on-device AI, open source, small language models, synthetic data, web automation

Nov 25, 2025

Toucan Dataset: Transforming AI Agents Into Digital Doers

Toucan, a groundbreaking open-source dataset from IBM and the University of Washington, is crafted to propel tool-calling capabilities in large language models (LLMs) to new heights. For AI to move bey...

AI agents, API integration, benchmarking, large language models, machine learning, open source, tool-calling, Toucan dataset

Oct 25, 2025

OpenAI's GPT-5-Codex: The Next Evolution in AI-Powered Coding

OpenAI has taken a bold step forward in the AI coding space by introducing GPT-5-Codex. This new release redefines what developers can expect from AI-powered coding assistants, offering new levels of ...

AI coding, benchmarking, code review, Codex, GPT-5, OpenAI, software development

Sep 16, 2025

AssetOpsBench Sets New Standards for AI in Industrial Asset Management

Industrial asset management is undergoing a transformation as artificial intelligence agents are poised to take on complex tasks, from predictive maintenance to troubleshooting intricate machinery. At...

AI agents, asset management, benchmarking, failure analysis, industrial automation, LLM evaluation, multi-agent systems, open source

Aug 29, 2025

SciArena: Transforming How We Evaluate AI Models in Scientific Research

Researchers face a growing challenge: staying current with the ever-expanding body of scientific literature. Foundation models offer promise in helping synthesize and analyze this vast information, bu...

AI evaluation, benchmarking, crowdsourcing, data quality, foundation models, leaderboard, research tools, scientific literature

Aug 5, 2025
