The Art of LLM System Design: Navigating Choices for Maximum Impact
In today’s fast-changing AI landscape, picking the right large language model (LLM) is both a challenge and a strategic imperative for business processes. With models continually emerging, each offeri...
Tags: AI strategy, cost optimization, enterprise AI, inference, LLM, model selection, open models, system design

Databricks Delivers Fast, Scalable PEFT Model Serving for Enterprise AI
Enterprises aiming to deploy AI agents tailored to their proprietary data face the challenge of delivering high-performance inference that can scale with complex, fragmented workloads. Parameter-Effic...
Tags: Databricks, enterprise AI, GPU optimization, inference, LoRA, model serving, PEFT, quantization
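For context on what PEFT serving builds on, here is a minimal sketch of attaching a LoRA adapter to a base model with the Hugging Face transformers and peft libraries. The model and adapter names are hypothetical placeholders, and this illustrates adapter loading in general, not Databricks' serving stack.

```python
# Minimal sketch: loading a LoRA adapter onto a shared base model with peft.
# The model/adapter IDs below are hypothetical placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B"          # assumed base model
adapter_id = "my-org/customer-support-lora"  # hypothetical LoRA adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")

# Attach the LoRA adapter: only the small adapter weights are loaded on top
# of the shared base weights, which is what makes multi-adapter serving cheap.
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("Summarize our Q3 support tickets:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
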
ONNX Runtime: Inference Runtime for Portability, Performance, and Scale
Deploying machine learning models efficiently is as important as training them. ONNX Runtime, an open-source accelerator from Microsoft, promises fast, portable inference across operating systems and...
Tags: deployment, inference, ONNX, runtime, TensorFlow Serving, Triton
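As a reference point for what that portability looks like in practice, here is a minimal ONNX Runtime inference sketch in Python. The file name "model.onnx" and the assumed 1x3x224x224 input shape are placeholders for whatever model you export; the input name is queried from the session rather than hard-coded.

```python
# Minimal sketch: running an exported ONNX model with ONNX Runtime.
# "model.onnx" and the input shape are assumptions about the exported model.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Inspect the model's declared inputs instead of hard-coding tensor names.
input_meta = session.get_inputs()[0]
print(input_meta.name, input_meta.shape, input_meta.type)

# Feed a dummy batch shaped to the model's input (assumed 1x3x224x224 here).
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_meta.name: x})
print(outputs[0].shape)
```
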
BitNet: 1-bit LLMs Land With Practical Inference on CPUs and GPUs
BitNet from Microsoft Research is the official C++ inference stack for native 1-bit large language models, centered on BitNet b1.58. The repo ships fast, lossless ternary kernels for CPUs, a CUDA W2A8...
Tags: 1-bit LLM, BitNet, CPU, GGUF, GPU, inference, llama.cpp, quantization, T-MAC
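The ternary kernels above operate on weights constrained to {-1, 0, +1}; the BitNet b1.58 paper describes an absmean scheme for producing them. The NumPy sketch below illustrates that arithmetic only; the repo's C++ kernels work on packed ternary weights, not this float path.

```python
# Sketch of absmean ternary quantization as described for BitNet b1.58:
# scale by the mean absolute weight, then round and clip to {-1, 0, +1}.
# Illustrative math only; bitnet.cpp implements this as packed C++ kernels.
import numpy as np

def absmean_ternarize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to ternary values with a per-tensor scale."""
    scale = np.mean(np.abs(w)) + eps           # gamma: mean absolute value
    w_q = np.clip(np.round(w / scale), -1, 1)  # RoundClip into {-1, 0, +1}
    return w_q.astype(np.int8), scale

w = np.random.randn(4, 8).astype(np.float32)
w_q, scale = absmean_ternarize(w)
print(w_q)                               # entries in {-1, 0, 1}
print(np.max(np.abs(w - w_q * scale)))   # dequantization error
```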