AI Systems & Optimization
AI optimization, local models, LLM systems, and infrastructure for production AI deployment.
Articles in this collection: 5
Awareness & Education: Educational content, guides, and introductions (TOFU)
AI Systems
Building AI systems: fine-tuning, optimization, inference engines, and multi-agent orchestration.
How a 4B Model Beat a 7B: Fine-Tuning Bake-off on 16GB VRAM
Qwen3-4B scored dead last in our base model bake-off. Then it beat our production Qwen2.5-7B after fine-tuning — 4.8/5 vs 3.0/5 on format compliance. The secret: VRAM headroom matters more than parameter count.
7B QLoRA Fine-Tuning on AMD RDNA4: The HQQ Path
bitsandbytes doesn't work on the RX 9070 XT. Here's how we trained five 7B models using HQQ's pure PyTorch quantization — and the six lessons (three patches, three gotchas) that made it possible.
Local Model Shootout: Finding the Right LLM for Every Task
Benchmarking 6 local models on an RX 9070 XT to build a three-model strategy — speed, code quality, and reasoning each get their own specialist.
Vulkan Beats ROCm: +20% LLM Inference on RDNA 4
How building llama.cpp from source revealed that RADV Vulkan outperforms ROCm HIP on the RX 9070 XT — and the pipeline parallelization that delivered a 44x speedup on top.