Open Source

Personal AI Agents on AMD RDNA4

8 fine-tuned models running 50+ autonomous tasks daily on a single $550 consumer GPU. No cloud dependencies. No API costs. No NVIDIA required.


Fine-tuned models: 8
Training cost
Daily tasks: 50+
Total train time: ~90 min

The Problem

The standard QLoRA library (bitsandbytes) doesn't work on RDNA4, AMD's newest GPU architecture. It fails in two independent ways, and neither can be fixed without upstream changes: the PyPI wheels crash with hipErrorNoBinaryForGpu, and building from source creates a ROCm version mismatch that fails with undefined symbol: hsa_amd_memory_get_preferred_copy_engine.

Without 4-bit quantization, a 7B model in bf16 needs ~14 GB for the weights alone (7 billion parameters × 2 bytes), leaving only about 2 GB for activations and gradients on a 16 GB card.
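The arithmetic behind that budget can be sketched in a few lines of Python. This is a back-of-the-envelope estimate of weight storage only: the helper name is ours, it uses decimal gigabytes, and it ignores quantization metadata, activations, gradients, and optimizer state.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate storage for model weights alone, in decimal GB."""
    return n_params * bits_per_param / 8 / 1e9


CARD_VRAM_GB = 16.0   # card size quoted in the text
N_PARAMS = 7e9        # a 7B model

bf16_gb = weight_memory_gb(N_PARAMS, 16)   # 2 bytes per parameter
int4_gb = weight_memory_gb(N_PARAMS, 4)    # half a byte per parameter

print(f"bf16 weights:  {bf16_gb:.1f} GB, headroom {CARD_VRAM_GB - bf16_gb:.1f} GB")
print(f"4-bit weights: {int4_gb:.1f} GB, headroom {CARD_VRAM_GB - int4_gb:.1f} GB")
```

Real 4-bit footprints come out higher than this lower bound, because scales, zero-points, and any layers kept in higher precision also take space.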

The Solution: HQQ

HQQ (Half-Quadratic Quantization) provides 4-bit quantization with a pure PyTorch backend — no custom CUDA/HIP kernels. Works on any device PyTorch supports, including RDNA4.
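To make "4-bit quantization in plain tensor math" concrete, here is a minimal sketch of group-wise affine quantization with round-to-nearest, written in pure Python. It is not HQQ's algorithm: HQQ additionally refines the quantization parameters with a half-quadratic solver, which this sketch omits. The function names and sample weights are illustrative assumptions.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], nbits: int = 4) -> Tuple[List[int], float, float]:
    """Map one group of weights to integer codes in [0, 2**nbits - 1]."""
    levels = 2 ** nbits - 1                    # 15 for 4-bit
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels
    if scale == 0.0:                           # constant group: any scale works
        scale = 1.0
    zero = -w_min / scale                      # zero-point so that w_min maps to code 0
    codes = [min(levels, max(0, round(w / scale + zero))) for w in weights]
    return codes, scale, zero


def dequantize_group(codes: List[int], scale: float, zero: float) -> List[float]:
    """Reconstruct approximate weights from codes, scale, and zero-point."""
    return [(q - zero) * scale for q in codes]


group = [-0.80, -0.10, 0.00, 0.30, 0.75]       # made-up weights for illustration
codes, scale, zero = quantize_group(group)
restored = dequantize_group(codes, scale, zero)
max_err = max(abs(a - b) for a, b in zip(group, restored))
print(codes, f"max reconstruction error = {max_err:.4f}")
```

Everything here is elementwise arithmetic plus a min and max per group, which is why a quantizer of this shape needs no custom GPU kernels.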

Base model VRAM: 5.85 GB
Peak training VRAM: 12.2 GB
Training speed: ~21 s/step

Trained Models

| Model | Task | Examples | VRAM | Eval Loss |
|---|---|---|---|---|
| judge-7b | Quality evaluation | | 14.5 GB | 1.84 |
| planner-8b | Goal planning | | 12.2 GB | 1.35 |
| seeder-7b | Research synthesis | | 12.2 GB | 1.96 |
| analyst-7b | Technical analysis | | 15.7 GB | 1.32 |
| reflector-3b | Session reflection | | 11.5 GB | — |
| deepener-1.5b | Topic exploration | 515 | 8.0 GB | — |
| spacer-1.5b | Spaced repetition | 107 | 8.0 GB | — |
| quizzer-1.5b | Quiz generation | 120 | 8.0 GB | — |

Training data was generated by distilling Sonnet 4.6 outputs on real vault data. The trained models are quantized to Q8_0 GGUF and deployed to Ollama.
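Once a model is deployed to Ollama, a task can reach it over Ollama's local HTTP API. The sketch below assumes Ollama is listening on its default port and that a model is registered under the tag judge-7b. That tag and the prompt are placeholders for illustration, not the project's actual configuration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally served model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text (needs a running Ollama server)."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Example call, assuming the server is up and the tag exists:
# print(generate("judge-7b", "Rate this summary from 1 to 10: ..."))
```

Only the standard library is involved, so a scheduler process can call the models without extra dependencies.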

On these metrics: Eval loss measures next-token prediction on held-out examples. Lower is better, but it only tells us the model learned the output patterns — not whether it performs the task well. A model with good loss could still produce poorly calibrated scores or miss edge cases. We're building proper task-level evaluation: running each model alongside Sonnet on identical inputs and comparing output quality. Until that data is in, treat these as training diagnostics, not performance benchmarks.
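For intuition, eval loss can be converted to perplexity. This assumes the reported loss is mean token-level cross-entropy in nats, which is the usual convention but is not stated here. Perplexity is still a training diagnostic, not a measure of task quality.

```python
import math


def perplexity(eval_loss: float) -> float:
    """Perplexity for a mean cross-entropy loss given in nats."""
    return math.exp(eval_loss)


# Eval losses as reported in the table above.
EVAL_LOSS = {
    "judge-7b": 1.84,
    "planner-8b": 1.35,
    "seeder-7b": 1.96,
    "analyst-7b": 1.32,
}

for name, loss in sorted(EVAL_LOSS.items(), key=lambda item: item[1]):
    print(f"{name}: loss={loss:.2f} perplexity={perplexity(loss):.2f}")
```

Because each model is trained and evaluated on a different task and dataset, these numbers are not comparable across rows.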

The Autonomous System

These models don't sit idle. They run in a heartbeat system — an automated task scheduler that executes 50+ tasks daily on a 15-minute timer. The system self-improves: analysts find opportunities, planners create goals, implementers execute them, and judges evaluate the results.
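A heartbeat of this kind can be sketched as a tick counter plus a per-task interval. The task names, model assignments, and intervals below are invented for illustration; only the 15-minute tick and the 50+ daily runs come from the description above.

```python
from dataclasses import dataclass
from typing import List

TICK_MINUTES = 15                          # the heartbeat fires every 15 minutes
TICKS_PER_DAY = 24 * 60 // TICK_MINUTES    # 96 heartbeats per day


@dataclass(frozen=True)
class Task:
    name: str            # hypothetical task name
    model: str           # model that handles the task
    every_n_ticks: int   # run once every N heartbeats


def due_tasks(tasks: List[Task], tick: int) -> List[Task]:
    """Return the tasks scheduled to run on this heartbeat."""
    return [task for task in tasks if tick % task.every_n_ticks == 0]


def runs_per_day(tasks: List[Task]) -> int:
    """Count the task executions produced by one day of heartbeats."""
    return sum(len(due_tasks(tasks, tick)) for tick in range(TICKS_PER_DAY))


# Hypothetical schedule, not the project's real one.
SCHEDULE = [
    Task("find-opportunities", "analyst-7b", every_n_ticks=8),   # every 2 hours
    Task("create-goals", "planner-8b", every_n_ticks=8),         # every 2 hours
    Task("evaluate-results", "judge-7b", every_n_ticks=4),       # every hour
    Task("generate-quiz", "quizzer-1.5b", every_n_ticks=24),     # every 6 hours
]

print(f"{runs_per_day(SCHEDULE)} task runs per day")
```

Keeping the schedule as plain data makes it easy to add tasks or change intervals while staying on a single timer.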