Blog

Technical writing on AI, infrastructure, and local model evaluation.

My OpenClaw Chronicles

I built redundancy. It failed redundantly.

Four providers in the fallback chain. Nine cascade failures in one day. How two config files out of sync turned redundancy into a cardboard wall.

7 min read
My OpenClaw Chronicles

What running local AI on a Mac Mini actually taught me: 7 things the tutorials, YouTube, and ChatGPT all skipped

Seven infrastructure gotchas from running a persistent AI daemon on macOS — from silent sleep mode to corrupted eval data.

9 min read
My OpenClaw Chronicles

My eval was silently giving every analysis task a failing score for weeks (and why)

112 consecutive failing runs on analyze tasks. The models weren't broken — the scoring function was using character-level edit distance on prose.

7 min read
My OpenClaw Chronicles

The model I designed as my floor outperformed every candidate

How IBM's smallest Granite model — picked as the control floor — ended up as one of the strongest performers in a 38-run evaluation.

5 min read
My OpenClaw Chronicles

The free TTS model that beats OpenAI

A round-trip TTS evaluation comparing sherpa-onnx VITS, macOS say, and OpenAI's TTS APIs. The free offline model scored highest.

4 min read
My OpenClaw Chronicles

958 shadow test runs later: what the data actually shows about local AI quality

958 scored runs across 38 model/task pairs, seven task types, a two-judge ensemble, and zero promoted models. Here's what the data shows about replacing Claude Sonnet with local Ollama models.

8 min read