Blog
Technical writing on AI, infrastructure, and local model evaluation.
I built redundancy. It failed redundantly.
Four providers in the fallback chain. Nine cascade failures in one day. How two out-of-sync config files turned redundancy into a cardboard wall.
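The failure mode generalizes: any time two config files must agree, that agreement should be checked by a machine at startup, not by a human during an outage. A minimal sketch of such a check, with hypothetical file names and keys (not the post's actual config):

```python
# Hypothetical sketch: both files must agree on the provider chain, and the
# check runs at startup so drift fails loudly instead of mid-outage. The
# file names and keys are placeholders, not the post's actual config.
import json
import sys

def load_chain(path: str, key: str) -> list[str]:
    with open(path) as f:
        return json.load(f)[key]

router = load_chain("router.json", "fallback_chain")
health = load_chain("healthcheck.json", "providers")

if router != health:
    sys.exit(f"config drift: router={router} vs healthcheck={health}")
```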
What running local AI on a Mac Mini actually taught me: 7 things the tutorials, YouTube, and ChatGPT all skipped
Seven infrastructure gotchas from running a persistent AI daemon on macOS — from silent sleep mode to corrupted eval data.
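The sleep gotcha has a small mitigation worth previewing here: wrap the daemon in macOS's built-in `caffeinate`, which holds a sleep-prevention assertion for exactly as long as the wrapped process runs. A one-line sketch (the script name is a placeholder):

```python
# macOS's built-in caffeinate prevents idle sleep for as long as the
# wrapped process runs; "daemon.py" is a placeholder for your process.
import subprocess

subprocess.run(["caffeinate", "-i", "python3", "daemon.py"], check=True)
```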
My eval was silently giving every analysis task a failing score for weeks (and why)
112 consecutive failing runs on analyze tasks. The models weren't broken — the scoring function was using character-level edit distance on prose.
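To see why that combination guarantees failure, here is a small illustration (not the eval's actual code): two answers that mean the same thing, scored by character-level edit distance.

```python
# Why character-level edit distance fails on prose: two answers that say
# the same thing in different words score poorly, so every run "fails".

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance over characters."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

reference = "the new pricing tier drove enterprise revenue growth"
candidate = "enterprise revenue growth was driven by the new pricing tier"

dist = levenshtein(reference, candidate)
char_sim = 1 - dist / max(len(reference), len(candidate))
print(f"char-level similarity: {char_sim:.2f}")  # low: reordering wrecks it

# Even a crude word-overlap measure disagrees sharply:
ref_w, cand_w = set(reference.split()), set(candidate.split())
print(f"word overlap (Jaccard): {len(ref_w & cand_w) / len(ref_w | cand_w):.2f}")
```

Against a pass threshold calibrated for near-exact matches, every good prose answer fails, which is how you get 112 failures in a row from working models.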
The model I designated as my floor outperformed every candidate
How IBM's smallest Granite model — picked as the control floor — ended up as one of the strongest performers in a 38-run evaluation.
The free TTS model that beats OpenAI
A round-trip TTS evaluation comparing sherpa-onnx VITS, macOS say, and OpenAI's TTS APIs. The free offline model scored highest.
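"Round-trip" here means: synthesize the text, transcribe the audio back with a fixed STT model, and score how much of the original survived. A sketch of that loop, with `synthesize()` and `transcribe()` as stand-ins for whichever engines you wire up:

```python
# Round-trip scoring: speak the text, listen to it, measure what survived.
# synthesize() and transcribe() stand in for whichever TTS/STT engines you
# wire up (sherpa-onnx, `say`, an API); the WER function itself is standard.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))
    for i, rw in enumerate(ref, 1):
        curr = [i]
        for j, hw in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (rw != hw)))
        prev = curr
    return prev[-1] / max(len(ref), 1)

def round_trip_score(text: str, synthesize, transcribe) -> float:
    audio = synthesize(text)       # TTS engine under test
    heard = transcribe(audio)      # one fixed STT model judges all engines
    return max(0.0, 1.0 - wer(text, heard))  # higher is better
```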
My OpenClaw chronicles — 958 shadow test runs later: what the data actually shows about local AI quality
958 scored runs across 38 model/task pairs, seven task types, a two-judge ensemble, and zero promoted models. Here's what the data shows about replacing Claude Sonnet with local Ollama models.
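The headline result, zero promotions, comes down to a gate. A sketch of what such a promotion check can look like; the judge keys and the 0.8 threshold are illustrative, not the post's exact values:

```python
# One shadow run = one model/task attempt scored by two judges. A pair is
# promoted only if its mean ensemble score clears the bar. The judge keys
# and the 0.8 threshold are illustrative, not the post's exact values.
from statistics import mean

def promote(runs: list[dict], threshold: float = 0.8) -> bool:
    """runs: [{'judge_a': 0.74, 'judge_b': 0.61}, ...] for one pair."""
    ensemble = [mean((r["judge_a"], r["judge_b"])) for r in runs]
    return bool(ensemble) and mean(ensemble) >= threshold
```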