at0mic // thoughts

What actually works
when running AI agents

Benchmarks, tradeoffs, and hard-won lessons from building with AI agents in production. No hype, just data.

Findings

Mar 2026 benchmark

Structured specs beat prose for AI agent execution

We gave five models the same multi-step workflow as both a JSON spec and a prose paragraph. JSON produced 37% better execution quality and eliminated improvisation entirely.

37% quality improvement

5 models tested

0 improvisation with JSON

Read the full analysis

What actually workswhen running AI agents

Structured specs beat prose for AI agent execution

What actually works
when running AI agents