The $706 day that built Company Agents
Why we ripped the engine out of Paperclip and rebuilt everything underneath it.
We were trying to run an AI website agency. The mission was simple: rebuild small-business websites to the standard of a $20,000 premium build, using AI agents instead of humans.
First client on the test bench was Ampha Group, an accounting firm in Sandton, South Africa. We had enrichment data — brand colors, logos, services, testimonials. We had a pipeline:
stages:
- opus_plans_the_site
- step_flash_designs
- gemini_flash_generates_images
- gpt_54_mini_builds
- sonnet_qas
Target cost per site: $25.
What actually happened
We burned $706 in 24 hours and the site was not shipped.
We were using Paperclip, a single-task agent runner with a heartbeat system. Every 3 to 5 minutes the agents would reload their full context — which was now 2.4 million tokens deep — and make another LLM call. Each agent was a massive context window being re-sent on a timer.
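To put a rough number on "re-sent on a timer", here is a back-of-the-envelope sketch. It assumes a single agent whose context sits at the 2.4 million token figure for a full day and is re-sent in full on every heartbeat, with no caching or compaction. The context only reached that size over time, so treat the result as an upper bound. It says nothing about dollars, since the post gives no per-token prices. The function names are illustrative.

```python
def heartbeats_per_day(interval_minutes: float) -> int:
    """Number of timer-driven LLM calls one agent makes in 24 hours."""
    return int(24 * 60 / interval_minutes)


def tokens_resent(context_tokens: int, interval_minutes: float) -> int:
    """Input tokens one agent re-sends in 24 hours, assuming every
    heartbeat re-sends the full context (no caching, no compaction)."""
    return context_tokens * heartbeats_per_day(interval_minutes)


# The two ends of the "every 3 to 5 minutes" range from the post.
for interval in (3, 5):
    calls = heartbeats_per_day(interval)
    total = tokens_resent(2_400_000, interval)
    print(f"{interval}-minute heartbeat: {calls} calls, {total:,} input tokens")
```

Under those assumptions a 5-minute heartbeat is 288 calls and about 691 million input tokens per agent per day, and a 3-minute heartbeat is 480 calls and about 1.15 billion.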
The Designer spent hours applying the same CSS fix to Header.tsx over and over, forgetting what it had already tried. Child processes were never cleaned up, so zombies ate our CPU. Agent crashes left task locks stranded in the database, and we had to write raw SQL to unblock work.
The QA agent kept approving broken sites. It invented fake offices in London and New York for a company that only has one office, in Sandton.
What we tried
- Custom pipeline watchdog checking every 5 minutes
- Custom zombie reaper
- Switched QA to Opus with an adversarial SOUL.md
- Added attempt tracking to checkpoints
- Added loop detection to heartbeat instructions
- Increased the heartbeat interval to 5 minutes
- Added queue purging to the watchdog
- Set CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=50
- Tried Pi, GLM via z.ai, Kimi K2.5, Gemini 3.1 Pro Preview
- Disabled the Planner agent entirely
- Manual SQL queries to clear locks
None of it was enough. The fundamental architecture — heartbeat-driven execution with flat task queues and checkout locks — was the problem.
What we built instead
We ran 9 parallel research agents across 200+ sources. We read every orchestration post-mortem we could find. We landed on:
- Leases instead of locks. Auto-expiring TTLs with fencing tokens, so crashed agents don't strand work forever.
- Structured checkpoints instead of conversation dumps. Written by the orchestrator, not the agent, in a universal format across every adapter.
- Loop detection with output-hash dedup. The "apply the same CSS fix 400 times" scenario is now impossible.
- Process group isolation via detached: true + kill(-pid). No more zombies.
- Stacked budgets at every level: company, team, agent, workflow, task, loop. Every inner cap protects every outer one. No runaway possible.
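For readers who haven't met leases with fencing tokens, here is a minimal in-memory sketch of the idea. It is not the Company Agents implementation: `LeaseTable`, `acquire`, and `accept_write` are hypothetical names, and a real system would keep this state in a database rather than a dict. A lease expires on its own after a TTL, and each new lease carries a higher token, so a write from a crashed or stalled previous holder can be recognised and rejected.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Lease:
    holder: str
    token: int         # fencing token, strictly increasing per task
    expires_at: float  # deadline on the injected clock


class LeaseTable:
    """In-memory stand-in for a lease store (hypothetical, for illustration)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.leases: Dict[str, Lease] = {}
        self.last_token: Dict[str, int] = {}

    def acquire(self, task_id: str, holder: str) -> Optional[int]:
        """Return a fresh fencing token, or None while a live lease exists."""
        current = self.leases.get(task_id)
        if current is not None and current.expires_at > self.clock():
            return None
        token = self.last_token.get(task_id, 0) + 1
        self.last_token[task_id] = token
        self.leases[task_id] = Lease(holder, token, self.clock() + self.ttl)
        return token

    def accept_write(self, task_id: str, token: int) -> bool:
        """Accept a write only if it carries the newest token issued for the task."""
        return token == self.last_token.get(task_id)
```

If agent A takes a lease and crashes, nobody has to clear anything by hand: once the TTL passes, agent B's `acquire` succeeds with a higher token, and any late write still carrying A's token fails `accept_write`.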
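Output-hash dedup can be sketched in a few lines. This is an illustration under assumptions, not the shipped detector: the `LoopDetector` class, the whitespace normalisation, and the threshold of three repeats are all made up for the example. The idea is to fingerprint each output together with the file it targets, and to flag the agent once the same fingerprint has been seen too many times.

```python
import hashlib
from collections import Counter


class LoopDetector:
    """Flags an agent that keeps producing the same output for the same target."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats  # hypothetical threshold
        self.seen: Counter = Counter()

    @staticmethod
    def fingerprint(target: str, output: str) -> str:
        # Collapse whitespace so trivially reformatted output hashes the same.
        normalized = " ".join(output.split())
        payload = f"{target}\n{normalized}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def record(self, target: str, output: str) -> bool:
        """Record one output; return True once it has repeated max_repeats times."""
        key = self.fingerprint(target, output)
        self.seen[key] += 1
        return self.seen[key] >= self.max_repeats
```

An orchestrator would call `record("Header.tsx", patch_text)` after each attempt and stop or escalate the task when it returns True, instead of letting the agent try the same patch again.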
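The `detached: true` and `kill(-pid)` pairing in the list is the Node.js way of putting a child in its own process group and signalling the whole group. The sketch below shows the same technique with Python's standard library rather than Node, so it is an analogue and not the code Company Agents runs. It is POSIX only, and `spawn_isolated` and `kill_group` are hypothetical helper names.

```python
import os
import signal
import subprocess
from typing import List


def spawn_isolated(argv: List[str]) -> subprocess.Popen:
    # start_new_session=True makes the child call setsid(), so it becomes the
    # leader of a new process group whose id equals its own pid (POSIX only).
    return subprocess.Popen(argv, start_new_session=True)


def kill_group(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Signal the whole process group, then reap the leader and return its exit code."""
    try:
        os.killpg(proc.pid, signal.SIGTERM)  # reaches the child and its descendants
    except ProcessLookupError:
        pass  # the group is already gone
    try:
        return proc.wait(timeout=timeout)  # wait() reaps the leader, so no zombie remains
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)  # escalate if SIGTERM was ignored
        return proc.wait()
```

Signalling the group matters because an agent's tool calls often spawn their own children. Killing only the direct child leaves those grandchildren running, which matches the leftover processes described earlier.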
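One way to picture stacked budgets is a chain of caps where a charge at the innermost level must fit under every cap above it. The sketch below is an assumption about the shape of the idea, not the product's code: the `Budget` class and the cap values in the usage note are invented for illustration, and amounts are kept in integer cents to avoid floating-point drift.

```python
from typing import Iterator, Optional


class BudgetExceeded(Exception):
    """Raised when a charge would push any level past its cap."""


class Budget:
    def __init__(self, name: str, cap_cents: int,
                 parent: Optional["Budget"] = None) -> None:
        self.name = name
        self.cap_cents = cap_cents
        self.spent_cents = 0
        self.parent = parent

    def chain(self) -> Iterator["Budget"]:
        """Yield this budget and every ancestor, innermost first."""
        node: Optional["Budget"] = self
        while node is not None:
            yield node
            node = node.parent

    def charge(self, amount_cents: int) -> None:
        """Charge every level in the chain, or none of them if any cap would break."""
        for node in self.chain():
            if node.spent_cents + amount_cents > node.cap_cents:
                raise BudgetExceeded(
                    f"{node.name} cap of {node.cap_cents} cents would be exceeded"
                )
        for node in self.chain():
            node.spent_cents += amount_cents
```

With, say, a 5000-cent company cap, a 2500-cent task cap under it, and a 200-cent loop cap under that, a stuck loop is stopped after 200 cents and the task and company totals stay far below their own limits.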
And then, just as important, we stopped pretending humans didn't belong in the loop. Company Agents is built for hybrid teams. AI ships the work; humans own the decisions that matter.
Today
We shipped v0.1.0 two days ago. You can download it for Mac, Windows, or Linux. Free tier is available immediately. Pro is $20/mo. Teams start at $15/seat.
If any of this resonates, come say hi in Discord. We're easy to find.