Why your AI agents need a budget
Without a cap, an agent will keep trying. That is how we burned $706 in 24 hours and shipped nothing. Here is the model we built so it cannot happen again.
An AI agent that runs without a spend cap is a program that will, at some point, spend all of your money. Not because it is malicious. Because the loop is not designed to stop.
This is the blog post I wish someone had put in front of me 12 weeks ago.
The $706 day
We were building an AI website agency. Five agents in a pipeline: Opus plans the site, Step Flash designs the visuals, Gemini Flash sources images, GPT 5.4 Mini builds the code, Sonnet runs QA. Target cost per site: around 25 dollars.
First client on the test bench: an accounting firm in Sandton. We fired up the pipeline on a Sunday afternoon, walked away to make coffee, and came back four hours later to find the Designer agent applying the same CSS fix to the header component over and over. Hundreds of times. Same output, same diff, same error, same retry. By the end of day one, the bill from our providers was 706 dollars and the website was still not shipped.
The agent was doing exactly what we told it to: keep going until the goal is met. The goal was "make the header look right". It never looked right. The loop never stopped.
Why caps are the only honest answer
When we went back to the drawing board, we spent some time trying to make the agent smarter. Better loop detection. A memory of past attempts. A timeout. A retry budget.
All of that helped a little. None of it solved the actual problem, which is that an agent running toward a goal will always prefer "try one more thing" to "give up". You can teach it to give up more often, but you cannot teach it to give up at the exact right moment. If you could, it would not be an agent.
The only reliable fix is an external cap. A number that counts down regardless of what the agent thinks, and when it hits zero, the agent stops whether or not it believes it is done.
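Here is a minimal sketch of that idea, not our production code. The names `Budget`, `BudgetExhausted`, and `run_until_done` are made up for illustration, and it assumes every attempt has a known cost in cents. The point is that the countdown lives outside the agent and is charged before each attempt.

```python
from dataclasses import dataclass
from typing import Callable


class BudgetExhausted(Exception):
    """Raised by the cap, never by the agent, when the money runs out."""


@dataclass
class Budget:
    remaining_cents: int

    def charge(self, cost_cents: int) -> None:
        # Counts down regardless of what the agent believes about its progress.
        if cost_cents > self.remaining_cents:
            raise BudgetExhausted(
                f"need {cost_cents} cents, only {self.remaining_cents} left"
            )
        self.remaining_cents -= cost_cents


def run_until_done(attempt: Callable[[], bool], cost_cents: int, budget: Budget) -> int:
    """Call `attempt` until it returns True or the budget refuses the next charge.

    `attempt` and `cost_cents` are hypothetical stand-ins for one agent step
    and its estimated price.
    """
    if cost_cents <= 0:
        raise ValueError("cost_cents must be positive or the cap never bites")
    attempts = 0
    while True:
        budget.charge(cost_cents)  # raises before the call is made
        attempts += 1
        if attempt():
            return attempts
```

With a 100 cent budget and a 25 cent step that never succeeds, the loop makes four attempts and then `BudgetExhausted` ends the run. The agent's opinion about whether it is done never enters into it.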
Once you accept that, the only question is where the caps live.
Six scopes, stacked
We landed on six scopes of cap, stacked so that each inner one is constrained by the one around it.
Company. A monthly budget for everything the company spends on LLM calls. This is the outermost ring, the one your finance team cares about. When it trips, every agent in the company stops.
Team. A monthly budget for each team inside the company. The engineering team gets 500 dollars, the marketing team gets 300, the ops team gets 200. Teams cannot exceed their cap regardless of what the company budget has left.
Agent. A per-agent monthly spend cap. Your CEO agent gets 500 dollars of headroom, your copywriter gets 80, your QA reviewer gets 40. Individual agents cannot drift into spending patterns that do not match their role.
Workflow. A per-run cap on each workflow execution. A website build gets 25 dollars. A nightly report gets one dollar. A lead triage routine gets five cents. The workflow cap is usually tight because each run should be cheap, and tightness is the circuit breaker that saves you from a loop.
Task. A per-step cap inside a workflow. One website-build run has five stages, and each stage has its own cap. For example: crawl gets 50 cents, design gets 3 dollars, build gets 15 dollars, QA gets 75 cents. The stages cannot bleed into each other.
Loop. The innermost ring. A retry cap on the literal decision loop of the agent. After ten iterations on the same step, the agent escalates to the next level up instead of retrying an eleventh time. This is the cap that would have stopped the 706-dollar day cold.
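One way to picture the stack is as a chain of scopes, each pointing at the one around it. The sketch below is an illustration under assumptions, not our actual schema. The `Scope` class and the scope names are hypothetical, the company figure is invented, and placing a QA reviewer agent inside the engineering team is our guess for the example. The other amounts come from the list above, in cents. The loop cap is an iteration count rather than money, so it appears only as a constant.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Scope:
    name: str
    limit_cents: int
    spent_cents: int = 0
    parent: Optional[Scope] = None

    def remaining_cents(self) -> int:
        return self.limit_cents - self.spent_cents

    def headroom_cents(self) -> int:
        """Smallest remaining budget on the path from this scope out to the root."""
        scope, headroom = self, self.remaining_cents()
        while scope.parent is not None:
            scope = scope.parent
            headroom = min(headroom, scope.remaining_cents())
        return headroom


# Monthly scopes (the company figure is an assumed placeholder).
company = Scope("company", 100_000)
team = Scope("team:engineering", 50_000, parent=company)
agent = Scope("agent:qa-reviewer", 4_000, parent=team)
# Per-run and per-step scopes.
workflow = Scope("workflow:website-build", 2_500, parent=agent)
task = Scope("task:qa", 75, parent=workflow)
# Loop scope: counted in iterations, not cents.
MAX_LOOP_ITERATIONS = 10
```

With nothing spent, `task.headroom_cents()` is 75, the task's own cap. If the engineering team has already spent 49,950 of its 50,000 cents, the same call returns 50, because the inner scope is constrained by the one around it.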
Why stacking matters
The reason you need all six is that the failure modes hide in the gaps between them.
A company cap alone fails when one agent eats the whole budget in a day and starves everything else.
A team cap alone fails when one agent inside the team runs up the bill while the other agents in the team are idle.
An agent cap alone fails when the agent is technically within its monthly cap but is running a single runaway workflow that produces nothing.
A workflow cap alone fails when the workflow is within budget but one stage of it is looping.
A task cap alone fails when the task is within budget but the loop inside the task is stuck.
Every cap plugs a gap that the outer one leaves open. You need the whole stack or you have an open seam.
What happens when a cap trips
Every cap triggers the same behavior when it trips: the agent pauses, the orchestrator logs the overrun to the audit trail, and a notification goes to the manager of whichever scope was breached. The manager can raise the cap, kill the run, or escalate further up.
The important part is that the decision to keep spending is always made by a human. Not by the agent, and not by the previous version of the cap that was set a week ago. By a human looking at the situation in real time.
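A rough sketch of the trip handler, assuming a hypothetical `Orchestrator` that knows who manages each scope and is handed a `notify` callable. None of these names come from a real library. Note what is missing: nothing in this handler resumes the run.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Callable, Dict, List


class RunState(Enum):
    RUNNING = "running"
    PAUSED = "paused"


@dataclass
class Run:
    run_id: str
    state: RunState = RunState.RUNNING


@dataclass
class CapTrip:
    run_id: str
    scope: str
    limit_cents: int
    spent_cents: int
    at: datetime


@dataclass
class Orchestrator:
    managers: Dict[str, str]                # scope name -> manager contact (assumed)
    notify: Callable[[str, CapTrip], None]  # e.g. email or chat hook (assumed)
    audit_log: List[CapTrip] = field(default_factory=list)

    def on_cap_tripped(self, run: Run, scope: str,
                       limit_cents: int, spent_cents: int) -> CapTrip:
        run.state = RunState.PAUSED                   # 1. the agent pauses
        trip = CapTrip(run.run_id, scope, limit_cents, spent_cents,
                       at=datetime.now(timezone.utc))
        self.audit_log.append(trip)                   # 2. log the overrun
        self.notify(self.managers[scope], trip)       # 3. tell that scope's manager
        return trip
```

The run stays `PAUSED` until a person does something about it. Raising the cap, killing the run, or escalating further up would each be a separate, human-initiated call.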
What it looks like in practice
In Company Agents, every time an agent is about to make a tool call or a model call, the orchestrator checks the six caps in order, inside out. Loop, task, workflow, agent, team, company. The first cap in that order with less remaining than the next step costs triggers an escalation.
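The check itself can be sketched in a few lines. This is an illustration, not the Company Agents implementation. The `Cap` class and `first_breach` function are hypothetical, and the sketch assumes the loop cap is tracked in iterations while the other five are tracked in cents.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cap:
    scope: str
    remaining: int
    unit: str  # "iterations" for the loop cap, "cents" for the money caps


def first_breach(caps_inside_out: List[Cap], step_cost_cents: int) -> Optional[Cap]:
    """Return the innermost cap that cannot cover the next step, or None."""
    for cap in caps_inside_out:
        # One more step costs one iteration of the loop cap
        # and `step_cost_cents` of every money cap.
        needed = 1 if cap.unit == "iterations" else step_cost_cents
        if cap.remaining < needed:
            return cap
    return None


# Example stack in the order it is checked. All figures are made up.
caps = [
    Cap("loop", 3, "iterations"),
    Cap("task", 40, "cents"),
    Cap("workflow", 900, "cents"),
    Cap("agent", 3_000, "cents"),
    Cap("team", 20_000, "cents"),
    Cap("company", 60_000, "cents"),
]
```

For a step estimated at 60 cents, `first_breach(caps, 60)` returns the task cap, since the loop still has iterations left but the task has only 40 cents. For a 30 cent step it returns `None` and the call goes ahead.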
The escalation sits in an approval queue with enough context for the manager to make a call in seconds: what the agent was trying to do, how much it has spent already, how much more it needs, what the output looks like so far. The manager approves, denies, or adjusts the cap. The agent resumes if approved.
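An escalation record and its three outcomes might look like the sketch below. The field names, the `Decision` enum, and `resolve` are all hypothetical. What happens after an adjustment is not spelled out above, so this sketch assumes the run resumes only if the adjusted cap covers what has been spent plus what the agent asked for.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Deque, Optional


class Decision(Enum):
    APPROVE = "approve"
    DENY = "deny"
    ADJUST = "adjust"


@dataclass
class Escalation:
    scope: str           # which cap was breached
    goal: str            # what the agent was trying to do
    limit_cents: int     # the cap as currently set
    spent_cents: int     # how much it has spent already
    needed_cents: int    # how much more it needs
    output_preview: str  # what the output looks like so far
    resumed: bool = False


def resolve(esc: Escalation, decision: Decision,
            new_limit_cents: Optional[int] = None) -> Escalation:
    required = esc.spent_cents + esc.needed_cents
    if decision is Decision.APPROVE:
        # Raise the cap just enough to cover the request, then resume.
        esc.limit_cents = max(esc.limit_cents, required)
        esc.resumed = True
    elif decision is Decision.ADJUST:
        if new_limit_cents is None:
            raise ValueError("ADJUST requires new_limit_cents")
        esc.limit_cents = new_limit_cents
        # Assumption: resume only if the new cap covers the request.
        esc.resumed = new_limit_cents >= required
    else:
        esc.resumed = False  # denied: the run stays stopped
    return esc


# Managers work through escalations in arrival order.
approval_queue: Deque[Escalation] = deque()
```

Everything the manager needs sits on the record itself, which is what makes a decision in seconds realistic.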
Most of the time nothing gets escalated, because most runs fit comfortably inside the cap. The caps exist for the days that do not. Those are the days that used to cost 706 dollars. They now cost whatever the tightest cap was set to, which is usually under a dollar.
The core lesson
If you take one thing from this post, take this: an agent without a cap is not an agent. It is a cost incident waiting to happen. Caps are not a nice-to-have. They are how you make the whole thing safe to actually run.
Stack six of them, put humans at the top of the escalation, and spend the saved weekends doing literally anything else.