Why Vibe Coding a Multi-Agent System Will Burn Your Budget
Nicole Königstein, CEO & Co-Chief AI Officer at Quantmate
The pitch is seductive: describe what you want, let a coding agent build it, ship by Friday. Vibe coding has become the default approach for teams prototyping AI agent systems — and for good reason. It works. The system passes tests. The demo looks clean.
Then month one in production hits. Nicole Königstein, CEO of Quantmate, an agentic quant research environment applying transformers and multi-agent systems to financial markets, has tested every major coding agent tool over the past few months. She went in skeptical and came out with a specific warning: working does not mean optimized.
“Your CTO will be stamping in very angrily into your office because you burned lots of tokens and whatnot,” Königstein says, “because you didn’t know that you need to ask your coding agent how to optimize certain things like memorization, don’t repeat all the things, store different states in between, cache prefixes.”
The gap between working and production-ready
When you vibe code a multi-agent system, the coding agent makes architecture decisions you never asked about. It picks a topology, sets up context passing, decides how agents share state. These choices work — but they are rarely optimized.
Königstein, who is authoring two O’Reilly definitive guides on transformers and AI agents while doing a PhD on self-improving agents, frames the problem around what the coding agent doesn’t know: your production environment. “If you don’t tell the system how to design your system, it will just choose something,” she explains. The system lacks experience operating itself. It has never been paged at 3 a.m. because an agent started producing garbage after a context window filled up.
The result is a system that passes every test you throw at it during development and then bleeds money in production through unnecessary token consumption, redundant API calls, and context windows stuffed with information no agent actually needs.
What vibe coding misses
Königstein identifies specific optimization layers that coding agents consistently skip:
- Memoization and caching. Production agent systems need to store intermediate states and cache repeated computations. Coding agents generate fresh calls by default.
- Prefix caching. If multiple agents share prompt prefixes, caching them saves significant token spend. Coding agents don’t typically set this up.
- State management between steps. Without explicit instructions, a vibe-coded system might pass the full conversation history to every agent in the chain, burning tokens on context that the receiving agent doesn’t need.
- Model selection per task. Coding agents tend to use one model for everything. Königstein’s own research found that assigning cheaper models to simpler subtasks dramatically reduces cost without affecting output quality.
None of these are bugs. The system works without them. They are optimization decisions that require understanding how agent systems behave under sustained load — knowledge the coding agent simply does not have.
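To make the first item concrete, here is a minimal sketch of memoizing agent calls, assuming calls with the same model and prompt are repeatable. `MemoizedAgent` and `fake_model` are hypothetical names for illustration; `fake_model` stands in for whatever provider call your system actually makes.

```python
import hashlib
import json
from typing import Callable, Dict


class MemoizedAgent:
    """Wraps a model-calling function and caches results by (model, prompt)."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model
        self._cache: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model: str, prompt: str) -> str:
        # Hash a canonical JSON payload so the key is stable and compact.
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]  # no tokens spent on a repeat request
        self.misses += 1
        result = self._call_model(model, prompt)
        self._cache[key] = result
        return result


def fake_model(model: str, prompt: str) -> str:
    # Hypothetical stand-in for a real (paid) provider call.
    return f"[{model}] summary of {len(prompt)} chars"


agent = MemoizedAgent(fake_model)
first = agent.run("small-model", "Summarize the Q3 report")
second = agent.run("small-model", "Summarize the Q3 report")  # served from cache
```

This only pays off for steps whose output you are willing to reuse. Calls that depend on sampling randomness or fresh data need a different key or an expiry policy.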
You can’t outsource the thinking
Königstein’s one-liner on vibe coding has stuck with her audience for a reason: “You can’t outsource, fully outsource the thinking. You still need to understand the system, how to design the system. This is something you can’t outsource yet.”
She is careful to distinguish between using coding agents as accelerators versus using them as replacements for architectural thinking. The first is productive. The second is expensive.
“I’m not against or saying it’s wrong to use coding agents or vibe coding to speed up development,” she clarifies. “But it needs knowledge and acquiring that knowledge needs time and then know how to guide the systems in helping you building it.”
The person operating the coding agent needs to know what questions to ask. If you don’t know about token optimization, you can’t instruct the agent to implement it. If you’ve never dealt with context window overflow in production, you won’t ask about state management strategies. The system produces what you prompt for — and skips everything you don’t.
What this means for teams shipping agent products
The practical implication is a phase that most teams skip: post-vibe-code review. Before pushing to production, someone with multi-agent system experience needs to audit the architecture for token efficiency, context management, and unnecessary coordination overhead.
Königstein has seen the alternative. A vibe-coded system that works in development can cost multiples of what an optimized version costs — not because of pricing differences between model providers, but because of wasteful architecture decisions that compound with every request.
FAQ
Why does vibe coding create hidden costs in AI agent systems?
Coding agents build systems that work but rarely optimize for production efficiency. They skip memoization, prefix caching, and state management between steps, generating fresh API calls where cached results would suffice. Nicole Königstein found that these architectural gaps compound into significant token waste once the system handles sustained production traffic.
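As a rough illustration of the prefix-caching point, the sketch below assumes a provider-side cache that matches on the leading part of a prompt, so shared static text should come first and per-request text last. `SHARED_PREFIX` and both builder functions are hypothetical. No real provider API is called; the code only measures how much of the prompt two agents have in common.

```python
import os
from typing import List

# Hypothetical instructions shared by every agent in the pipeline.
SHARED_PREFIX = (
    "You are one agent in a research pipeline.\n"
    "Follow the house style guide and cite every data source."
)


def build_prompt_cache_friendly(role: str, user_input: str) -> str:
    # Static, shared text first; role- and request-specific text last.
    return f"{SHARED_PREFIX}\n\n{role}\n\n{user_input}"


def build_prompt_cache_hostile(role: str, user_input: str) -> str:
    # Same content, but the variable part leads, so prompts diverge early.
    return f"{user_input}\n\n{role}\n\n{SHARED_PREFIX}"


def shared_prefix_len(prompts: List[str]) -> int:
    # os.path.commonprefix compares strings character by character.
    return len(os.path.commonprefix(prompts))


requests = [
    ("Role: generator", "Analyze the momentum signal"),
    ("Role: verifier", "Backtest the momentum signal"),
]
friendly = [build_prompt_cache_friendly(r, u) for r, u in requests]
hostile = [build_prompt_cache_hostile(r, u) for r, u in requests]

print(shared_prefix_len(friendly), shared_prefix_len(hostile))
```

How much of that shared prefix a provider actually caches, and at what minimum length, depends on the provider. Check its documentation before relying on the savings.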
What does “you can’t outsource the thinking” mean for AI development?
Königstein argues that coding agents accelerate development but cannot replace architectural understanding. If you don’t know about token optimization, context window management, or agent coordination patterns, you can’t instruct the coding agent to implement them. The system produces exactly what you prompt for and nothing more.
How do I optimize a vibe-coded multi-agent system for production?
Audit four areas before deploying: memoization and caching of intermediate states, prefix caching for shared prompt components, selective context passing so each agent only receives relevant state, and model selection per task so cheaper models handle simpler subtasks. According to Königstein, an unoptimized system can cost multiples of what an optimized version costs.
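Two of these audit areas, selective context passing and model selection per task, can be sketched in a few lines. Everything named here (`AgentSpec`, the tiers, the model names) is a hypothetical illustration, not a description of Königstein's setup.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical mapping from task difficulty to model; substitute your own.
MODEL_BY_TIER: Dict[str, str] = {
    "simple": "small-model",
    "complex": "large-model",
}


@dataclass
class AgentSpec:
    name: str
    tier: str          # "simple" or "complex"
    needs: List[str]   # keys of shared state this agent actually reads


def select_model(spec: AgentSpec) -> str:
    # Cheaper model for simpler subtasks.
    return MODEL_BY_TIER[spec.tier]


def build_context(spec: AgentSpec, state: Dict[str, str]) -> Dict[str, str]:
    # Pass only the declared keys instead of the whole shared state.
    return {key: state[key] for key in spec.needs if key in state}


state = {
    "query": "Compare two momentum strategies",
    "draft": "Strategy A outperforms in trending markets...",
    "history": "(long transcript of every prior agent turn)",
}

router = AgentSpec(name="router", tier="simple", needs=["query"])
writer = AgentSpec(name="writer", tier="complex", needs=["query", "draft"])

router_context = build_context(router, state)   # only the query
writer_context = build_context(writer, state)   # query and draft, no history
```

Declaring `needs` per agent forces the question a coding agent will not ask on its own: which parts of the state does this step actually read?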
What are the most common token waste patterns in multi-agent systems?
Passing full conversation history to every agent in the chain, using expensive models for simple subtasks, skipping prefix caching on shared prompts, and generating fresh API calls for repeatable computations. These patterns often go unnoticed in development because the system produces correct output — the waste only shows up in production billing.
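A back-of-the-envelope calculation shows why the first pattern is expensive. The sketch assumes a linear chain in which every agent emits the same number of output tokens, and all numbers are made up for illustration.

```python
def full_history_input_tokens(n_agents: int, prompt_tokens: int,
                              output_tokens: int) -> int:
    """Input tokens when every agent receives the entire history so far."""
    total = 0
    history = prompt_tokens
    for _ in range(n_agents):
        total += history          # this agent reads everything before it
        history += output_tokens  # its output is appended for the next one
    return total


def selective_input_tokens(n_agents: int, prompt_tokens: int,
                           output_tokens: int) -> int:
    """Input tokens when each agent gets the prompt plus the previous output only."""
    total = 0
    for i in range(n_agents):
        total += prompt_tokens + (output_tokens if i > 0 else 0)
    return total


# Illustrative numbers only: 10 agents, 500-token prompt, 1,000-token outputs.
full = full_history_input_tokens(10, 500, 1000)
selective = selective_input_tokens(10, 500, 1000)
print(full, selective)
```

Under these assumptions the full-history chain reads 50,000 input tokens per request against 14,000 for the selective one. The full-history total grows quadratically with chain length, while the selective total grows linearly.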
Can coding agents build production-ready multi-agent systems?
They can build systems that pass tests and produce correct output. Production readiness requires additional optimization that coding agents don’t implement by default: efficient state management, appropriate model selection per agent, and token-aware architecture decisions. Königstein recommends using coding agents for speed, then reviewing architecture before deployment.
What happens when a vibe-coded agent system hits production?
Without optimization, token consumption scales with traffic in ways the development environment didn’t reveal. Königstein describes the common scenario: the system works fine in testing, then billing spikes in month one because agents are over-reasoning, over-communicating, and consuming tokens on unnecessary context. The fix requires architectural understanding the coding agent couldn’t provide.
Is vibe coding useful at all for building AI agents?
Yes. Königstein uses coding agents herself and considers them valuable for staying productive and competitive. The distinction is between using them as accelerators (knowing what to instruct and what to review) versus using them as replacements for architectural thinking (prompting for an outcome and shipping whatever comes back without optimization).
How do I know if my AI agent system is wasting tokens?
Monitor token consumption per agent in your pipeline and compare against task complexity. If a verifier agent consumes as many tokens as the generator, or if simple routing decisions trigger full context passes, the architecture likely needs optimization. Königstein recommends evaluation frameworks like RULER to catch quality-cost imbalances early.
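One way to start this kind of monitoring is a small per-agent ledger, sketched below. `TokenLedger`, the expected shares, and the tolerance are assumptions chosen for illustration. The sketch covers token accounting only, not output quality, and is unrelated to how RULER works.

```python
from collections import defaultdict
from typing import Dict, List


class TokenLedger:
    """Accumulates token usage per agent and flags agents over their expected share."""

    def __init__(self) -> None:
        self._totals: Dict[str, int] = defaultdict(int)

    def record(self, agent: str, input_tokens: int, output_tokens: int) -> None:
        self._totals[agent] += input_tokens + output_tokens

    def totals(self) -> Dict[str, int]:
        return dict(self._totals)

    def flag_imbalances(self, expected_share: Dict[str, float],
                        tolerance: float = 0.10) -> List[str]:
        # Flag agents whose share of all tokens exceeds expectation + tolerance.
        grand_total = sum(self._totals.values())
        if grand_total == 0:
            return []
        flagged = []
        for agent, total in self._totals.items():
            share = total / grand_total
            if share > expected_share.get(agent, 0.0) + tolerance:
                flagged.append(agent)
        return sorted(flagged)


ledger = TokenLedger()
ledger.record("generator", input_tokens=4000, output_tokens=2000)
ledger.record("verifier", input_tokens=5500, output_tokens=500)
ledger.record("router", input_tokens=450, output_tokens=50)

# Hypothetical budget: the verifier should use far fewer tokens than the generator.
expected = {"generator": 0.7, "verifier": 0.2, "router": 0.1}
suspects = ledger.flag_imbalances(expected)
```

Here the verifier consumes as many tokens as the generator, so it is the one flagged for review.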
What skills do I need to oversee a vibe-coded agent system?
Understanding of multi-agent topologies, context window management, token economics, and production failure modes. Königstein says the gap is between developers who can describe what they want and developers who understand why the system behaves the way it does under load. The second group catches the optimization opportunities the coding agent missed.
Full episode coming soon
This conversation with Nicole Königstein is on its way. Check out other episodes in the meantime.