Why Monitoring Agents Demand Custom Models: The For-Loop Cost Problem
Devi Parikh, Co-CEO at Yutori
Devi Parikh offers a deceptively simple way to understand why Yutori had to build its own model.
“If you think about an LLM, it’s basically a for loop over token generation. You’re generating tokens one at a time. If you think about what an agent is, it’s basically a for loop around an LLM. Every time it’s looking at a screenshot, it decides what action to take next. That then changes the website. So it looks at that screenshot, and then it decides what action to take next. And each time you’re deciding what action to take next is a call to an LLM. So this agent is now a for loop over an LLM. And a scout is a for loop over a whole team of agents, because it’s constantly monitoring.”
Three nested for-loops: token generation inside each LLM call, LLM calls inside each agent run, and agent runs inside a continuous monitoring loop.
This abstraction is valuable because it explains why generic foundation models become economically impractical for continuous monitoring at scale.
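The nesting can be made concrete with a small counting sketch. Everything here is illustrative: the function names, the team size of two agents per check, and the 50 tokens per call are assumptions chosen for the example, not details of Yutori's system.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    tokens: int = 0
    llm_calls: int = 0
    agent_runs: int = 0


def llm_call(usage: Usage, tokens_per_call: int) -> None:
    # Innermost loop: an LLM call emits tokens one at a time.
    for _ in range(tokens_per_call):
        usage.tokens += 1
    usage.llm_calls += 1


def agent_run(usage: Usage, steps: int, tokens_per_call: int) -> None:
    # Middle loop: screenshot -> decide next action -> act, one LLM call per step.
    for _ in range(steps):
        llm_call(usage, tokens_per_call)
    usage.agent_runs += 1


def scout(usage: Usage, checks: int, agents_per_check: int,
          steps: int, tokens_per_call: int) -> None:
    # Outer loop: a scout re-runs a team of agents on every scheduled check.
    for _ in range(checks):
        for _ in range(agents_per_check):
            agent_run(usage, steps, tokens_per_call)


# Hypothetical workload: daily checks for a month, 2 agents, 10 steps each.
usage = Usage()
scout(usage, checks=30, agents_per_check=2, steps=10, tokens_per_call=50)
print(usage)
```

Each loop level multiplies the work of the level below it, which is why a small per-call cost grows quickly once monitoring is added on top.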
The Nested Loop Cost Structure
A single LLM call to analyze a screenshot and decide the next action might cost $0.10 in tokens (depending on the model and screenshot size). That’s fine for an isolated task.
But an agent doesn’t make one decision. It makes multiple decisions to navigate a page — take a screenshot, decide where to click, click, take another screenshot, decide the next action, repeat. A simple navigation task might require 5-10 LLM calls. Cost: $0.50-$1.00.
Now scale to a scout. A scout runs continuously. It might check once per hour, once per day, or on a schedule the user specifies. Even at daily checks, a single scout making 10 LLM calls per check adds up to about 300 LLM calls per month. For 1,000 users with an average of 3 scouts each, that is roughly 900,000 LLM calls per month.
At $0.10 per call, that’s about $90,000 per month, and hourly checks would multiply the figure by 24. At pricing typical of foundation models in early 2024 (higher than today), it was prohibitively expensive. Even at current reduced rates, it’s economically unsustainable for a free or low-cost product.
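The arithmetic can be checked with a few lines of Python. This is a minimal sketch that plugs in the inputs stated above (1,000 users, 3 scouts each, daily checks, 10 LLM calls per check, $0.10 per call); the function names are made up for illustration, and a 30-day month is assumed.

```python
def monthly_llm_calls(users: int, scouts_per_user: int,
                      checks_per_month: int, calls_per_check: int) -> int:
    # Each factor is one level of the nested loop.
    return users * scouts_per_user * checks_per_month * calls_per_check


def monthly_cost_dollars(calls: int, cents_per_call: int) -> float:
    # Work in integer cents so the multiplication stays exact.
    return calls * cents_per_call / 100


calls = monthly_llm_calls(users=1_000, scouts_per_user=3,
                          checks_per_month=30, calls_per_check=10)
cost = monthly_cost_dollars(calls, cents_per_call=10)
print(f"{calls:,} LLM calls/month -> ${cost:,.0f}/month")
```

Changing any single input, such as moving from daily to hourly checks, scales the total by the same factor.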
Parikh’s framing makes the math transparent. You’re in a nested loop. LLM cost compounds with each level. “So, this is one perspective of getting a handle on what costs end up looking like, which is where the fact that we have trained — we have post-trained our own in-house web agent, like the model that takes actions on the web becomes relevant.”
Why Custom Models Are Forced Economics
A custom model trained or fine-tuned specifically for web navigation is expensive upfront. You need training data. You need compute for training. You need inference infrastructure. You need ongoing maintenance.
But once built, it’s dramatically cheaper to run. A model optimized for this specific task might cost 80-90% less per inference than a foundation model while actually performing better on web navigation (because it’s trained on examples of web navigation, not generic text).
The calculation changes entirely. At 3 million web navigation calls per month and the illustrative $0.10 per call, a foundation model would cost about $300,000 per month in inference alone. An in-house model that is 80-90% cheaper per call would cost roughly $30,000-$60,000 per month to run, plus the amortized cost of training and maintaining it.
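A rough way to reason about this tradeoff is a break-even calculation. The sketch below uses the article's illustrative $0.10 per call and an 80% per-call reduction; the fixed monthly cost of $50,000 for amortized training and maintenance is a purely hypothetical placeholder, not a figure from Yutori.

```python
def monthly_total_cents(calls: int, cents_per_call: int, fixed_cents: int = 0) -> int:
    # Variable inference cost plus any fixed monthly cost, in integer cents.
    return calls * cents_per_call + fixed_cents


def break_even_calls(fixed_cents: int, foundation_cents: int, custom_cents: int) -> int:
    # Smallest monthly call volume at which the custom model is no more expensive.
    saving = foundation_cents - custom_cents
    if saving <= 0:
        raise ValueError("custom model must be cheaper per call to break even")
    return -(-fixed_cents // saving)  # ceiling division


CALLS = 3_000_000          # monthly call volume from the text
FOUNDATION_CENTS = 10      # illustrative $0.10 per call
CUSTOM_CENTS = 2           # assumes an 80% per-call reduction
FIXED_CENTS = 5_000_000    # hypothetical $50,000/month amortized training + upkeep

foundation = monthly_total_cents(CALLS, FOUNDATION_CENTS)
custom = monthly_total_cents(CALLS, CUSTOM_CENTS, FIXED_CENTS)
threshold = break_even_calls(FIXED_CENTS, FOUNDATION_CENTS, CUSTOM_CENTS)

print(f"foundation: ${foundation / 100:,.0f}/month")
print(f"custom:     ${custom / 100:,.0f}/month")
print(f"break-even: {threshold:,} calls/month")
```

Below the break-even volume the fixed cost dominates and a foundation model is cheaper; above it, every additional call widens the gap in favor of the in-house model.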
“Because it’s in-house, that has sort of a significant dent on how much it costs us to serve the product, which is what makes it even feasible for us to open up Scouts to everyone to use,” Parikh explains.
This is why frontier AI labs build custom models for their own tools. It’s not that they can’t afford foundation models; at the scale they operate, the economics of custom models are irresistible. The same holds for companies building agents that run at scale.
The Business Forcing Function
This cost structure isn’t an abstract concern. It directly shaped Yutori’s product decisions. By Parikh’s account, opening Scouts to everyone was only feasible once inference costs came down. Relying on foundation models would have required users to pay significant fees to cover the cost of running agent loops continuously.
An interesting implication: the companies that can afford to build custom models have a significant cost advantage over competitors using only foundation models. You can price your product lower, offer more generous quotas, or build a more sustainable margin structure.
The tradeoff is that you need either significant capital (to fund the model development) or technical expertise (to train and maintain models in-house). Both are barriers to entry, and they protect companies like Yutori once they have cleared them.
The Monitoring Tax
There’s a specific cost penalty for monitoring systems that doesn’t apply to one-shot agent tasks. A one-shot agent answers a question once and stops. A scout answers the same question repeatedly — daily, hourly, or on a continuous basis.
This is where the time-based for-loop layer matters. Monitoring multiplies the cost of a task by the number of checks. A task that costs $1 to complete once costs about $30 per month to monitor daily and about $720 per month to monitor hourly. The system needs to be radically more efficient at scale.
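The multiplier depends only on how often the scout checks. A minimal sketch, assuming a 30-day month and the $1 per run figure from the text (the function names are hypothetical):

```python
HOURS_PER_MONTH = 24 * 30  # assumes a 30-day month


def checks_per_month(interval_hours: int) -> int:
    # Number of scheduled runs in a month for a given check interval.
    return HOURS_PER_MONTH // interval_hours


def monthly_monitoring_cost(cost_per_run: float, interval_hours: int) -> float:
    # The monitoring loop multiplies the one-shot cost by the number of checks.
    return cost_per_run * checks_per_month(interval_hours)


for label, interval in [("daily", 24), ("every 6 hours", 6), ("hourly", 1)]:
    cost = monthly_monitoring_cost(1.0, interval)
    print(f"{label:>14}: {checks_per_month(interval):>3} checks -> ${cost:,.0f}/month")
```

The same task ranges from tens to hundreds of dollars per month depending on the schedule, which is why check frequency is one of the first levers in any cost model for monitoring.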
“We kind of did all of that homework before we went GA, right? Like the product was behind a wait list for a while and we did a lot of this sort of cost optimization and making sure we can handle everything at scale,” Parikh notes. That homework included the decision to build a custom model. Without it, the math doesn’t work.
Implications for Agents in General
The cost structure matters for any agent-based system planning to operate at scale:
One-shot agents (answer this question, take this action, stop) can probably work with foundation models if the cost per action is low enough and the volume is moderate.
Continuous agents (keep checking, keep monitoring, keep asking) almost certainly need optimization: custom models, aggressive caching, or architectural choices that reduce the number of LLM calls.
The companies that win in agent-based products will be the ones that optimize this loop aggressively. The difference between $10/month and $1/month in infrastructure cost per user is not marginal. It’s the difference between a business that depends on venture funding to subsidize each user and one that can sustain itself.
FAQ
What if Yutori just raised prices to cover foundation model costs?
They could, but it would kill product adoption. Scouts’ value prop is “monitor the web comprehensively.” If the cost is proportional to monitoring frequency and breadth, most users would adjust their scout frequency downward to avoid charges. The product becomes narrower and less valuable. Better to optimize costs and keep prices low.
How much cheaper is an in-house model than GPT-4?
For web navigation specifically, probably 80-90% cheaper per inference. Foundation models are general-purpose and charge accordingly. A specialized model has no “paying for reasoning on arbitrary text” overhead. The exact number depends on model size and infrastructure, but the gap is substantial.
Could Yutori use smaller open-source models instead of building their own?
Possibly. But web navigation requires understanding visual information from arbitrary websites. Most open-source models are smaller and less capable than what’s needed. Yutori might use open-source models as a starting point for fine-tuning, but building on them would still require significant investment.
Does the cost structure change if users pay per scout?
Yes, but it changes the user behavior. If scouts cost money per run, users create fewer scouts or run them less frequently. The product becomes less valuable because continuous monitoring becomes a premium feature. Free or cheap monitoring at scale is what makes Scouts valuable.
What about API costs for third-party integrations?
Yutori pays for API access to LinkedIn, Reddit, flight data, and dozens of other sources. Those costs add up, but they’re orthogonal to the LLM cost problem. There are two cost vectors: API access and LLM inference. Optimizing LLM cost is one part of the puzzle.
If inference costs keep dropping, does the custom model advantage go away?
It shrinks but doesn’t disappear. If foundation model costs drop 80% tomorrow and the custom model’s serving cost stays flat, an 80-90% per-call gap narrows to roughly 1-2x. A 2-4x advantage holds only if the custom model also gets cheaper to serve, for example by half. Either way, the absolute cost becomes less of a blocker, but the comparative advantage still favors specialization. Plus, Yutori’s competitive moat isn’t just cost. It’s also the quality of navigation decisions on arbitrary sites.
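How much advantage survives a price drop depends on whether the custom model's own serving cost falls too. The sketch below works this through with exact fractions, using the 80-90% per-call gap cited earlier; the 50% custom-side drop is an assumption chosen for illustration.

```python
from fractions import Fraction


def advantage_ratio(custom_discount_pct: int, foundation_drop_pct: int,
                    custom_drop_pct: int = 0) -> Fraction:
    # Costs are expressed relative to today's foundation price (= 1).
    foundation = Fraction(100 - foundation_drop_pct, 100)
    custom = (Fraction(100 - custom_discount_pct, 100)
              * Fraction(100 - custom_drop_pct, 100))
    # Ratio > 1 means the custom model is still cheaper per call.
    return foundation / custom


for discount in (80, 90):
    flat = advantage_ratio(discount, foundation_drop_pct=80)
    halved = advantage_ratio(discount, foundation_drop_pct=80, custom_drop_pct=50)
    print(f"{discount}% cheaper today: {float(flat):.0f}x if custom cost is flat, "
          f"{float(halved):.0f}x if it also halves")
```

Under these assumptions the advantage ranges from 1x to 4x, so the outcome hinges on how quickly each side's serving cost declines.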
Why not just use caching to reduce LLM calls?
Caching helps but doesn’t eliminate the cost. You can cache page state between runs (“the website hasn’t changed since yesterday, here’s the cached screenshot and result”). But you still need to check that the page hasn’t changed, which requires a fresh screenshot and a comparison. And different users with different scouts generally can’t share a cache. Caching is a helpful optimization, not a fundamental solution.
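One simple form of this idea is to hash each fresh snapshot and skip the LLM call when the hash matches the previous run. The sketch below is an assumption about how such a cache could look, not a description of Yutori's implementation; `ScoutCache` and the `analyze` callback are hypothetical, and an exact hash is a simplification, since real pages often differ in ads or timestamps and would need a fuzzier comparison.

```python
import hashlib
from typing import Callable, Dict, Tuple


class ScoutCache:
    """Per-scout cache keyed by a hash of the latest page snapshot."""

    def __init__(self, analyze: Callable[[bytes], str]) -> None:
        self._analyze = analyze  # stand-in for the expensive LLM call
        self._store: Dict[str, Tuple[str, str]] = {}  # scout_id -> (digest, result)
        self.llm_calls = 0

    def check(self, scout_id: str, snapshot: bytes) -> str:
        # A fresh snapshot is still required on every run; only the LLM call is skipped.
        digest = hashlib.sha256(snapshot).hexdigest()
        cached = self._store.get(scout_id)
        if cached is not None and cached[0] == digest:
            return cached[1]
        self.llm_calls += 1
        result = self._analyze(snapshot)
        self._store[scout_id] = (digest, result)
        return result


# Fake analyzer so the example runs without any model.
cache = ScoutCache(analyze=lambda page: f"summary of {len(page)} bytes")
cache.check("scout-a", b"<html>price: 10</html>")
cache.check("scout-a", b"<html>price: 10</html>")  # unchanged: served from cache
cache.check("scout-a", b"<html>price: 12</html>")  # changed: new LLM call
cache.check("scout-b", b"<html>price: 12</html>")  # different scout: no sharing
print(cache.llm_calls)
```

The cache removes repeat calls for unchanged pages, but the snapshot fetch and the per-scout isolation remain, which matches the point that caching reduces the bill without changing its structure.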
Do all large-scale agent products need custom models?
Probably not all, but any continuous agent product operating at scale will face the cost problem severely. Products that run agents once (answer a question, do a task, stop) can operate with foundation models if the volume is moderate. Products that monitor continuously hit the triple-nested-loop problem hard.
What happens to Yutori’s economics if someone releases a much cheaper foundation model?
Two things. First, the absolute cost of foundation models moves closer to that of custom models, which erodes the cost advantage. Second, if the foundation model is good enough at web navigation, it might negate the need for custom training. This is why Yutori tracks foundation model improvements closely. The competitive dynamic is constantly shifting. But custom models trained on proprietary data (Yutori’s successful and failed navigation examples) would still have advantages over general-purpose models.
Could this cost structure be managed with a different architecture entirely?
Maybe. Instead of agent loops, you could use scheduled crawling (crawl sites on a fixed schedule and store results). Instead of continuous monitoring, you could batch-process lookups. You’d lose the responsiveness and adaptability of agents, but you’d cut costs dramatically. Yutori chose agent-based monitoring because it’s more valuable. The cost is a constraint they engineered around, not a flaw in their approach.
Watch the full conversation
Hear Devi Parikh share the full story on Heroes Behind AI.