How to Scale Web Agents Without Drowning in Context

When you’re building an agent that needs to find information from anywhere on the internet, the naive approach seems obvious: give the agent access to all available tools and let it figure out which ones to use.

Give it Google Search. Give it LinkedIn. Give it Reddit. Give it flight APIs. Give it news aggregators. Give it product databases. Dozens of sources. Maybe a hundred different tools and integrations. One superintelligent agent should be smart enough to use the right tool at the right time.

It doesn’t work.

“When you start getting to like a hundred different tools, these models start falling apart a little bit in terms of being able to orchestrate across all of this,” Devi Parikh explains. The problem isn’t intelligence. It’s math.

The Context Explosion Problem

Every tool description consumes tokens. If you have 100 tools, each with a description of what it does, when to use it, and how to call it, you’re spending thousands of tokens just describing the available options before the model even starts reasoning about your request.

“Even if there was somehow one powerful system that could deal with 100 different tools, the context windows are going to blow up because each of these tools is going to add so many tokens,” Parikh says. Add the user’s query. Add the previous steps the agent has taken. Add conversation history. Add relevant examples. Your 100,000-token context window is half-full before the model makes a single decision.
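To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. Only the 100 tools and the 100,000-token window come from the text above; the 300 tokens per tool description and the 20,000 tokens of query, history, and examples are illustrative assumptions, not figures from Yutori.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    window_tokens: int    # total context window size
    num_tools: int        # tools described in the prompt
    tokens_per_tool: int  # ASSUMED average size of one tool description
    other_tokens: int     # ASSUMED query + history + prior steps + examples

    def tool_tokens(self) -> int:
        """Tokens spent only on describing the available tools."""
        return self.num_tools * self.tokens_per_tool

    def used_fraction(self) -> float:
        """Share of the window consumed before the model reasons at all."""
        return (self.tool_tokens() + self.other_tokens) / self.window_tokens


flat = ContextBudget(
    window_tokens=100_000,
    num_tools=100,
    tokens_per_tool=300,
    other_tokens=20_000,
)
print(flat.tool_tokens())    # 30000
print(flat.used_fraction())  # 0.5
```

Under these assumed numbers the tool descriptions alone take 30% of the window, and the window is half used before the first decision. Real tool schemas can be larger or smaller than 300 tokens.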

Scale that to thousands or millions of scouts (the monitoring agents Yutori runs on each user’s behalf), each with a different monitoring task, and the token consumption becomes prohibitively expensive. Every user’s scout is burning through context just describing tools it might not even need.

There’s also a capability penalty: models get worse at selecting from massive option sets. Present a language model with 20 tools and it can reason about which to pick. Present it with 100 and it gets confused, makes worse decisions, hallucinates tools that don’t exist. The more options, the worse the performance.

“So having this hierarchical setup allows you to deal with that context explosion in a reasonable way,” Parikh explains. Instead of one agent seeing all tools, you build a hierarchy.

The Hierarchical Solution

Yutori’s architecture divides the problem into layers. The orchestrator sits at the top level and doesn’t need to know about individual tools. It knows about categories of agents. “The architecture that we’ve built has a hierarchy to it, where there are sub-agents that are sort of responsible for different kinds of tasks,” Parikh says. “And then those sub-agents based on whatever sort of their responsibility is have access to certain tools that are relevant for that kind of work.”

One sub-agent is responsible for finding information across Reddit and social media. It has access to the Reddit API, Twitter API, and related tools. It doesn’t need to know about flight APIs or LinkedIn.

Another sub-agent specializes in e-commerce and product monitoring. It has access to product databases, pricing APIs, and availability trackers.

A third handles news and research. It has access to news APIs, RSS feeds, academic databases, and discussion forums.

The orchestrator at the top level never needs to see all hundred tools. It only needs to understand the high-level categories: “this user wants to monitor product prices” → delegate to the e-commerce sub-agent. “This user wants to track news about a startup” → delegate to the news sub-agent.

Each sub-agent operates in a constrained context. It has far fewer tools to choose from. It makes better decisions faster. The context window is manageable.
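The split between what each level sees can be sketched as a small registry. This is an illustration only: the category names, descriptions, and tool names are hypothetical, loosely following the examples above, and do not reflect Yutori’s actual code.

```python
# Hypothetical registry: names and descriptions are illustrative assumptions.
SUB_AGENTS = {
    "social": {
        "description": "Find discussion on Reddit and social media",
        "tools": ["reddit_api", "twitter_api"],
    },
    "ecommerce": {
        "description": "Monitor products, prices, and availability",
        "tools": ["product_database", "pricing_api", "availability_tracker"],
    },
    "news": {
        "description": "Track news and research",
        "tools": ["news_api", "rss_feeds", "academic_databases", "discussion_forums"],
    },
}


def orchestrator_view(registry):
    """What the orchestrator sees: one line per category, no tool details."""
    return [f"{name}: {spec['description']}" for name, spec in registry.items()]


def sub_agent_view(registry, name):
    """What one sub-agent sees: only the tools for its own category."""
    return list(registry[name]["tools"])


print(orchestrator_view(SUB_AGENTS))
print(sub_agent_view(SUB_AGENTS, "ecommerce"))
```

The orchestrator’s prompt grows with the number of categories, while each sub-agent’s prompt grows only with the tools in its own category.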

The Two-Layer Optimization

But hierarchical delegation creates a new problem: how does the orchestrator know which results to trust? If three sub-agents each find information relevant to the user’s query, how does the system know which is most important?

Yutori solves this by separating the objective functions. Sub-agents optimize for recall — finding as much relevant information as possible. “All of these individual agents are first optimizing for coverage and recall to sort of very exhaustively find as much relevant information as they can, which obviously has the downside that precision may not be great,” Parikh notes.

The orchestrator then optimizes for precision. “And then the orchestrator at the higher level is in charge of making sure that taking all of this sort of the high recall information that’s coming in and then optimizing for precision. That’s the one assessing that, which of all of these things that have been found is relevant for the user.”

This is elegant. The sub-agents work hard and return lots of potential matches. The orchestrator filters and ranks. You get comprehensive coverage without sacrificing signal quality. The user sees only the most relevant findings, and the system is far less likely to miss something by being too selective early on.

The workflow is simple but powerful: find everything, then trim it down.
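A minimal sketch of that two-stage flow, assuming sub-agents return plain text snippets. The word-overlap scorer is a deliberately crude placeholder for whatever relevance judgment a real orchestrator would make (most likely a model call); the function names, threshold, and sample data are all hypothetical.

```python
def relevance(query: str, text: str) -> float:
    """Placeholder scorer: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def filter_and_rank(query, candidate_lists, min_score=0.5, top_k=3):
    """Precision stage: dedupe, drop weak matches, keep the top_k best."""
    seen = set()
    scored = []
    for candidates in candidate_lists:  # one list per sub-agent
        for text in candidates:
            if text in seen:
                continue  # the same finding surfaced by two sub-agents
            seen.add(text)
            score = relevance(query, text)
            if score >= min_score:
                scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Recall stage (stubbed): each sub-agent over-collects on purpose.
query = "acme laptop price drop"
social_hits = ["acme laptop price drop spotted on reddit", "my cat likes boxes"]
ecommerce_hits = ["acme laptop price now lower", "acme laptop price drop spotted on reddit"]

top = filter_and_rank(query, [social_hits, ecommerce_hits])
print(top)
```

Here the off-topic snippet is dropped, the duplicate is collapsed, and the remaining two findings come back ordered by score.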

Cost Implications

There’s one more reason for hierarchical architecture that’s rarely discussed but economically critical: cost.

“And finally, even if somehow over time all of this stops being an issue, costs are going to be very expensive if you cram everything into just one overall agent,” Parikh explains. Every tool description in the context costs tokens. Every decision an agent makes costs tokens. Every inference is money.

If one agent carrying every tool description makes every decision for a thousand users, each of whom has multiple scouts running, the full tool catalog is paid for on every single decision. Now multiply that by how often scouts run: they monitor the web on a schedule, sometimes hourly, sometimes daily.

With a hierarchical system, each sub-agent only loads the tokens it needs. The orchestrator makes fewer decisions (high-level categorization rather than granular tool selection). The system costs less per action. At the scale Yutori operates (thousands of scouts running simultaneously), the difference between hierarchical and monolithic is the difference between a viable product and an economically impossible one.
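A rough comparison of daily tool-description overhead shows how the gap compounds. Every number here is an assumption chosen for illustration (300 tokens per tool, 100 tokens per category description, 100 tools split evenly across 5 sub-agents, 1,000 scouts running hourly, one sub-agent invoked per run); none of them are Yutori’s figures.

```python
# All constants below are ASSUMED for illustration.
TOKENS_PER_TOOL = 300      # average tool description size
TOKENS_PER_CATEGORY = 100  # average sub-agent category description size
TOTAL_TOOLS = 100
NUM_CATEGORIES = 5
RUNS_PER_DAY = 24          # hourly schedule
SCOUTS = 1_000


def daily_tokens(tokens_per_run: int, runs_per_day: int, scouts: int) -> int:
    """Description overhead per day across all scouts."""
    return tokens_per_run * runs_per_day * scouts


# Flat: every run carries all 100 tool descriptions.
flat_per_run = TOTAL_TOOLS * TOKENS_PER_TOOL

# Hierarchical: the orchestrator reads 5 category descriptions, then one
# sub-agent reads only its own 20 tools (assumes one sub-agent per run).
tools_per_sub_agent = TOTAL_TOOLS // NUM_CATEGORIES
hier_per_run = NUM_CATEGORIES * TOKENS_PER_CATEGORY + tools_per_sub_agent * TOKENS_PER_TOOL

flat_daily = daily_tokens(flat_per_run, RUNS_PER_DAY, SCOUTS)
hier_daily = daily_tokens(hier_per_run, RUNS_PER_DAY, SCOUTS)

print(flat_daily)  # 720000000
print(hier_daily)  # 156000000
```

With these inputs the hierarchy spends roughly a fifth of the tokens on descriptions. Requests that fan out to several sub-agents narrow the gap, so the real ratio depends on the workload.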

When Generality Matters Most

The hierarchical approach also preserves generality better than building one custom agent per use case. You could build a perfect agent for flight monitoring. Build another for product pricing. Build another for news tracking. Each would be optimized, cheap, and effective.

But then you’ve built three separate systems. You’re maintaining three code paths. You’ve fragmented the user experience. New use cases require new agents.

Yutori chose the harder path: one hierarchy that can be extended. Add a new category of data sources? Add a new sub-agent to the hierarchy. Add a new tool? It plugs into the appropriate sub-agent without touching the orchestrator. The system scales by addition, not by multiplication.
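The “plugs in without touching the orchestrator” property can be illustrated with a toy class. The `Hierarchy` class, its method names, and the tool names are all hypothetical; the point is only that the orchestrator’s prompt is built from category descriptions, so it changes when a category is added and stays the same when a tool is added.

```python
class Hierarchy:
    """Toy model of an extensible agent hierarchy (illustrative only)."""

    def __init__(self):
        self._categories = {}  # name -> {"description": str, "tools": list}

    def add_sub_agent(self, name, description):
        self._categories[name] = {"description": description, "tools": []}

    def add_tool(self, category, tool_name):
        self._categories[category]["tools"].append(tool_name)

    def orchestrator_prompt(self):
        """Built from category descriptions only, never from tool lists."""
        lines = [f"- {n}: {c['description']}" for n, c in self._categories.items()]
        return "Route the request to one of:\n" + "\n".join(lines)

    def tools_for(self, category):
        return list(self._categories[category]["tools"])


h = Hierarchy()
h.add_sub_agent("news", "Track news and research")
h.add_tool("news", "news_api")

before = h.orchestrator_prompt()
h.add_tool("news", "rss_feeds")           # new tool: orchestrator untouched
after_tool = h.orchestrator_prompt()

h.add_sub_agent("travel", "Monitor flight prices")  # new category: one new line
after_category = h.orchestrator_prompt()

print(before == after_tool)      # True
print(after_category)
```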

This requires more sophisticated orchestration. It’s harder to build. But it’s the only path to a truly general web automation platform.

FAQ

Doesn’t this just move the problem one level up? How does the orchestrator know what to delegate?

The orchestrator uses simpler reasoning. Instead of “which of these 100 tools should I use,” it’s “does this request involve monitoring for products, news, or social media?” That’s a much easier classification problem. You can solve it with simple heuristics or a lightweight classifier. For routing, the orchestrator doesn’t need deep reasoning; it just needs to be accurate at categorization.
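As a sketch of the “simple heuristics” option, a keyword router might look like the following. The keyword sets and category names are made up for illustration, and a production system would more plausibly use a small classifier or a model call; this only shows how small the routing problem is compared with picking among 100 tools.

```python
import re

# Hypothetical keyword sets, one per sub-agent category.
CATEGORY_KEYWORDS = {
    "ecommerce": {"price", "prices", "product", "stock", "discount"},
    "news": {"news", "article", "articles", "announcement", "startup"},
    "social": {"reddit", "twitter", "comments", "thread", "reviews"},
}


def route(request: str) -> list:
    """Return every category with a keyword hit, strongest match first."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    hits = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    matched = [cat for cat, count in hits.items() if count > 0]
    # sorted() is stable, so ties keep the order of CATEGORY_KEYWORDS.
    return sorted(matched, key=lambda cat: hits[cat], reverse=True)


print(route("Monitor product prices for this laptop"))  # ['ecommerce']
print(route("Track news about a startup"))              # ['news']
print(route("What is the weather like"))                # []
```

An empty result is a useful signal in its own right: the request did not match any known category and needs a different handling path.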

What if a request needs information from multiple sub-agent categories?

The orchestrator can delegate to multiple sub-agents and combine their results. “I need to find negative reviews about my competitor” might need both the social media agent (for Reddit comments) and the news agent (for published articles). The orchestrator collects results from both and ranks them together.

How many sub-agents does Yutori have?

The exact number varies based on what they’re building, but the principle scales from 3-4 initial agents up to dozens. The key is that the number of sub-agents grows much more slowly than the number of tools. Each sub-agent can handle dozens of tools.

Could you use a large enough context window to avoid this problem?

Technically, yes. If you had a model with a 1-million-token context window and paid the cost, you could cram everything in. But long contexts still carry performance costs: models tend to lose track of information buried in the middle of very long contexts. And the cost of processing that many tokens quickly becomes prohibitive at scale. Hierarchical design is more elegant and economical.

What if the user’s request spans categories the orchestrator doesn’t anticipate?

Good question. This is where you need either a more sophisticated orchestrator (that can make harder calls) or a feedback mechanism (where the user can specify which agents to involve). Yutori handles this by allowing users to give explicit hints about where their information lives — “check these specific websites” or “this is about flight prices” — which makes categorization easier.

Does the orchestrator ever make mistakes in delegation?

Yes. Occasionally a request gets routed to the wrong sub-agent. That’s why the hierarchical design includes a fallback: if a sub-agent returns low-confidence results, the orchestrator can try another category. It also learns from feedback. If a user says “that wasn’t what I wanted,” the system can improve its routing for future requests.
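The fallback step could be sketched as follows, assuming each sub-agent returns its results together with a confidence score between 0 and 1. That return shape, the 0.6 threshold, and the stub agents are assumptions for illustration; the feedback-driven routing improvement is not shown.

```python
def run_with_fallback(request, ranked_categories, sub_agents, min_confidence=0.6):
    """Try categories in order; stop at the first confident answer.

    Returns (category, results, confidence) for the best attempt seen,
    or None if there were no categories to try.
    """
    best = None
    for category in ranked_categories:
        results, confidence = sub_agents[category](request)
        if best is None or confidence > best[2]:
            best = (category, results, confidence)
        if confidence >= min_confidence:
            break  # good enough, no need to try further categories
    return best


# Stub sub-agents standing in for real ones.
def news_agent(request):
    return (["loosely related article"], 0.2)


def ecommerce_agent(request):
    return (["price dropped 10%"], 0.9)


agents = {"news": news_agent, "ecommerce": ecommerce_agent}

# The router guessed "news" first; the low score triggers the fallback.
outcome = run_with_fallback("laptop price", ["news", "ecommerce"], agents)
print(outcome)  # ('ecommerce', ['price dropped 10%'], 0.9)
```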

Doesn’t this introduce latency? Now the orchestrator has to wait for sub-agents to finish?

It does. The orchestrator has to route, wait for results, and then aggregate. But you can parallelize: route to multiple sub-agents at once, so the total wait is set by the slowest sub-agent rather than the sum of all of them. For scheduled monitoring, where checks run hourly, daily, or weekly, the extra latency is negligible compared to the interval between checks.
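A minimal sketch of the parallel fan-out using Python’s `asyncio.gather`. The sub-agents are stubs that just sleep; their names and delays are placeholders, not measurements.

```python
import asyncio
import time


async def sub_agent(name: str, delay_s: float) -> str:
    """Stub sub-agent: the sleep stands in for real search work."""
    await asyncio.sleep(delay_s)
    return f"{name} results"


async def fan_out(agents):
    """Start all sub-agents concurrently and wait for every one to finish."""
    return await asyncio.gather(*(sub_agent(name, delay) for name, delay in agents))


start = time.perf_counter()
results = asyncio.run(fan_out([("social", 0.2), ("news", 0.2)]))
elapsed = time.perf_counter() - start

print(results)            # ['social results', 'news results']
print(f"{elapsed:.2f}s")  # close to 0.2s, not 0.4s
```

`asyncio.gather` returns results in the order the coroutines were passed in, regardless of which finishes first, which keeps the aggregation step simple.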

Could you build a flat architecture with better tool descriptions instead?

Tool descriptions help, but they don’t solve the fundamental problem that models get worse at selection with more options. You’d still hit a wall around 20-30 tools. Hierarchical design is a structural solution, not a prompt engineering solution.

What’s the upper limit to how many sub-agents you can have?

Theoretically, you could keep adding agents. Practically, you’d want to keep the number small (under 20) because the orchestrator still needs to understand each category. If you have 100 sub-agents, you’re back to the original problem: how does the orchestrator decide which one to invoke? The hierarchy works best when there are 3-10 main categories that are truly distinct.

Does this architecture only work for web agents?

No. Any system that needs to orchestrate many specialized tools can benefit from hierarchy. Multi-modal systems (combining vision, language, code generation). Research assistants with many knowledge bases. Autonomous planning systems. It’s a general architectural principle that applies anywhere you’d otherwise overwhelm a single decision-making system.
