Why Google Search Misses 80% of the Web

When you search Google for a flight from San Francisco to Delhi under $500, you get results for travel sites. But you have to actually visit each site, fill out date filters, set your budget, and check availability. Google doesn’t do that work for you.

This is the blind spot at the heart of web search — and Devi Parikh, Co-CEO of Yutori, has built an entire product around exploiting it.

The core insight is deceptively simple: a massive portion of the internet’s most valuable information isn’t publicly crawlable. It lives behind lightweight forms, interactive maps, dropdown menus, and state-dependent interfaces. “A lot of times, even for apartments, if you’re on whatever apartment website, there are filters that you have to pick on — which neighborhood, what range of rents, are you interested in two bedroom, two bath?” Parikh explains. “There’s a whole bunch of check boxes and sliders that you have to interact with to find the relevant results. It’s not like there’s some long list that Google can just crawl and find for you. So there’s vast information that exists in databases but is not just listed on the web page.”

The Heavy Tail of Information You’ll Never Find

Search engines are optimized for public content. They crawl static HTML and index text. But the real-world decision-making data lives elsewhere: restaurant reservation systems, concert ticket availability, apartment listings with dynamic pricing, product stock levels that change hourly, real estate listings filtered by location and budget.

“If you said let me know whenever this tennis court becomes available for Monday 7 a.m. slots — some local tennis court in my neighborhood — Google search is not going to find that,” Parikh points out. The information is there. It’s not proprietary. It’s not secret. But accessing it requires human-like interaction: click the date picker, select Monday, pick 7 a.m., check availability. Google’s crawler stops at the front door.

This isn’t a flaw in Google’s design. It’s an architectural choice. Google crawls the static web. But the dynamic, interactive web — the one humans actually use to make decisions — remains largely invisible. The consequences compound across countless verticals. Ticket prices. Product availability. News buried in comment threads on Reddit or Twitter. Job postings with specific skill filters. Competitive intelligence on niche review platforms.

“The heavy tail of information on the web, which is a significant mass of it, is about information that exists in databases but is not just listed on the web page. And you have to click around and do some interactions to actually pull,” Parikh explains. The scale is vast enough that it matters for entire product categories.

Why Screenshots, Not Code

Yutori’s solution is to deploy autonomous agents that simulate human browsing. These agents take screenshots, analyze what they see, decide on the next action, and click. This raises an obvious counter-question: why not just parse the HTML and DOM instead? Why use the visual layer when the structured code is right there?
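
The loop described above (capture, analyze, decide, act) can be sketched roughly as follows. This is a minimal illustration, not Yutori's implementation: `FakeBrowser`, `choose_action`, and the `Action` type are hypothetical stand-ins, and the "screenshot" is a plain string so the example runs without a real browser or vision model.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str          # "click" or "done"
    target: str = ""   # label of the on-screen element to click


class FakeBrowser:
    """Hypothetical stand-in for a real browser: a tiny three-screen booking flow."""

    def __init__(self):
        self.screen = "home"
        self._transitions = {
            ("home", "date picker"): "calendar",
            ("calendar", "Monday 7 a.m."): "availability",
        }

    def screenshot(self) -> str:
        # A real agent would receive pixels here; a string keeps the sketch runnable.
        return self.screen

    def click(self, target: str) -> None:
        # Unknown clicks leave the screen unchanged.
        self.screen = self._transitions.get((self.screen, target), self.screen)


def choose_action(screenshot: str) -> Action:
    """Hypothetical policy; in practice a vision model would make this decision."""
    if screenshot == "home":
        return Action("click", "date picker")
    if screenshot == "calendar":
        return Action("click", "Monday 7 a.m.")
    return Action("done")


def run_agent(browser: FakeBrowser, max_steps: int = 10) -> list[str]:
    """Observe, decide, act; stop when the policy says done or steps run out."""
    visited = []
    for _ in range(max_steps):
        shot = browser.screenshot()
        visited.append(shot)
        action = choose_action(shot)
        if action.kind == "done":
            break
        browser.click(action.target)
    return visited


print(run_agent(FakeBrowser()))  # ['home', 'calendar', 'availability']
```

The step cap matters in any loop of this shape: without it, a policy that keeps clicking an element that does nothing would never terminate.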

Parikh and her team tried that approach first. “Different websites are built in such different ways that there is so much noise if you take that raw HTML DOM information that it’s very hard to train models to be reliable enough,” she says. The noise is real. Hidden elements. Invisible placeholder divs. Deprecated code that’s never been cleaned up. Dynamically rendered content that doesn’t exist until JavaScript runs. “Your context also blows up, right? You’re putting in these massive numbers of tokens to get to that.”

Websites aren’t built for machines to read them as code. They’re built for humans to see them. The rendering layer strips away that noise. “What we found is that just using the screenshot is what gives you that generality.”
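
To make the noise concrete, here is a small sketch using Python's standard `html.parser`. The sample markup is invented for illustration, and the `VisibleTextExtractor` class is a hypothetical helper, not anything from Yutori. It separates the text a person would actually see from everything else in the raw markup.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting.
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class VisibleTextExtractor(HTMLParser):
    """Collects text that is not inside a hidden element, script, or style block."""

    def __init__(self):
        super().__init__()
        self.visible = []
        self._stack = []  # one bool per open element: hidden itself or via an ancestor

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        style = (attrs.get("style") or "").replace(" ", "")
        hidden_here = "hidden" in attrs or "display:none" in style
        parent_hidden = self._stack[-1] if self._stack else False
        self._stack.append(parent_hidden or hidden_here or tag in {"script", "style"})

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        hidden = self._stack[-1] if self._stack else False
        if not hidden and data.strip():
            self.visible.append(data.strip())


def visible_text(html: str) -> list[str]:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.visible


# Invented sample page: most of the markup is not what the user sees.
PAGE = """
<div id="app">
  <div class="placeholder" style="display: none">Loading skeleton</div>
  <script>window.__STATE__ = {"listings": []};</script>
  <div hidden><span>Old promo banner</span></div>
  <h1>2 bed, 2 bath in the Mission</h1>
  <p>$3,200 / month</p>
</div>
"""

texts = visible_text(PAGE)
print(texts)
print(f"{len(' '.join(texts))} visible characters out of {len(PAGE)} in the raw markup")
```

Even this toy filter only catches inline `hidden` and `display: none`. Real sites also hide content through stylesheet classes and JavaScript, which a markup-only check like this cannot see, and that gap is part of why working from the rendered page generalizes better.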

This decision has cascading implications. It means Yutori’s agents can work on any website: custom builds, legacy systems, complex single-page applications, anything a human can navigate. Generality wins. The cost per request is higher (processing screenshots takes more tokens), but the gain is flexibility that spans the entire web, not just the domains an agent has been optimized for.

The Coverage Problem Requires Architecture

Once you accept that you need to click through the web like a human, the next problem is scale. You can’t have one agent orchestrating LinkedIn, Reddit, flight APIs, apartment sites, news feeds, and competitor review platforms all at once. Context explosion kills the model’s ability to reason across so many tools.

Yutori’s solution is hierarchical. Sub-agents specialize. One agent finds information across Reddit and social media. Another handles flight data. Another monitors news sources. Each has access to only the tools relevant to its domain. “When you start getting to like a hundred different tools, these models start falling apart in terms of being able to orchestrate across all of this,” Parikh explains. Hierarchical agents solve the orchestration problem. But they introduce a new one: combining results without losing signal.

This is where the two-layer architecture becomes elegant. The sub-agents are optimized for recall — they find as much relevant information as possible, even if it’s noisy. The orchestrator at the top is optimized for precision. “All of these individual agents optimize for coverage and recall to very exhaustively find as much relevant information as they can, which obviously has the downside that precision may not be great. And then the orchestrator at the higher level is in charge of taking all this information and optimizing for precision.”

You find everything, then trim it down. The sub-agents work hard and loud. The orchestrator makes sense of the noise. Together they solve what no single agent could.
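
The recall-then-precision split can be sketched in a few lines. This is an illustrative toy under stated assumptions, not Yutori's system: `SubAgent`, `Finding`, `relevance`, and `orchestrate` are hypothetical names, the tools are stub functions returning canned strings, and the word-overlap scorer stands in for whatever model a real orchestrator would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    source: str
    text: str


class SubAgent:
    """Hypothetical recall-oriented agent that only sees its own domain's tools."""

    def __init__(self, name, tools):
        self.name = name
        self.tools = tools  # each tool: callable(query) -> list[str]

    def search(self, query):
        # Recall first: return everything any tool produces, noise included.
        return [Finding(self.name, text) for tool in self.tools for text in tool(query)]


def relevance(query, text):
    """Toy scorer (fraction of query words present); a real system would use a model."""
    words = query.lower().split()
    lowered = text.lower()
    return sum(w in lowered for w in words) / len(words)


def orchestrate(query, agents, threshold=0.75):
    """Precision pass: pool all findings, drop duplicates and low-relevance items."""
    seen, kept = set(), []
    for agent in agents:
        for finding in agent.search(query):
            if finding.text in seen:
                continue
            seen.add(finding.text)
            if relevance(query, finding.text) >= threshold:
                kept.append(finding)
    return kept


# Stub tools with canned results, standing in for real integrations.
reddit = SubAgent("reddit", [lambda q: ["Acme billing is broken again", "Cute cat photo"]])
news = SubAgent("news", [lambda q: ["Acme raises prices", "Acme billing is broken again"]])

for f in orchestrate("acme billing", [reddit, news]):
    print(f.source, "->", f.text)  # reddit -> Acme billing is broken again
```

Each sub-agent holds only its own tool list, so no single component has to reason over every integration at once. The orchestrator never calls a tool directly; it only filters what the sub-agents hand back.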

FAQ

What information can’t Google find on the web?

Anything that requires user interaction to unlock — product prices and availability, reservation systems, ticket availability, real estate listings with dynamic filters, restaurant seating, local service availability, and information buried in comment threads or discussions rather than published as articles. Google crawls static pages, but not the interactive layers where most real-world decisions happen.

Why is parsing HTML not enough for web automation?

Different websites build their HTML in radically different ways. Hidden elements, placeholder divs, and dynamically rendered content create massive noise that makes it hard for models to learn reliably. Screenshot-based agents see what humans see: the signal, not the noise. The trade-off is a higher token cost per request in exchange for generality across any website.

How do web agents handle hundreds of different data sources?

Yutori uses hierarchical agents: specialized sub-agents handle different domains (Reddit, flights, news, etc.), each with access only to relevant tools. Sub-agents optimize for finding everything (high recall). A top-level orchestrator optimizes for precision — determining which results matter most to the user. This avoids context explosion and keeps costs manageable.

What happens if an agent gets the information wrong?

Yutori includes citations for every piece of information and an “Inspect Work” button that shows you which sources the agent consulted and what actions it took. You can click through to verify. This transparency is critical because agents can make mistakes, and being able to see their work builds trust faster than asking users to take a model’s output on faith.

Is this the same approach large AI labs use?

Yes. Major AI labs also discovered that screenshots work better than DOM parsing for general web automation. The community converged on this through independent experimentation. It’s become the standard approach for building agents that need to work across diverse websites.

Could you just build a cleaner version of the web’s HTML?

Theoretically, but it would require coordination across the entire web and websites would have to maintain it — a network coordination problem that doesn’t scale. In practice, it’s faster and cheaper to build agents that handle messy HTML by viewing the rendered output.

What’s an example of something Scouts found that Google couldn’t?

A startup founder set up a scout to monitor for negative reviews of competitors. The scout found comments buried in subreddit threads — not viral posts, just conversations in the corner of the internet. These comments never surface in Google search results, but they’re gold for lead generation. Google optimizes for scale and relevance; scouts optimize for thorough coverage of a specific intention.

How many corners of the web can Scouts actually cover?

Scouts connects to APIs for flights, LinkedIn, Reddit, and dozens of other platforms. For any source that has an API, the scout gets data instantly. For sources without APIs, the scout uses visual navigation. The breadth is constrained by engineering effort (building integrations takes work), but there’s no architectural limit.

Does this require a custom model for web navigation?

Yutori trained its own in-house model for web navigation because the cost of using large foundation models for every click is prohibitive. When you’re monitoring something continuously (checking every hour, every day, for weeks), even a 1-2% cost reduction per action adds up to meaningful savings. Custom models are expensive to train but cheaper to run at scale. For Scouts, the economics only work because of the in-house model.

Will Scout agents eventually replace human web research?

Not entirely. Scouts are monitoring systems, not complete agents. They find and summarize. They don’t make decisions or take actions without your input. The human still has to decide what to do with the information. But they’ll handle the exhausting part — the continuous monitoring and systematic coverage that humans can’t sustain.
