Founder Insight

Why AGI won't fix your AI agent's biggest problem.

Gil Feig, Co-Founder & CTO at Merge

Gil Feig asked me a question that stuck around after our conversation ended. It was framed as a thought experiment, but it’s the kind that breaks how you think about agent scaling.

Imagine you had the smartest possible person on the planet. A true genius. You go up to them and ask: “Which of my customers were upset last year?” The question Gil poses isn’t about their intelligence. It’s about whether they have the data to answer it.

Everyone building with agents right now is betting on intelligence. We’re watching Claude, GPT-4, and the next wave of models get dramatically more capable. The assumption is that smarter models solve agent problems. AGI solves everything. But Gil’s question exposes a category of problem that raw intelligence doesn’t touch.

The data isn’t the by-product. It’s the whole thing.

Here’s what happens in the naive approach: you give an agent API keys to your ticketing system, your HR tools, your CRM, your docs. The agent fetches everything when a user asks a question. You stuff all that data into the context window and let the model answer.
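That naive pattern can be sketched in a few lines. Everything below is illustrative: the in-memory "ticketing system" stands in for a real paginated REST API, and the function names are invented, not any real vendor's SDK.

```python
# Naive agent pattern: fetch everything at question time, then stuff it
# all into the model's context. The "API" here is a stand-in: an
# in-memory ticketing system that only returns 100 records per call.
TICKETS = [{"id": i, "subject": f"Ticket {i}"} for i in range(10_000)]

def api_list_tickets(page, per_page=100):
    """Simulates a typical paginated REST endpoint."""
    start = (page - 1) * per_page
    return TICKETS[start:start + per_page]

def fetch_all_tickets():
    """Page through the API until it runs dry — one round-trip per page."""
    records, page, calls = [], 1, 0
    while True:
        batch = api_list_tickets(page)
        calls += 1
        if not batch:
            return records, calls
        records.extend(batch)
        page += 1

records, calls = fetch_all_tickets()
print(len(records), calls)  # 10,000 records cost 101 round-trips
```

Every user question triggers that full crawl, across every connected system, before the model can say a word. The round-trip count is the point: the cost lives in the plumbing, not in the model.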

It fails for a reason that has nothing to do with intelligence.

“There’s AGI, and there’s also context window size,” Gil told me. “We might have a really smart agent that still can’t take a lot of context. With AGI and unlimited context, you get better and better. But you still run into the problem of how to extract all of this data from a third-party system to actually send into the agent’s context.”

Even with unlimited context and perfect reasoning, you’ve hit a wall that isn’t cognitive. It’s infrastructure. You can’t efficiently pull all of your company’s ticketing data from Jira in real time. The API doesn’t support it. The structure varies wildly per customer. There are permissions you can’t see. Custom fields. Scopes. Undocumented behavior.

So you’re not waiting for smarter models. You’re waiting for data infrastructure that can extract, normalize, and structure that data before the agent ever sees it.

“It’s the data for sure,” Gil said when I pushed back. “The bottleneck really isn’t intelligence. It’s the data.”

What syncing actually solves

This is where the infrastructure layer makes sense. Companies like Glean build the best enterprise search because they sync a copy of your data first. They don’t ask the LLM to fetch on the fly. They pre-structure everything, normalize it, and optimize it for semantic search. Then they give the agent access to that pre-processed copy.
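The sync-first pattern looks roughly like this. This is a minimal sketch, not Glean's actual architecture: the two "source systems" and their field shapes are invented stand-ins, and the search is a keyword filter where a real system would use a semantic index.

```python
# Sync-first pattern: copy data out of each source system, normalize it
# into one schema, then let the agent query the local copy.

def normalize_jira(issue):
    # Jira-ish shape: nested "fields" object, string keys like "PROJ-1".
    return {"id": issue["key"], "text": issue["fields"]["summary"],
            "source": "jira"}

def normalize_zendesk(ticket):
    # Zendesk-ish shape: flat object, integer ids.
    return {"id": str(ticket["id"]), "text": ticket["subject"],
            "source": "zendesk"}

def sync(jira_issues, zendesk_tickets):
    """Runs at sync time, not at question time."""
    index = [normalize_jira(i) for i in jira_issues]
    index += [normalize_zendesk(t) for t in zendesk_tickets]
    return index

def search(index, term):
    """The agent queries the pre-built copy — zero third-party API calls."""
    return [r for r in index if term.lower() in r["text"].lower()]

index = sync(
    [{"key": "PROJ-1", "fields": {"summary": "Billing is broken"}}],
    [{"id": 347, "subject": "Refund request: billing error"}],
)
print([r["id"] for r in search(index, "billing")])  # → ['PROJ-1', '347']
```

The normalization step is where the real work hides: every connector, every customer configuration, every custom field has to collapse into one schema before the agent ever sees it.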

That’s not AGI. That’s engineering.

When you need broad analysis — “Which customers were upset in the last year?” — you need the full dataset available and pre-optimized. When you need a single lookup — “Get me ticket 347” — you can make a live API call. The infrastructure choice depends on the use case, but the bottleneck is always the same: getting the right data to the agent, not making the agent smarter.

Gil described this as the difference between synced and live integrations. Synced data gets pre-indexed and queryable. Live APIs stay live. Both matter. Neither is solved by AGI.
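One way to picture the synced-versus-live split is a router that sends single-record lookups to the live API and everything broad to the pre-built index. This is a toy: `route` and its string matching are placeholders for real intent classification, not a real framework.

```python
# Route between synced and live integrations based on the question shape.
# Both "answer" functions are stubs standing in for real backends.

def answer_from_synced_index(question):
    return f"broad analysis over pre-indexed data: {question!r}"

def answer_from_live_api(ticket_id):
    return f"live API lookup for ticket {ticket_id}"

def route(question):
    # Single-record lookups can go live; anything broad hits the sync.
    words = question.split()
    if "ticket" in words:
        ticket_id = words[words.index("ticket") + 1]
        return answer_from_live_api(ticket_id)
    return answer_from_synced_index(question)

print(route("Get me ticket 347"))
print(route("Which customers were upset in the last year?"))
```

The point of the split isn't the routing logic, which is trivial here; it's that both branches depend on infrastructure that exists before the question arrives.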

The gap between demo and deployment

This insight matters because it shifts where infrastructure builders should focus. Instead of waiting for better models, the leverage is in building systems that can reliably extract data from hundreds of different APIs, normalize it across different customer configurations, and manage permissions correctly.

That’s the work that doesn’t scale with intelligence. A genius still can’t answer a question they don’t have data for. An AGI still can’t pull data from an API that wasn’t designed for mass extraction. The bottleneck isn’t in the thinking. It’s in the plumbing.

For founders building agents, this means the limiting factor for your next release probably isn’t the model. It’s whether you’ve solved the data layer. That’s uncomfortable because it’s not as exciting as waiting for the next model checkpoint. But it’s the difference between a demo that works on one customer and a product that works on a thousand.


FAQ

Won’t AGI eventually figure out how to extract any data? Even if the model becomes infinitely intelligent, it still has to respect the constraints of the API it’s talking to. Those constraints are technical, not intellectual. An API that only returns 100 records per call will never return all 10,000 records to an agent, no matter how smart the agent is.

Is this just saying RAG is important? RAG is one solution to the data problem. But the broader point is that any agent system needs a way to surface the right data at the right time — whether that’s through RAG, synced data, live API calls, or a mix. The bottleneck isn’t LLM intelligence. It’s data architecture.

So we should stop waiting for better models? No. But you shouldn’t assume better models solve infrastructure problems. The two move in parallel. Better models help with reasoning and following complex workflows. But they don’t help if you can’t get the data to them in the first place.

Does this apply to all agent use cases? It applies whenever an agent needs to access real-world data. If you’re building a chatbot that answers from a static knowledge base, you’re less constrained by data infrastructure. But the moment your agent needs to sync with external systems, work with customer-specific data, or handle permissions, this becomes critical.

Isn’t context window size the main constraint? Context windows matter, but the deeper constraint is getting all the data out of the third-party system in the first place. You can’t stuff data into context that you can’t extract efficiently from the source system. Data extraction is the bottleneck that comes before context limitations.

What should I focus on if I’m building an agent product? Before optimizing for model quality, make sure you’ve solved the data layer. Can you reliably sync data from your customers’ systems? Can you normalize that data across different configurations? Can you manage permissions correctly? That’s where the real complexity lives.

Is Gil saying we don’t need smarter models? No. He’s saying smarter models alone don’t solve the class of problems that infrastructure tackles. A smarter model is useful. A smarter model with good data infrastructure is powerful. A smarter model with no data infrastructure is just more expensive compute.

What if the agent doesn’t need to access external data? Then you’ve avoided this bottleneck entirely. But most enterprise agents need context from multiple systems — that’s where this problem appears. If your agent only needs to work with data you’ve pre-loaded, you’re in a simpler case.


Full episode coming soon

This conversation with Gil Feig is on its way. Check out other episodes in the meantime.
