Founder Insight

Why LLMs Don't Understand Time (And Why That Breaks Everything)

Daniel Davis, Co-creator at TrustGraph


Language models have a fundamental blindness: they don’t understand that the world changes.

Daniel Davis, co-creator of TrustGraph, frames it in a metaphor that cuts to the core of the problem: “Fred has four legs. Well, does Fred have five legs now? Your LLM has no idea. It will tell you either 4 or 5, with equal confidence, because it’s pattern matching against its training data. But it has no built-in understanding that time exists.”

This isn’t a minor limitation. It’s a foundational architectural flaw that causes most enterprise AI hallucinations. And because it’s built into how transformers work, it can’t be fixed by better training data or bigger models. It requires a separate infrastructure layer — the kind most companies building AI systems don’t have.

The pattern-matching problem

Language models work by predicting the next token based on statistical patterns in training data. They’re extraordinarily good at this. But the training data is static — a snapshot of the world at a point in time, with a knowledge cutoff date that gets older every day.

“An LLM has no concept that information becomes stale,” Daniel explains. “It doesn’t understand that an article from 2020 is less reliable than one from 2025. It doesn’t understand that executives change, that strategies pivot, that market conditions shift.”

This is different from forgetting. A human who learns a fact, then encounters new evidence contradicting it, can update their mental model. An LLM, by default, has no mechanism to do this. It’s processing static training data as patterns. When you prompt it with a query, it’s predicting a likely response based on the distribution of text in its training set.

If the training set contains both the old fact (“Barcelona is in Spain, established during the Roman period”) and newer context (recent economic data, current demographics), the model will pick whichever one seems most likely based on the pattern — without any awareness that one is historical context and one is current information.

“An NHL coach story illustrates this perfectly,” Daniel recalls. “Someone asked an LLM which team a coach worked for, expecting a recent answer. The model confidently returned a team from 2015. The coach had moved on a decade ago, but the model had no temporal awareness to know that.”

The confidence problem

What makes this worse than pure ignorance is that LLMs generate answers with unwarranted confidence. They don’t say “I’m not sure if this is current.” They say “The answer is X” — because X is the highest probability token given the input.

Daniel observed this when Foundation Capital published their context graph article: “It was one of the best SEO-engineered articles of all time. It got five million views. But it was basically a self-serving piece that promoted Foundation Capital’s own portfolio companies. An LLM trained on the internet now confidently cites it as authoritative research on what context graphs actually are.”

The problem cascades. An AI system trained on Foundation Capital’s article absorbs their definition. It then generates responses based on that definition, which get fed into other training sets, which train new models, which cite the original source with increasingly high confidence — even though the original source was marketing, not technical truth.

This is why Daniel calls temporal reasoning “the holy grail” of AI infrastructure. Until an AI system has access to timestamps, source credibility signals, and the ability to understand when information changes, it will keep confidently hallucinating stale, corrupted, or misleading data.

Time as context

The solution isn’t better language models. It’s adding a context layer that understands time.

“A context graph with timestamps lets an agent understand when information becomes stale and when it needs to flag uncertainty,” Daniel explains. “Fred has four legs. Fred has five legs now. And here’s when the change happened and who observed it.”

This requires tracking not just facts, but the temporal metadata around them: when was the fact observed, when did it change, how consistent has it been, what evidence supports it. An AI agent with access to this layer can make smarter decisions.
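As a minimal sketch of what that temporal metadata might look like, consider a fact record that carries its own freshness information. The `TemporalFact` class and its field names are illustrative assumptions, not TrustGraph's actual data model:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical sketch: a fact wrapped with the temporal metadata the
# article describes -- when it was observed, when it last changed,
# and how often it has been confirmed.
@dataclass
class TemporalFact:
    subject: str
    predicate: str
    value: str
    observed_at: date            # when this value was last observed
    changed_at: date             # when the value last changed
    observation_count: int = 1   # how many times this value was confirmed

    def is_stale(self, today: date, max_age_days: int = 90) -> bool:
        """Flag the fact as stale if it hasn't been re-observed recently."""
        return (today - self.observed_at).days > max_age_days

fred_legs = TemporalFact("Fred", "has_legs", "4",
                         observed_at=date(2025, 1, 10),
                         changed_at=date(2020, 6, 1))

print(fred_legs.is_stale(today=date(2025, 6, 1)))  # True: last observed >90 days ago
```

The point is that staleness becomes a computable property of the fact itself, rather than something the language model is expected to intuit.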

“Most LLMs process static training data as patterns. They have no built-in understanding that information changes, that sources become outdated, or that time matters. They confidently cite information from their training cutoff as if it’s current,” Daniel says. The solution is separating the context layer (which changes, updates, tracks time) from the model itself (which learns statistical patterns).

The architecture works like this: instead of asking the LLM to answer directly, you query the context layer first. “What do we know about Fred? When did it last change? How credible is the source?” Then you give the LLM the context with temporal metadata included. The LLM can then generate an answer that acknowledges uncertainty or time-sensitivity.
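A rough sketch of that flow, with a dictionary standing in for the context layer and a formatted prompt standing in for the model call (`context_store`, `build_prompt`, and the reliability score are all assumed names for illustration):

```python
from datetime import date

# Stand-in for a real context graph: facts stored alongside
# temporal metadata and a source-reliability signal.
context_store = {
    "Fred": {"legs": "4", "observed_at": date(2025, 1, 10),
             "source": "Daniel", "source_reliability": 0.9},
}

def build_prompt(entity: str, question: str, today: date) -> str:
    """Query the context layer first, then hand the LLM enriched context."""
    ctx = context_store[entity]
    age = (today - ctx["observed_at"]).days
    freshness = "current" if age <= 30 else f"last verified {age} days ago"
    # The temporal metadata travels with the fact, so the model can
    # hedge instead of answering with unwarranted confidence.
    return (f"Context: {entity} has {ctx['legs']} legs "
            f"({freshness}; source reliability {ctx['source_reliability']}).\n"
            f"Question: {question}\n"
            f"If the context is stale, say so explicitly.")

prompt = build_prompt("Fred", "How many legs does Fred have?", date(2025, 6, 1))
print(prompt)
```

Notice that the design choice is architectural, not model-level: the LLM is unchanged, but its input now carries the timestamps it needs to express uncertainty.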

The scale of the problem

Temporal blindness affects nearly every enterprise AI deployment. “Every company deploying an AI agent faces this,” Daniel observes. “They’re using current LLMs which have no concept of time. They’re deploying in 2026 with a model trained on 2024 data. The agent has no way to signal ‘this information might be outdated.’”

This is especially critical for domains where information freshness matters: financial services, healthcare, regulatory compliance, competitive intelligence. In these fields, an answer that was correct three months ago might be confidently wrong today. An LLM has no way to distinguish between them.

“A context graph with temporal reasoning is foundational infrastructure for reliable AI,” Daniel emphasizes. “Without it, you’re asking an agent to make decisions based on potentially stale, mixed-quality, or contradicted information — and the agent has no built-in awareness of any of these problems.”

What real temporal infrastructure looks like

TrustGraph 2.0, Daniel’s next focus after the core context graph infrastructure, is built specifically around this problem. The system tracks not just information, but how information changes over time and how consistent that change is.

“Reification over time is the key insight,” Daniel explains. “You don’t just store ‘Fred has four legs.’ You store ‘Daniel observed that Fred had four legs on this date’ and ‘Chris observed that Fred had five legs on this date.’ The system then tracks the change and measures consistency. Is Fred actually growing legs? Or is there a discrepancy in observation?”
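The reified form can be sketched as a list of observation records rather than bare facts, with a simple consistency measure over them. The record shape and the `consistency` helper are illustrative assumptions:

```python
from datetime import date

# Sketch of "reification over time": each record stores who observed
# what, and when -- the system keeps the observations, not just the fact.
observations = [
    {"observer": "Daniel", "subject": "Fred", "legs": 4, "on": date(2025, 3, 1)},
    {"observer": "Chris",  "subject": "Fred", "legs": 5, "on": date(2025, 4, 1)},
]

def consistency(subject: str) -> float:
    """Fraction of observations agreeing with the most common value."""
    values = [o["legs"] for o in observations if o["subject"] == subject]
    most_common = max(set(values), key=values.count)
    return values.count(most_common) / len(values)

print(consistency("Fred"))  # 0.5 -- a discrepancy the system should surface
```

A consistency score of 0.5 is exactly the signal the quote describes: either Fred is changing, or someone's observation is wrong, and the system knows to investigate rather than assert.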

This temporal dimension does something remarkable: it lets an AI system evaluate credibility through time. If a source has consistently reported accurate information over time, and their current claim aligns with that pattern, the system can assign higher confidence. If a source has been inconsistent or wrong before, the system can downweight it.
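One simple way to sketch that downweighting is smoothed historical accuracy. The `source_weight` function and its `track_record` input are assumptions for illustration; a real system would derive the track record from the context graph's own history:

```python
# Sketch: weight a source's current claim by its historical accuracy,
# smoothed toward a neutral prior so a single data point isn't decisive.
def source_weight(track_record: list[bool], prior: float = 0.5) -> float:
    """Smoothed accuracy: starts near `prior`, converges to the observed rate."""
    correct = sum(track_record)
    return (correct + prior) / (len(track_record) + 1)

reliable = source_weight([True, True, True, True])    # consistently right: 0.9
erratic  = source_weight([True, False, False, True])  # mixed history: 0.5
print(reliable, erratic)
```

The smoothing term mirrors how humans extend provisional trust to new sources while letting a long, consistent record dominate.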

“Human reasoning actually works like this,” Daniel notes. “You have a mental model of how reliable different sources are, and it’s based on whether they’ve been right or wrong over time. AI systems need the same capability.”

The philosophical dimension

The temporal blindness of LLMs points to something deeper: language models are fundamentally snapshots, not living systems.

They’re trained on a corpus of text at a point in time. They generate text based on patterns in that corpus. They have no feedback loop with reality. They don’t learn from being wrong. They don’t update their knowledge. They’re frozen.

“A real intelligence would have temporal reasoning built in,” Daniel suggests. “It would understand that the world changes, that information becomes outdated, that sources vary in credibility, that time matters. An LLM has none of this without external infrastructure.”

This is why context graphs with strong temporal capabilities are potentially transformative. They don’t make language models smarter — they give them something they fundamentally lack: access to a living, evolving model of the world that knows when facts change and whether information is current.

FAQ

Why don’t larger language models fix the temporal reasoning problem?

Because temporal reasoning isn’t a function of model scale or training data quality. It’s a structural limitation of how transformers work. A transformer generates text based on statistical patterns in its training data. It has no built-in understanding that time exists or that the world changes. Scaling up the model doesn’t add temporal awareness — it just makes the pattern matching more sophisticated.

How can I tell if my AI system is hallucinating because of temporal blindness?

Watch for confident claims about recent events, current leadership, or up-to-date information that feel slightly off. If you ask about someone’s current role and the AI returns an outdated position, that’s temporal blindness. If it confidently cites research that was good three years ago but has been superseded, that’s the same problem.

What’s the difference between a hallucination from missing training data vs. a hallucination from temporal blindness?

Missing training data: the AI doesn’t know about something because it wasn’t in the training set (reasonable limitation). Temporal blindness: the AI knows about multiple time periods but can’t distinguish between them, so it confidently picks one at random without awareness that some data is outdated (systemic problem).

Can prompt engineering solve temporal blindness?

Only partially. You can tell the LLM “use only recent data” or “flag if information might be outdated,” and the model will follow instructions better than baseline. But the model has no built-in understanding of what “recent” means or whether the data in its training set is actually recent. Without access to a context layer that tracks time, temporal awareness remains brittle.

If a context graph tracks temporal information, does the LLM automatically use it correctly?

Not automatically. The model needs to be structured to query the context layer, receive metadata about information freshness and credibility, and incorporate that into its response. This is where architecture matters — the AI system needs to be designed from the start around the assumption that temporal metadata drives decision-making, not just pattern matching.

Why is temporal reasoning especially critical for enterprise AI?

Enterprise decisions often depend on current information: who leads a company now, what’s the current regulation, what’s the market price today. An LLM with temporal blindness will confidently return outdated answers. Enterprises can’t afford hallucinations driven by stale data, so they need infrastructure that separates the time-blind language model from a time-aware context layer.

How does human credibility evaluation use temporal reasoning?

Humans judge source credibility partly based on whether sources have been right or wrong over time. A financial analyst who’s made accurate predictions over five years becomes more credible than a source making the same prediction for the first time. A news organization with consistent accuracy builds trust over time. AI systems need the same capability: measuring source reliability through temporal consistency.

Is temporal reasoning the same as continuous learning?

Related, but different. Continuous learning is when an AI system updates its model based on new evidence. Temporal reasoning is when a system understands that facts change over time and tracks when they change. You can have temporal reasoning (tracking when information changed) without continuous learning (updating your base model). TrustGraph focuses on temporal reasoning as the foundational layer.

Can I add temporal reasoning on top of an existing LLM?

Yes — by building a context layer that tracks temporal metadata and querying it before asking the LLM for a response. The LLM itself doesn’t change. But the system architecture changes: instead of asking the model to answer directly, you give it context-with-temporal-metadata, and the model generates a response based on that enriched input. This is why infrastructure matters.

Full episode coming soon

This conversation with Daniel Davis is on its way. Check out other episodes in the meantime.
