How ClickUp Uses Merkle Trees to Build AI Agent Memory
Jay Hack, Head of AI at ClickUp
Most AI memory implementations work like a notebook: the agent jots down facts and retrieves them later. That approach breaks at organizational scale, where thousands of conversations, documents, and tasks generate context daily — and not everyone should see everything.
Jay Hack, Head of AI at ClickUp — the $4B work management platform where entire organizations run their workflow through AI agents — is building something different. Drawing from an unexpected source, his team adapted the data structure that underpins Bitcoin to solve the problem of organizational AI memory.
The Problem With Flat Memory
When Claude or ChatGPT remembers something about you, it’s a single-player operation. One user, one memory store, no permissions. Jay is blunt about the limitation: “Claude, OpenAI, Gemini, haven’t had to solve this because their AI is purely [single-player]. It’s something you chat with one-on-one and then will basically remember like little tidbits from your conversation.”
ClickUp’s customers are organizations of 20 to 2,000 people. Most of the useful information that an AI agent needs to remember doesn’t happen in isolated chats — it emerges from conversations between people, across channels, in documents, and in task threads. A flat memory store can’t handle this. It doesn’t know what to summarize, when to update, or who should see what.
Hierarchical Summarization, Bitcoin Style
The architecture Jay describes borrows from Merkle trees, the hash-based data structure that lets Bitcoin commit to every transaction in a block with a single root hash, so a client can verify that one transaction is included without checking all the others.
“We take a lot of inspiration from the thing that sits at the basis of Bitcoin, the Merkle tree, which is a way to efficiently compute whether a bunch of hashes combine into another hash,” Jay explains. “You can do the same thing for basically understanding, do I need to recompute a summary for a given portion of a workspace based upon the updates that have been made in it recently?”
Here’s how it works in practice. An organization generates continuous signals: chat messages, task updates, document edits, meeting notes. The system distills relevant insights from each action and stores them as summaries, building a secondary graph on top of the existing communication graph. When something changes in one branch, the Merkle-tree-inspired structure lets the system determine exactly which summaries need recomputation — without reprocessing everything.
“We are computing compressions, summaries, and sort of a secondary graph on top of the existing graph of human communications. And then this secondary graph is the thing that the LLM will primarily communicate with.”
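The change-detection idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ClickUp’s implementation: the Node type, the merkle_hash and stale_nodes helpers, and the sample node names are hypothetical, node names are assumed unique, and SHA-256 stands in for whatever hash a real system would use. Each leaf hashes its content, each parent hashes its children’s hashes, and comparing against the hashes cached at the last summarization pass shows which summaries are stale.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical workspace node: leaves hold raw content, parents hold children."""
    name: str  # assumed unique across the tree
    content: str = ""
    children: list[Node] = field(default_factory=list)


def merkle_hash(node: Node) -> str:
    """Leaf: hash of its content. Parent: hash of its children's hashes, in order."""
    h = hashlib.sha256()
    if node.children:
        for child in node.children:
            h.update(merkle_hash(child).encode())
    else:
        h.update(node.content.encode())
    return h.hexdigest()


def stale_nodes(node: Node, cache: dict[str, str]) -> list[str]:
    """Names of nodes whose hash differs from the cached one; their summaries need recomputing."""
    current = merkle_hash(node)
    if cache.get(node.name) == current:
        return []  # subtree unchanged since the last pass: skip it entirely
    cache[node.name] = current
    stale = [node.name]
    for child in node.children:
        stale.extend(stale_nodes(child, cache))
    return stale


# Usage: a tiny workspace with two teams.
ws = Node("workspace", children=[
    Node("eng", children=[Node("eng/chat", "ship friday"), Node("eng/doc", "spec v1")]),
    Node("sales", children=[Node("sales/chat", "q3 pipeline")]),
])
cache: dict[str, str] = {}
first = stale_nodes(ws, cache)                       # first pass: all 6 nodes are stale
ws.children[0].children[0].content = "ship monday"   # one chat message changes
second = stale_nodes(ws, cache)                      # ['workspace', 'eng', 'eng/chat']
```

Only the edited chat and its ancestors come back as stale; the untouched eng/doc and sales branches are skipped, which is the saving the Merkle structure buys. For brevity the sketch rehashes subtrees on every call; a production version would store each node’s hash and update only the path from the edited leaf to the root.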
Permissions as a First-Class Concern
The summarization layer inherits the same permissions as the underlying data. If a conversation happened in a private channel, the summary is only available to members of that channel. Jay builds this on the same primitive data models ClickUp already uses for access control.
“Just as an agent should be able to search through chat messages, it should be able to search through summaries of chat messages. And this shouldn’t require some type of large scale re-architecting for us.”
This sounds simple, but the edge cases multiply fast. An agent might participate in a shared channel, learn something, and then be asked about it by someone who wasn’t in that channel. Should the agent share? There’s no industry consensus. Jay calls it “fundamental science” — the kind of problem where ClickUp is pioneering approaches because the cloud AI providers haven’t needed to solve it yet.
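One way to express the inheritance rule in code is to attach to each summary the set of channels it was derived from and check the reader against all of them. This is a hypothetical sketch: the Summary type, the CHANNEL_MEMBERS table, the search_summaries helper, and the rule that a reader must belong to every source channel are assumptions chosen as a conservative default, not a description of ClickUp’s access model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    text: str
    source_channels: frozenset[str]  # channels whose messages fed this summary


# Hypothetical membership table: channel -> users who can read it.
CHANNEL_MEMBERS: dict[str, set[str]] = {
    "general": {"ana", "ben", "cho"},
    "exec-private": {"ana"},
}


def can_read(user: str, summary: Summary) -> bool:
    """Inherited permission: the reader must have access to every source channel.

    A summary with no recorded sources is denied by default.
    """
    return bool(summary.source_channels) and all(
        user in CHANNEL_MEMBERS.get(ch, set()) for ch in summary.source_channels
    )


def search_summaries(user: str, query: str, summaries: list[Summary]) -> list[str]:
    """Naive case-insensitive substring search, filtered by the same access check as raw data."""
    q = query.lower()
    return [s.text for s in summaries if q in s.text.lower() and can_read(user, s)]


store = [
    Summary("Launch moved to Q3", frozenset({"general"})),
    Summary("Launch budget cut by 20%", frozenset({"general", "exec-private"})),
]
print(search_summaries("ben", "launch", store))  # ['Launch moved to Q3']
print(search_summaries("ana", "launch", store))  # both summaries
```

Requiring access to every source means a summary that blends a public and a private channel is only as visible as its most restricted input. That is the cautious answer to the shared-channel question, though as Jay notes, the industry has not settled on one.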
The Infinite Context Alternative
There’s a competing theory. Dario Amodei at Anthropic has advocated for simply extending context windows to millions of tokens — dumping the agent’s entire history into context and letting the transformer figure out what matters.
Jay takes a pragmatic view. “It is definitely not efficient today,” he says, noting that the compute cost of current attention mechanisms grows quadratically with context length. Linear-time alternatives like RWKV and Mamba exist but “aren’t very good.” He doesn’t rule it out long-term but won’t bet his architecture on it.
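A back-of-envelope estimate shows why. Standard self-attention builds an n × n score matrix for every head in every layer, so for context length n and head dimension d its compute grows with n² · d. The figures below are illustrative arithmetic, not benchmarks of any particular model:

```latex
% Attention scores: one n x n matrix per head, per layer
C_{\text{attn}}(n) \propto n^{2} d

% Growing the context 10x, from 10^5 to 10^6 tokens:
\frac{C_{\text{attn}}(10^{6})}{C_{\text{attn}}(10^{5})}
  = \left(\frac{10^{6}}{10^{5}}\right)^{2} = 10^{2} = 100

% Size of the score matrix at one million tokens, per head per layer:
(10^{6})^{2} = 10^{12} \text{ entries}
```

Under a linear-time architecture the same tenfold increase in context would cost roughly ten times more compute rather than a hundred, which is the appeal of RWKV- and Mamba-style models even though, in Jay’s assessment, they do not yet perform well enough.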
“Part of being in the application layer game is I certainly have certain theses around what’s going to happen in the future, but you have to be absolutely ferocious in terms of taking whatever is working and then applying that. And you have to be ready to burn down your entire tech stack every four months.”
For now, hierarchical summarization with a Merkle-tree-inspired update mechanism is what works. If infinite efficient context arrives, Jay says ClickUp will adopt it. Until then, the secondary graph approach delivers reliable organizational memory at scale.
FAQ
What is a Merkle tree and how is it used in AI memory?
A Merkle tree is a hash-based data structure, best known from Bitcoin, where it lets nodes verify transaction integrity efficiently. ClickUp adapts this concept for AI memory by organizing workspace summaries hierarchically: when one branch of data changes, the system can determine exactly which summaries need recomputation without reprocessing the entire workspace.
How does ClickUp’s AI memory differ from ChatGPT or Claude memory?
ChatGPT and Claude offer single-player memory — one user’s conversation history stored for future reference. ClickUp’s memory is multiplayer, operating across entire organizations of 20 to 2,000 people. It captures and summarizes cross-team conversations, documents, and tasks while enforcing permissions so information isn’t leaked across access boundaries.
What is hierarchical summarization in AI systems?
Hierarchical summarization compresses raw organizational data — chat messages, task updates, document edits — into layered summaries that agents can query. Rather than searching through raw conversation logs, the AI accesses pre-computed compressions that preserve the essential insights. ClickUp builds this as a “secondary graph on top of the existing graph of human communications.”
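The layering can be sketched as a bottom-up pass over a tree of workspace levels. In this hypothetical example, the Level type, summarize_tree, and toy_summarize are illustrative names, and toy_summarize (which keeps the first two words of each input) is a stand-in for an LLM summarization call. Each level is summarized from its own raw items plus its children’s summaries, so the workspace-level summary never rereads raw logs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Level:
    """Hypothetical node in the summary hierarchy (e.g. workspace -> team -> channel)."""
    name: str
    raw_items: list[str] = field(default_factory=list)  # messages, task updates, doc edits
    children: list[Level] = field(default_factory=list)


def summarize_tree(node: Level, summarize: Callable[[list[str]], str]) -> dict[str, str]:
    """Post-order pass: children are summarized first, then each parent is summarized
    from its own raw items plus its children's summaries."""
    out: dict[str, str] = {}
    child_summaries: list[str] = []
    for child in node.children:
        out.update(summarize_tree(child, summarize))
        child_summaries.append(out[child.name])
    out[node.name] = summarize(node.raw_items + child_summaries)
    return out


def toy_summarize(texts: list[str]) -> str:
    """Stand-in for an LLM call: keep the first two words of each input."""
    return " | ".join(" ".join(t.split()[:2]) for t in texts)


ws = Level("workspace", children=[
    Level("eng", raw_items=["we will ship the beta on friday", "beta blocked by login bug"]),
    Level("sales", raw_items=["two new enterprise deals closed"]),
])
rollup = summarize_tree(ws, toy_summarize)
# rollup["eng"]       -> "we will | beta blocked"
# rollup["workspace"] -> "we will | two new"  (built from summaries, not raw logs)
```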
Can AI agents handle data permissions in multiplayer environments?
It’s one of the hardest unsolved problems. Jay Hack describes scenarios where an agent participating in a private channel must decide whether to share that context with users who lack access. ClickUp’s approach permissions memory objects like any other workspace object, but Jay acknowledges “there is no converged upon correct answer in industry today.”
Why not just use infinite context windows instead of memory architecture?
The compute cost of standard transformer attention grows quadratically with context length, making million-token contexts expensive. Linear-time alternatives like RWKV and Mamba exist but underperform frontier models. Jay Hack describes the infinite context approach as “definitely not efficient today” and builds external memory instead, while remaining ready to adopt efficient long-context models if the research delivers them.
How does ClickUp’s memory architecture handle organizational scale?
The system pre-computes summaries as actions happen — chats, task updates, document edits — and stores them in a knowledge graph with the same permission model as the source data. The Merkle-tree-inspired structure enables efficient incremental updates. Only the branches where new activity occurs trigger recomputation, avoiding the cost of reprocessing an entire workspace.
What is the difference between memory, knowledge, and context in AI systems?
Jay Hack draws clear distinctions. Knowledge is information baked into model weights during pre-training. Memory is information an agent picks up from interactions after deployment — stored externally, not in model weights. Context is the raw tokens fed into an LLM for a given request — “a more brutish way of thinking about the input.” Context can contain the same knowledge repeated 50 times but still represent only one piece of information.
How does pre-computed context improve AI agent performance?
Pre-computed summaries give agents faster, more reliable access to relevant information compared to runtime retrieval. Instead of querying multiple APIs and synthesizing on the fly, the agent accesses a curated knowledge graph where insights are already distilled. Jay describes this as the difference between “letting the agent stitch together information at runtime” and “having the important insights precomputed.”
What workspace data feeds into ClickUp’s AI memory system?
Tasks, chats, documents, whiteboards, meeting notes, email, and external integrations all flow into ClickUp’s context graph. The system treats each data type as a node in the same knowledge graph, enabling cross-source queries. An agent can reference a decision from a chat thread when updating a task or drafting a document.
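A minimal sketch of the shared-graph idea: every object type uses the same node shape, and edges let a query hop from a task to the chat where a related decision was made. The Item and ContextGraph names, the kind labels, and the sample data are hypothetical; ClickUp’s actual schema is not described here.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical node shape shared by every workspace object type."""
    id: str
    kind: str  # e.g. "task", "chat", "doc", "meeting", "email"
    text: str


class ContextGraph:
    def __init__(self) -> None:
        self.items: dict[str, Item] = {}
        self.edges: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, item: Item) -> None:
        self.items[item.id] = item

    def link(self, a: str, b: str) -> None:
        """Undirected relation, e.g. a chat thread that discusses a task."""
        self.edges[a].add(b)
        self.edges[b].add(a)

    def related(self, item_id: str, kind: str) -> list[Item]:
        """Cross-source lookup: neighbors of one node, filtered by type, sorted by id."""
        neighbors = self.edges.get(item_id, set())
        hits = [self.items[n] for n in neighbors if self.items[n].kind == kind]
        return sorted(hits, key=lambda it: it.id)


g = ContextGraph()
g.add(Item("task-1", "task", "Migrate billing to v2"))
g.add(Item("chat-7", "chat", "Decision: freeze billing changes until v2 ships"))
g.add(Item("doc-3", "doc", "Billing v2 spec"))
g.link("task-1", "chat-7")
g.link("task-1", "doc-3")
decisions = [it.text for it in g.related("task-1", "chat")]
# decisions -> ["Decision: freeze billing changes until v2 ships"]
```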